Interview: Liran Zvibel of WekaIO

December 4, 2018, 11:50 pm

Joakim is the resident interviewer for the D Blog. He has also interviewed members of the D community for This Week in D and is responsible for the Android port of LDC .

WekaIO is a San Jose, CA based startup with engineering inTel Aviv, Israel, that has built Matrix, the world’s fastest file system , in D. Recently, they posted impressive numbers in the IO-500 Node Challenge . Liran Zvibel, the co-founder and CEO of WekaIO, has been a regular speaker at DConf, talking about their use of D at DConf 2015 , 2016 , and 2018 .

WekaIO is an expression of a design goal of D, that you can write your prototype quickly and easily in D, then continue working on the same codebase until it reaches production quality , as opposed to prototyping in a different high-level programming language. Liran took some time out of his busy schedule to answer some questions about WekaIO and their use of D.

Liran Zvibel at DConf 2018

Joakim: Tell us about the enterprise storage market that WekaIO is in. How do you use D in your product?

Liran: Any compute environment has a mix of CPU power, networking, and storage: this is uniform across many organization types and sizes, and whether the infrastructure is in-house (“on premises”) or in the public cloud (“The cloud is just somebody else’s computer,” as they say).

The storage component provides the current state and also the history needed by compute. While compute is stateless (how many times have you rebooted your computer to fix a problem?) and the network is ephemeral, storage must be able to keep its state consistently and coherently while providing enough performance for the compute to do its job (otherwise the running jobs are IO-bound, and nobody likes that).

There are three main types of centralized storage systems:

Block storage systems, that provide the abstraction of a local drive that sits remotely. Several systems may get access to their own “volume” on the centralized system. The AWS equivalent for such a system is Elastic Block Store. These volumes are usually not shared between more than one server, and the reason to use them is failure resiliency, reliability, and performance (also some advanced features such as taking a point in time, backup, integration into a VM environment, etc.). File systems, centralized storage that is also shared, and allows several
servers on the network to access the same data. This is the kind of system
that WekaIO provides. Traditionally, people have turned to block-based solutions if they needed high performance, then created a local file system based on that shared volume, but with WekaIO we show that we enable a shareable file system that is even faster than a local file system over block storage. Object storage solutions, these enable storing objects with reduced semantics (no ability to modify, data stored is only eventually consistent, etc) to enable cost savings, and generally don’t care about performance.

When I review storage systems here, I talk mainly about block-based and file system solutions as object storage is much simpler and is implemented

using different methods.

Requirements for storage systems are:

Reliability Performance (low-latency IOPS and throughput) Features

Traditionally, these systems have a “data path” that cares about reliability and performance, and then a “control path” or “management path” that takes care of higher-level features and making reliability work at a higher level.

For many storage systems, the data path is implemented in some system programming language, such as C or C++, and the management path is implemented in a higher-level language such as python, Java, Go, etc. Our previous company, XIV (that IBM acquired in 2007) used a private version of C that had polymorphism, generic programming, and integrated RPC, for code that was a mixture of XML, C, and weird header files.

At WekaIO, we use the D programming Language to implement both the data path and the control path, as we can use a single language to get both machine-level access with high performance when needed, and a higher-level language for the features.

Joakim: How did you end up choosing D? You mentioned at DConf that you

initially tried a combination of Python and C++.

Liran: The first part of the work we had done on the C++/Python combo was to work on an efficient in-process RPC mechanism between Python and C++, that would

allow us to refer to the same object from C++ and Python in an efficient way. That way the same object could have run in the C++ context when needing performance, and in Python when needing brevity. The idea was to start implementing the system in Python then convert pieces to C++ as needed, where inter-process communication (inter or intra servers) would be done in C++ only. At that time, we had a prototype of the system implemented in Python and we started working on the C++ part, and the RPC definition first.

When we discovered D, we realized we may be able to get a single language to handle both the high-level and performance requirements, and we started by running some pet projects to verify that this was indeed a working language. The next phase was implementing our RPC over D (which was much more elegant than the C++ version), and a tracing system that would allow us to debug (the tracing system was reviewed in my DConf 2015 talk).

The biggest limitation we had initially was the availability of an optimizing compiler, as the reference compiler, DMD, does not provide assembly that is equivalent to LLVM or even GCC when running on modern x86 processors. After DConf 2015, and with the help of David Nadlinger, we were able to get LDC (the D compiler with an LLVM backend) to compile our code and generate results that met our needs.

Joakim: Andrei Alexandrescu, one of the co-architects of the D language, blogged about visiting you in Tel Aviv : how your code is very compact and that you added new features without growing the codebase much. What key features of D do you use to accomplish this and the speed and other benefits of your storage system?

Liran: The strongest part of D is its generic programming. We use generic programming in two ways, that usually are contradictory but with D they actual

↧

LeetCode 第1题：Two Sum

December 5, 2018, 5:08 am

≫ Next: Recommender Systems using Deep Learning in PyTorch from scratch

≪ Previous: Interview: Liran Zvibel of WekaIO

题目

给出一个整数数组，找出其中的两个整数，使相加等于指定整数，并返回这两个整数的索引。

假设每组输入只有一种解法，并且不能重复使用同一个元素。

举例：

给出 nums = [2, 7, 11, 15], target = 9, 因为 nums[0] + nums[1] = 2 + 7 = 9, 所以返回 [0, 1]. 思路最简单也是最直接的想法就是两重循环直接遍历计算就能得到结果，时间复杂度为
LeetCode 第1题：Two Sum

。

但这肯定不是最优解。

题目中提到只会有一种解法，那就意味着不会有重复的整数，不然就会有多解。

既然没有重复的，那用一个哈希来存储数据就最合适了，整数为键，索引为值。

那我们要找两个整数的索引就很方便了，在遍历数组的时候，检查和目标的差值是否在哈希中，有的话就是答案了。

Go 实现 func twoSum(nums []int, target int) []int { h := make(map[int]int) for i, value := range nums { if wanted, ok := h[value]; ok { return []int{wanted, i} } else { h[target-value] = i } } return nil }

运行时间：4ms，超过100%的 golang 提交。

python3 重写 class Solution: def twoSum(self, nums, target): """ :type nums: List[int] :type target: int :rtype: List[int] """ h = {} for i, value in enumerate(nums): if value in h: return [h[value], i] else: h[target - value] = i return None

运行时间：52ms，超过47.04%的 python3 提交。

Python3 优化

从上面的数据上看效率有点低，应该是 enumerate 生成迭代器效率稍差，所以换用 range(len(...)) 的写法。

class Solution: def twoSum(self, nums, target): """ :type nums: List[int] :type target: int :rtype: List[int] """ h = {} for i in range(len(nums)): value = nums[i] if value in h: return [h[value], i] else: h[target - value] = i return None

运行时间：36ms，超过99.71%的 python3 提交。

↧

Recommender Systems using Deep Learning in PyTorch from scratch

December 5, 2018, 5:06 am

≫ Next: Access the Python Windows file

≪ Previous: LeetCode 第1题：Two Sum

Recommender Systems using Deep Learning in PyTorch from scratch

Photo by Susan Yin on Unsplash

Recommender systems (RS) have been around for a long time, and recent advances in deep learning have made them even more exciting. Matrix factorization algorithms have been the workhorse of RS. In this article, I would assume that you are vaguely familiar with collaborative filtering based methods and have basic knowledge about training a neural network in PyTorch.

In this post, my goal is to show you how to implement a RS in PyTorch from scratch. The theory and model presented in this article were made available in this paper . Here is the GitHub repository for this article.

Problem Definition

Given a past record of movies seen by a user, we will build a recommender system that helps the user discover movies of their interest.

Specifically, given <userID, itemID> occurrence pairs, we need to generate a ranked list of movies for each user.

We model the problem as a binary classification problem , where we learn a function to predict whether a particular user will like a particular movie or not.

Our model will learn thismapping Dataset

We use the MovieLens 100K dataset, which has 100,000 ratings from 1000 users on 1700 movies. The dataset can be downloaded from here .

The ratings are given to us in form of <userID,itemID, rating, timestamp> tuples. Each user has a minimum of 20 ratings.

Training

We drop the exact value of rating (1,2,3,4,5) and instead convert it to an implicit scenario i.e. any positive interaction is given value of 1. All other interactions are given a value of zero, by default.

Since we are training a classifier, we need both positive and negative samples. The records present in the dataset are counted as positive samples. We assume that all entries in the user-item interaction matrix are negative samples (a strong assumption, and easy to implement).

We randomly sample 4 items that are not interacted by the user, for every item interacted by the user. This way, if a user has 20 positive interactions, he will have 80 negative interactions. These negative interactions cannot contain any positive interaction by the user, though they may not be all unique due to random sampling.

Evaluation

We randomly sample 100 items that are not interacted by the user, ranking the test item among the 100 items. This same strategy is used in the paper, which is the inspiration for this post (referenced below). We truncate the ranked list at 10.

Since it is too time-consuming to rank all items for every user, for we will have to calculate 1000*1700 ~10 values. With this strategy, we need 1000*100 ~ 10 values, an order of magnitude less.

For each user, we use the latest rating(according to timestamp) in the test set, and we use the rest for training. This evaluation methodology is also known as leave-one-out strategy and is the same as used in the reference paper.

Metrics

We use Hit Ratio(HR), and Normalized Discounted Cumulative Gain(NDCG) to evaluate the performance for our RS.

Our model gives a confidence score between 0 and 1 for each item present in the test set for a given user. The items are sorted in decreasing order of their score, and top 10 items are given as recommendation. If the test item (which is only one for each user) is present in this list, HR is one for this user, else it is zero. The final HR is reported after averaging for all users. A similar calculation is done for NDCG.

While training, we will be minimizing the cross-entropy loss, which is the standard loss function for a classification problem. The real strength of RS lies in giving a ranked list of top-k items, which a user is most likely to interact. Think about why you mostly click on google search results only on the first page, and never go to other pages. Metrics like NDCG and HR help in capturing this phenomenon by indicating the quality of our ranked lists. Here is a good introduction on evaluating recommender systems .

Baseline: Item Popularity model

A baseline model is one we use to provide a first cut, easy, non-sophisticated solution to the problem. In much of use cases for recommender systems, recommending the same list of most popular items to all users gives a tough to beat baseline.

In the GitHub repository, you will also find the code for implementing item popularity model from scratch. Below are the results for the baseline model.

Deep Learning basedmodel

With all the fancy architecture and jargon of neural networks, we aim to beat this item popularity model.

Our next model is a deep multi-layer perceptron (MLP). The input to the model is userID and itemID, which is fed into an embedding layer. Thus, each user and item is given an embedding. There are multiple dense layers afterward, followed by a single neuron with a sigmoid activation. The exact model definition can be found in the file MLP.py .

The output of the sigmoid neuron can be interpreted as the probability the user is likely to interact with an item. It is interesting to observe that we end up training a classifier for the task of recommendation.

Figure 2: The architecture for Neural Collaborative Filtering

Our loss function is Binary Cross-entropy loss. We use Adam for gradient descent and L-2 norm for regularization.

Results

For the popularity based model, which takes less than 5 seconds to train, these are the scores:

HR = 0.4221 | NDCG = 0.2269

For the deep learning model, we obtain these results after nearly 30 epochs of training (~3 minutes on CPU):

HR = 0.6013 | NDCG = 0.3294

The results are exciting. There is a huge jump in metrics we care about. We observe a 30% reduction in error according to HR, which is huge. These numbers are obtained from a very coarse hyper-parameter tuning. It might still be possible to extract more juice by hyper-parameter optimization.

Conclusion

State of the art algorithms for matrix factorization, and much more, can be easily replicated using neural networks. For a non-neural perspective, read this excellent post about matrix factorization for recommender systems .

In this post, we saw how neural networks offer a straightforward way of building recommender systems. The trick is to think of recommendation problem as a classification prob

↧

Access the Python Windows file

December 5, 2018, 5:04 am

≫ Next: Micro:bit uPython: HTTP UART OBLOQ POST request to Flask server

≪ Previous: Recommender Systems using Deep Learning in PyTorch from scratch

I am new to python, and despite my searching, am unable how to properly access a file from a Python Script on windows with Python 3. I am trying to use Mongosm to import openstreetmap OSM data into mongodb, but get an error when trying to access the file. How can I fix this? Thank you. According to the github instructions, all I need to do is python insert_osm_data.py <OSM filename> ( instructions found here )

The error says:

C:\Users\Jusitn>python C:\Users\Jusitn\Desktop\mongosm-master\insert_osm_data G:\OSM\planet-140430.osm File "C:\Users\Jusitn\Desktop\mongosm-master\insert_osm_data.py", line 160 print 'node not found: '+ str(node) SyntaxError: invalid syntax

↧

Micro:bit uPython: HTTP UART OBLOQ POST request to Flask server

December 5, 2018, 4:02 am

≫ Next: Encoding Non-Printable Bytes in Python 3

≪ Previous: Access the Python Windows file

In this tutorial we are going to learn how to send a HTTP POST request using a micro:bit board and a UART OBLOQ .

Introduction

In this tutorial we are going to learn how to send a HTTP POST request using a micro:bit board and a UART OBLOQ . We will be using Micropython to program the micro:bit board.

Please visitthisprevious tutorial to check the connection diagram between the two devices. Also, please checkthis post for a detailed explanation on how to connect the UART OBLOQ to a WiFi network, using the micro:bit board to send the serial commands.

For this tutorial, we will setup a Python Flask server, which will receive our request and return back to the client a very simple message. For an introductory tutorial on Flask, please checkhere.

The Flask code

As usual, we will begin our Python code with the imports. We will need the Flask class, which we will use to configure our server, and the request object, which will be used to access the request body and headers.

from flask import Flask, request

After that, we will create an object of the Flask class we have just imported. We will make use of this object to configure the routes of our server and to start the application.

app = Flask(__name__)

Our server will have a single route, which will only listen to HTTP POST requests. We will call our route “ /post “.

@app.route('/post', methods = ["POST"]) def post():

Then, in the implementation of the route handling function, we will first print the body of the request, so we make sure the received content matches the one sent from the micro:bit.

print(request.data)

Additionally, we will also print the headers of the received request.

print(request.headers)

To finalize the code of our route handling function, we will return a string to the client indicating the request was received.

return 'Received'

To finalize the whole Python code, we need to call the run method on our Flask object, so the server starts listening to incoming requests.

This method receives the IP and the port where the server will be listening. We will use the IP “ 0.0.0.0 “, which indicates the server should listen in all the available interfaces. As port, we will use 8090 .

The final Flask code can be seen below and already includes the run method call.

from flask import Flask, request app = Flask(__name__) @app.route('/post', methods = ["POST"]) def post(): print(request.data) print(request.headers) return 'Received' app.run(host='0.0.0.0', port= 8090) The micro:bit code

We will start the MicroPython code with the imports. We will need the uart object and the sleep function, which are both available in the microbit module.

from microbit import uart, sleep

After that, and like we did onthis previous post, we will define a helper function that allows to read content from the serial port until a given character is received.

This will be a very simple helper function that will not have any timeout safeguard, which means that it will basically loop infinitely until the expected character is received. Naturally, this is just to keep the code simple and in a real application scenario you should implement some safeguard mechanisms to prevent your code from entering an undesired infinite loop.

This function will have as input a uart object and the termination character and it will return all the content read until that character.

def readUntil(uartObject, termination): result = '' while True: if uartObject.any(): byte = uartObject.read(1) result = result + chr(byte[0]) if chr(byte[0]) == termination: break sleep(100) return result

After defining this utility function, we will initialize the serial interface to use the micro:bit pins 0 and 1 for Tx and Rx , respectively.

uart.init(baudrate=9600, tx = pin0, rx = pin1)

Then, we will flush the garbage byte that is sent by the micro:bit when opening the serial connection. We do this by sending a command termination character (“ \r “) to the serial port. This will force the UART OBLOQ to complete the current command that it was building.

Since the command doesn’t exist, the UART OBLOQ should return back the |1|-1| value plus the “ \r ” termination character. So, we can flush it by using the readUntil function to read until the “ \r ” character is received.

Note : Flushing this byte is only needed if you are using an older version of MicroPython. In the newer versions, this bug is no longer present. For this tutorial I’m using version 1.7.0, which still suffers from this issue.

uart.write("\r") readUntil(uart, '\r')

Next we will connect the UART OBLOQ to a WiFi network, which is needed before we try to do the actual POST request. The procedure was covered in detail onthis previous tutorial.

The MicroPython command to use is the one shown below, where you should place the credentials of your WiFi network.

uart.write("|2|1|yourNetworkName,yourNetworkPassword|\r")

After we send this command, the UART OBLOQ should return |2|1| to acknowledge its reception. Then, it should periodically return |2|2| during the WiFi connection procedure. Please take in consideration that it may take sometime until the UART OBLOQ finishes to establish the connection.

After a successful connection, it will return a |2|3|ipAddress| response, where ipAddress corresponds to the IP assigned to the UART OBLOQ on the network.

Taking this in consideration, we can leverage our readUntil function to wait for the character ‘ 3 ‘. This should get all the content until the UART OBLOQ sends back to us the IP it got from connecting to the network.

Then, we can call the readUntil function again to get the rest of the answer, which includes the IP. In this case, since we don’t need the IP for any of the next function calls, we don’t need to store the returned value from the readUntil function.

readUntil(uart, '3') readUntil(uart, '\r')

After this, we can finally send the HTTP POST request. The UART OBLOQ command we should use is covered in greater detail onthis previous post. Basically, we should send the serial command in the following format:

|3|2|destinationURL,postBody|

In our case, since we are reaching a local Flask server, the destination URL takes the following format, where you should change #yourFlaskMachineIp# by the local IP of the computer that is running the server:

http://#yourFlaskMachineIp#:8090/post The full MicroPython command can be seen below. Note that we are simply sending a string with the value “

↧

Encoding Non-Printable Bytes in Python 3

December 5, 2018, 8:22 am

≫ Next: Writing better code with pytorch and einops

≪ Previous: Micro:bit uPython: HTTP UART OBLOQ POST request to Flask server

TL;DR bytes -> str: In []: b'\x90\x90\x90\x90'.decode('latin-1') Out[]: '\x90\x90\x90\x90' str -> bytes: In []: '\x90\x90\x90\x90'.encode('latin-1') Out[]: b'\x90\x90\x90\x90' But Why Tho

Sometimes when you're programming or you're playing CTFs , you'll encounter odd binary file formats. In python 2, it's no problem to use the rb or wb file modes to open those files and process them using standard string methods.

Then, everything changed when Python 3 came out.

In Python 3 it's very likely you will now be dealing with strings and bytes. Bytes are effectively raw computer "bytes" to Python. They are raw 1's and 0's with no context. Strings (which are made up of bytes) must now have encodings which contain their context. Essentially strings are bytes wrapped with an encoding function which dictates how they are to be viewed and processed.

However, strings have some capabilities which bytes do not share. A notable example is the lack of the .format() method in bytes.

In []: b'{}'.format('test') --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) <ipython-input-72a53680a88c> in <module> ----> 1 b'{}'.format('test') AttributeError: 'bytes' object has no attribute 'format'

I generally just use Python 2 because I like it and it's not 2020 yet.

Don't follow my bad habits. Python 3 is the future.

Fast-forward to today where I need to use Python 3 to manipulate some binary data from a file and generate a newly processed file.

Things that won't work

Don't use str.encode('ascii', 'replace') because the end result you get will likely not be right despite seeming right.

For example:

In []: b'\x90\x90\x90\x90'.decode('ascii', 'replace') Out[]: '' # wow non-printable looks cool In []: test = b'\x90\x90\x90\x90'.decode('ascii', 'replace') In []: test Out[]: '' In []: test[0] Out[]: '' In []: ord(test[0]) Out[]: 65533 # o no Things that will work

If you are trying to round trip raw binary data that you ripped right out of a random file or read from a socket (effectively 0x00 - 0xff ) between strings and bytes you want to use the latin-1 encoding to get it done.

bytes -> str: In []: b'\x90\x90\x90\x90'.decode('latin-1') Out[]: '\x90\x90\x90\x90' str -> bytes: In []: '\x90\x90\x90\x90'.encode('latin-1') Out[]: b'\x90\x90\x90\x90'

This works and also took me way too long to figure out while Googling. This works because latin-1 a.k.a ISO 8859-1 encodes up to 255 for all 8 bits in a character. However, ascii is only 7 bits which means sad times once you go beyond 127.

Hopefully this saves you some time since it was quite a gotcha for me. :snake:３:hankey:

↧

Writing better code with pytorch and einops

December 5, 2018, 8:20 am

≫ Next: Scrapy - Scraping Different Web Pages into a Scrapy Script

≪ Previous: Encoding Non-Printable Bytes in Python 3

Writing better code with pytorch and einops Rewriting building blocks of deep learning

Below are some fragments of code taken from official tutorials and popular repositories (fragments taken for educational purposes, sometimes shortened). For each fragment an enhanced version proposed with comments.

In most examples, einops was used to make things less complicated. But you'll also find some common recommendations and practices to improve the code.

Left: as it was, Right : improved version

# start from importing some stuff import torch import torch.nn as nn import torch.nn.functional as F import numpy as np import math from einops import rearrange, reduce, asnumpy, parse_shape from einops.layers.torch import Rearrange, Reduce Simple ConvNet class Net(nn.Module): def __init__(self): super(Net, self).__init__() self.conv1 = nn.Conv2d(1, 10, kernel_size=5) self.conv2 = nn.Conv2d(10, 20, kernel_size=5) self.conv2_drop = nn.Dropout2d() self.fc1 = nn.Linear(320, 50) self.fc2 = nn.Linear(50, 10) def forward(self, x): x = F.relu(F.max_pool2d(self.conv1(x), 2)) x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2)) x = x.view(-1, 320) x = F.relu(self.fc1(x)) x = F.dropout(x, training=self.training) x = self.fc2(x) return F.log_softmax(x, dim=1) conv_net_old = Net()

conv_net_new = nn.Sequential( nn.Conv2d(1, 10, kernel_size=5), nn.MaxPool2d(kernel_size=2), nn.ReLU(), nn.Conv2d(10, 20, kernel_size=5), nn.MaxPool2d(kernel_size=2), nn.ReLU(), nn.Dropout2d(), Rearrange('b c h w -> b (c h w)'), nn.Linear(320, 50), nn.ReLU(), nn.Dropout(), nn.Linear(50, 10), nn.LogSoftmax(dim=1) )

Reasons to prefer new code:

in the original code if input size is changed and batch size is divisible by 16 (that's usualy so), we'll get something senseless after reshaping new code explicitly drops error in this case we won't forget to use dropout with flag self.training with new version code is straightforward to read and analyze sequential makes printing / saving / passing trivial. And there is no need in your code to load a model ... and we could also add inplace for ReLU Super-resolution class SuperResolutionNetOld(nn.Module): def __init__(self, upscale_factor): super(SuperResolutionNetOld, self).__init__() self.relu = nn.ReLU() self.conv1 = nn.Conv2d(1, 64, (5, 5), (1, 1), (2, 2)) self.conv2 = nn.Conv2d(64, 64, (3, 3), (1, 1), (1, 1)) self.conv3 = nn.Conv2d(64, 32, (3, 3), (1, 1), (1, 1)) self.conv4 = nn.Conv2d(32, upscale_factor ** 2, (3, 3), (1, 1), (1, 1)) self.pixel_shuffle = nn.PixelShuffle(upscale_factor) def forward(self, x): x = self.relu(self.conv1(x)) x = self.relu(self.conv2(x)) x = self.relu(self.conv3(x)) x = self.pixel_shuffle(self.conv4(x)) return x

def SuperResolutionNetNew(upscale_factor): return nn.Sequential( nn.Conv2d(1, 64, kernel_size=5, padding=2), nn.ReLU(inplace=True), nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True), nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU(inplace=True), nn.Conv2d(32, upscale_factor ** 2, kernel_size=3, padding=1), Rearrange('b (h2 w2) h w -> b (h h2) (w w2)', h2=upscale_factor, w2=upscale_factor), )

Here is the difference:

no need in special instruction pixel_shuffle (and result is transferrable between frameworks) output doesn't contain a fake axis (and we could do the same for the input) inplace ReLU used now, for high resolution pictures that becomes critical and saves us much memory and all the benefits of nn.Sequential again Restyling Gram matrix for style transfer

Original code is already good - its first line shows expected tensor shape

einsum operation should be read like: for each batch and for each pair of channels, we sum over h and w. I've also changed normalization, because that's how Gram matrix is defined, otherwise we should call it normalized Gram matrix or alike def gram_matrix_old(y): (b, ch, h, w) = y.size() features = y.view(b, ch, w * h) features_t = features.transpose(1, 2) gram = features.bmm(features_t) / (ch * h * w) return gram def gram_matrix_new(y): b, ch, h, w = y.shape return torch.einsum('bchw,bdhw->bcd', [y, y]) / (h * w)

It would be great to use just 'b c1 h w,b c2 h w->b c1 c2' , but einsum supports only one-letter axes

Recurrent model

All we did here is just made information about shapes explicit to skip deciphering

class RNNModelOld(nn.Module): """Container module with an encoder, a recurrent module, and a decoder.""" def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5): super(RNNModel, self).__init__() self.drop = nn.Dropout(dropout) self.encoder = nn.Embedding(ntoken, ninp) self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout) self.decoder = nn.Linear(nhid, ntoken) def forward(self, input, hidden): emb = self.drop(self.encoder(input)) output, hidden = self.rnn(emb, hidden) output = self.drop(output) decoded = self.decoder(output.view(output.size(0)*output.size(1), output.size(2))) return decoded.view(output.size(0), output.size(1), decoded.size(1)), hidden

class RNNModelNew(nn.Module): """Container module with an encoder, a recurrent module, and a decoder.""" def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5): super(RNNModel, self).__init__() self.drop = nn.Dropout(p=dropout) self.encoder = nn.Embedding(ntoken, ninp) self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout) self.decoder = nn.Linear(nhid, ntoken) def forward(self, input, hidden): t, b = input.shape emb = self.drop(self.encoder(input)) output, hidden = self.rnn(emb, hidden) output = rearrange(self.drop(output), 't b nhid -> (t b) nhid') decoded = rearrange(self.decoder(output), '(t b) token -> t b token', t=t, b=b) return decoded, hidden

Channel shuffle (from shufflenet) def channel_shuffle_old(x, groups): batchsize, num_channels, height, width = x.data.size() channels_per_group = num_channels // groups # reshape x = x.view(batchsize, groups, channels_per_group, height, width) # transpose # - contiguous() required if transpose() is used before view(). # See https://github.com/pytorch/pytorch/issues/764 x = torch.transpose(x, 1, 2).contiguous() # flatten x = x.view(batchsize, -1, height, width) return x

def channel_shuffle_new(x, groups): return rearrange(x, 'b (c1 c2) h w -> b (c2 c1) h w', c1=groups)

While progress is obvious, this is not the limit. As you'll see below, we don't even need to write these couple of lines.

Shufflenet from collections import OrderedDict def channel_shuffle(x, groups): batchsize, num_channels, height, width = x.data.size() channels_per_group = num_channels // groups # reshape x = x.view(batchsize, groups, channels_per_group, height, width) # transpose # - contiguous() required if transpose() is used before view(). # See https://github.com/pytorch/pytorch/issues/764 x = torch.transpose(x, 1, 2).contiguous() # flatten x = x.view(batchsize, -1, height, width) return x class ShuffleUnitOld(nn.Module): def __init__(self, in_channels, out_channels, groups=3, grouped_conv=True, combine='add'): super(ShuffleUnitOld, self).__init__() self.in_channels = in_channels self.out_channels = out_channels self.grouped_conv = grouped_conv self.combine = combine self.groups = groups self.bottleneck_channels = self.out_channels // 4 # define the type of ShuffleUnit if self.combine == 'add': # ShuffleUnit Figure 2b self.depthwise_stride = 1 self._combine_func = self._add elif self.combine == 'concat': # ShuffleUnit Figure 2c self.depthwise_stride = 2 self._combine_func = self._concat # ensure output of concat has the same channels as # original output channels. self.out_channels -= self.in_channels else: raise ValueError("Cannot combine tensors with \"{}\"" \ "Only \"add\" and \"concat\" are" \ "supported".format(self.combine)) # Use a 1x1 grouped or non-grouped convolution to reduce input channels # to bottleneck channels, as in a ResNet bottleneck module. # NOTE: Do not use group convolution for the first conv1x1 in Stage 2. self.first_1x1_groups = self.groups if grouped_conv else 1 self.g_conv_1x1_compress = self._make_grouped_conv1x1( self.in_channels, self.bottleneck_channels, self.first_1x1_groups, batch_norm=True, relu=True ) # 3x3 depthwise convolution followed by batch normalization self.depthwise_conv3x3 = conv3x3( self.bottleneck_channels, self.bottleneck_channels, stride=self.depthwise_stride, groups=self.bottleneck_channels) self.bn_after_depthwise = nn.BatchNorm2d(self.bottleneck_channels) # Use 1x1 grouped convolution to expand from # bottleneck_channels to out_channels self.g_conv_1x1_expand = self._make_grouped_conv1x1( self.bottleneck_channels, self.out_channels, self.groups, batch_norm=True, relu=False ) @staticmethod def _add(x, out): # residual connection return x + out @staticmethod def _concat(x, out): # concatenate along channel axis return torch.cat((x, out), 1) def _make_grouped_conv1x1(self, in_channels, out_channels, groups, batch_norm=True, relu=False): modules = OrderedDict() conv = conv1x1(in_channels, out_channels, groups=groups) modules['conv1x1'] = conv if batch_norm: modules['batch_norm'] = nn.BatchNorm2d(out_channels) if relu: modules['relu'] = nn.ReLU() if len(modules) > 1: return nn.Sequential(modules) else: return conv def forward(self, x): # save for combining later with output residual = x if self.combine == 'concat': residual = F.avg_pool2d(residual, kernel_size=3, stride=2, padding=1) out = self.g_conv_1x1_compress(x) out = channel_shuffle(out, self.groups) out = self.depthwise_conv3x3(out) out = self.bn_after_depthwise(out) out = self.g_conv_1x1_expand(out) out = self._combine_func(residual, out) return F.relu(out) class ShuffleUnitNew(nn.Module): def __init__(self, in_channels, out_channels, groups=3, grouped_conv=True, combine='add'): super().__init__() first_1x1_groups = groups if grouped_conv else 1 bottleneck_channels = out_channels // 4 self.combine = combine if combine == 'add': # ShuffleUnit Figure 2b self.left = Rearrange('...->...') # identity depthwise_stride = 1 else: # ShuffleUnit Figure 2c self.left = nn.AvgPool2d(kernel_size=3, stride=2, padding=1) depthwise_stride = 2 # ensure output of concat has the same channels as original output channels. out_channels -= in_channels assert out_channels > 0 self.right = nn.Sequential( # Use a 1x1 grouped or non-grouped convolution to reduce input channels # to bottleneck channels, as in a ResNet bottleneck module. conv1x1(in_channels, bottleneck_channels, groups=first_1x1_groups), nn.BatchNorm2d(bottleneck_channels), nn.ReLU(inplace=True), # channel shuffle Rearrange('b (c1 c2) h w -> b (c2 c1) h w', c1=groups), # 3x3 depthwise convolution followed by batch conv3x3(bottleneck_channels, bottleneck_channels, stride=depthwise_stride, groups=bottleneck_channels), nn.BatchNorm2d(bottleneck_channels), # Use 1x1 grouped convolution to expand from # bottleneck_channels to out_channels conv1x1(bottleneck_channels, out_channels, groups=groups), nn.BatchNorm2d(out_channels), ) def forward(self, x): if self.combine == 'add': combined = self.left(x) + self.right(x) else: combined = torch.cat([self.left(x), self.right(x)], dim=1) return F.relu(combined, inplace=True)

Rewriting the code helped to identify:

There is no sense in doing reshuffling and not using groups in the first (indeed, I in the paper it is not so). However, this is equivalent model. It is also strange that first convolution may be not grouped, while last convolution is always grouped (and that is different from the paper)

Other comments:

You've probably noticed that there is an identity layer for pytorch introduced here The last thing left is get rid of conv1x1 and conv3x3 in the code - those are not better than standard Simplifying ResNet class ResNetOld(nn.Module): def __init__(self, block, layers, num_classes=1000): self.inplanes = 64 super(ResNetOld, self).__init__() self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False) self.bn1 = nn.BatchNorm2d(64) self.relu = nn.ReLU(inplace=True) self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) self.layer1 = self._make_layer(block, 64, layers[0]) self.layer2 = self._make_layer(block, 128, layers[1], stride=2) self.layer3 = self._make_layer(block, 256, layers[2], stride=2) self.layer4 = self._make_layer(block, 512, layers[3], stride=2) self.avgpool = nn.AvgPool2d(7, stride=1) self.fc = nn.Linear(512 * block.expansion, num_classes) for m in self.modules(): if isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, math.sqrt(2. / n)) elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() def _make_layer(self, block, planes, blocks, stride=1): downsample = None if stride != 1 or self.inplanes != planes * block.expansion: downsample = nn.Sequential( nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False), nn.BatchNorm2d(planes * block.expansion), ) layers = [] layers.append(block(self.inplanes, planes, stride, downsample)) self.inplanes = planes * block.expansion for i in range(1, blocks): layers.append(block(self.inplanes, planes)) return nn.Sequential(*layers) def forward(self, x): x = self.conv1(x) x = self.bn1(x) x = self.relu(x) x = self.maxpool(x) x = self.layer1(x) x = self.layer2(x) x = self.layer3(x) x = self.layer4(x) x = self.avgpool(x) x = x.view(x.size(0), -1) x = self.fc(x) return x def make_layer(inplanes, planes, block, n_blocks, stride=1): downsample = None if stride != 1 or inplanes != planes * block.expansion: # output size won't match input, so adjust residual downsample = nn.Sequential( nn.Conv2d(inplanes, planes * block.expansion, kernel_size=1, stride=stride, bias=False), nn.BatchNorm2d(planes * block.expansion), ) return nn.Sequential( block(inplanes, planes, stride, downsample), *[block(planes * block.expansion, planes) for _ in range(1, n_blocks)] ) def ResNetNew(block, layers, num_classes=1000): e = block.expansion resnet = nn.Sequential( Rearrange('b c h w -> b c h w', c=3, h=224, w=224), nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False), nn.BatchNorm2d(64), nn.ReLU(inplace=True), nn.MaxPool2d(kernel_size=3, stride=2, padding=1), make_layer(64, 64, block, layers[0], stride=1), make_layer(64 * e, 128, block, layers[1], stride=2), make_layer(128 * e, 256, block, layers[2], stride=2), make_layer(256 * e, 512, block, layers[3], stride=2), # combined AvgPool and view in one averaging operation Reduce('b c h w -> b c', 'mean'), nn.Linear(512 * e, num_classes), ) # initialization for m in resnet.modules(): if isinstance(m, nn.Conv2d): n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels m.weight.data.normal_(0, math.sqrt(2. / n)) elif isinstance(m, nn.BatchNorm2d): m.weight.data.fill_(1) m.bias.data.zero_() return resnet

Things that were changed

make_layer

Improving RNN language modelling class RNNOld(nn.Module): def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout): super().__init__() self.embedding = nn.Embedding(vocab_size, embedding_dim) self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers, bidirectional=bidirectional, dropout=dropout) self.fc = nn.Linear(hidden_dim*2, output_dim) self.dropout = nn.Dropout(dropout) def forward(self, x): #x = [sent len, batch size] embedded = self.dropout(self.embedding(x)) #embedded = [sent len, batch size, emb dim] output, (hidden, cell) = self.rnn(embedded) #output = [sent len, batch size, hid dim * num directions] #hidden = [num layers * num directions, batch size, hid dim] #cell = [num layers * num directions, batch size, hid dim] #concat the final forward (hidden[-2,:,:]) and backward (hidden[-1,:,:]) hidden layers #and apply dropout hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim=1)) #hidden = [batch size, hid dim * num directions] return self.fc(hidden.squeeze(0)) class RNNNew(nn.Module): def __init__(self, vocab_size, embedding_dim, hidden_dim, output_dim, n_layers, bidirectional, dropout): super().__init__() self.embedding = nn.Embedding(vocab_size, embedding_dim) self.rnn = nn.LSTM(embedding_dim, hidden_dim, num_layers=n_layers, bidirectional=bidirectional, dropout=dropout) self.dropout = nn.Dropout(dropout) self.directions = 2 if bidirectional else 1 self.fc = nn.Linear(hidden_dim * self.directions, output_dim) def forward(self, x): #x = [sent len, batch size] embedded = self.dropout(self.embedding(x)) #embedded = [sent len, batch size, emb dim] output, (hidden, cell) = self.rnn(embedded) hidden = rearrange(hidden, '(layer dir) b c -> layer b (dir c)', dir=self.directions) # take the final layer's hidden return self.fc(self.dropout(hidden[-1])) original code misbehaves for non-bidirectional models and fails when bidirectional = False, and there is only one layer modification of the code shows both how hidden is structured and how it is modified Writing FastText faster class FastTextOld(nn.Module): def __init__(self, vocab_size, embedding_dim, output_dim): super().__init__() self.embedding = nn.Embedding(vocab_size, embedding_dim) self.fc = nn.Linear(embedding_dim, output_dim) def forward(self, x): #x = [sent len, batch size] embedded = self.embedding(x) #embedded = [sent len, batch size, emb dim] embedded = embedded.permute(1, 0, 2) #embedded = [batch size, sent len, emb dim] pooled = F.avg_pool2d(embedded, (embedded.shape[1], 1)).squeeze(1) #pooled = [batch size, embedding_dim] return self.fc(pooled)

def FastTextNew(vocab_size, embedding_dim, output_dim): return nn.Sequential( Rearrange('t b -> t b'), nn.Embedding(vocab_size, embedding_dim), Reduce('t b c -> b c', 'mean'), nn.Linear(embedding_dim, output_dim), Rearrange('b c -> b c'), )

Some comments on new code:

Rearrange('b t -> t b'),

CNNs for text classification class CNNOld(nn.Module): def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, dropout): super().__init__() self.embedding = nn.Embedding(vocab_size, embedding_dim) self.conv_0 = nn.Conv2d(in_channels=1, out_channels=n_filters, kernel_size=(filter_sizes[0],embedding_dim)) self.conv_1 = nn.Conv2d(in_channels=1, out_channels=n_filters, kernel_size=(filter_sizes[1],embedding_dim)) self.conv_2 = nn.Conv2d(in_channels=1, out_channels=n_filters, kernel_size=(filter_sizes[2],embedding_dim)) self.fc = nn.Linear(len(filter_sizes)*n_filters, output_dim) self.dropout = nn.Dropout(dropout) def forward(self, x): #x = [sent len, batch size] x = x.permute(1, 0) #x = [batch size, sent len] embedded = self.embedding(x) #embedded = [batch size, sent len, emb dim] embedded = embedded.unsqueeze(1) #embedded = [batch size, 1, sent len, emb dim] conved_0 = F.relu(self.conv_0(embedded).squeeze(3)) conved_1 = F.relu(self.conv_1(embedded).squeeze(3)) conved_2 = F.relu(self.conv_2(embedded).squeeze(3)) #conv_n = [batch size, n_filters, sent len - filter_sizes[n]] pooled_0 = F.max_pool1d(conved_0, conved_0.shape[2]).squeeze(2) pooled_1 = F.max_pool1d(conved_1, conved_1.shape[2]).squeeze(2) pooled_2 = F.max_pool1d(conved_2, conved_2.shape[2]).squeeze(2) #pooled_n = [batch size, n_filters] cat = self.dropout(torch.cat((pooled_0, pooled_1, pooled_2), dim=1)) #cat = [batch size, n_filters * len(filter_sizes)] return self.fc(cat) class CNNNew(nn.Module): def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, dropout): super().__init__() self.embedding = nn.Embedding(vocab_size, embedding_dim) self.convs = nn.ModuleList([ nn.Conv1d(embedding_dim, n_filters, kernel_size=size) for size in filter_sizes ]) self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim) self.dropout = nn.Dropout(dropout) def forward(self, x): x = rearrange(x, 't b -> t b') emb = rearrange(self.embedding(x), 't b c -> b c t') pooled = [reduce(conv(emb), 'b c t -> b c', 'max') for conv in self.convs] concatenated = rearrange(pooled, 'filter b c -> b (filter c)') return self.fc(self.dropout(F.relu(concatenated))) Original code misuses Conv2d, while Conv1d is the right choice Fixed code can work with any number of filter_sizes (and won't fail) First line in new code does nothing, but was added for simplicity

Highway convolutions Highway convolutions are common in TTS systems. Code below makes splitting a bit more explicit. Splitting policy may eventually turn out to be important if input had previously groups over channel axes (group convolutions or bidirectional LSTMs/GRUs) Same applies to GLU and gated units in general

class HighwayConv1dOld(nn.Conv1d): def forward(self, inputs): L = super(HighwayConv1dOld, self).forward(inputs) H1, H2 = torch.chunk(L, 2, 1) # chunk at the feature dim torch.sigmoid_(H1) return H1 * H2 + (1.0 - H1) * inputs

class HighwayConv1dNew(nn.Conv1d): def forward(self, inputs): L = super().forward(inputs) H1, H2 = rearrange(L, 'b (split c) t -> split b c t', split=2) torch.sigmoid_(H1) return H1 * H2 + (1.0 - H1) * inputs

Tacotron's CBHG module class CBHG_Old(nn.Module): """CBHG module: a recurrent neural network composed of: - 1-d convolution banks - Highway networks + residual connections - Bidirectional gated recurrent units """ def __init__(self, in_dim, K=16, projections=[128, 128]): super(CBHG, self).__init__() self.in_dim = in_dim self.relu = nn.ReLU() self.conv1d_banks = nn.ModuleList( [BatchNormConv1d(in_dim, in_dim, kernel_size=k, stride=1, padding=k // 2, activation=self.relu) for k in range(1, K + 1)]) self.max_pool1d = nn.MaxPool1d(kernel_size=2, stride=1, padding=1) in_sizes = [K * in_dim] + projections[:-1] activations = [self.relu] * (len(projections) - 1) + [None] self.conv1d_projections = nn.ModuleList( [BatchNormConv1d(in_size, out_size, kernel_size=3, stride=1, padding=1, activation=ac) for (in_size, out_size, ac) in zip( in_sizes, projections, activations)]) self.pre_highway = nn.Linear(projections[-1], in_dim, bias=False) self.highways = nn.ModuleList( [Highway(in_dim, in_dim) for _ in range(4)]) self.gru = nn.GRU( in_dim, in_dim, 1, batch_first=True, bidirectional=True) def forward_old(self, inputs): # (B, T_in, in_dim) x = inputs # Needed to perform conv1d on time-axis # (B, in_dim, T_in) if x.size(-1) == self.in_dim: x = x.transpose(1, 2) T = x.size(-1) # (B, in_dim*K, T_in) # Concat conv1d bank outputs x = torch.cat([conv1d(x)[:, :, :T] for conv1d in self.conv1d_banks], dim=1) assert x.size(1) == self.in_dim * len(self.conv1d_banks) x = self.max_pool1d(x)[:, :, :T] for conv1d in self.conv1d_projections: x = conv1d(x) # (B, T_in, in_dim) # Back to the original shape x = x.transpose(1, 2) if x.size(-1) != self.in_dim: x = self.pre_highway(x) # Residual connection x += inputs for highway in self.highways: x = highway(x) # (B, T_in, in_dim*2) outputs, _ = self.gru(x) return outputs def forward_new(self, inputs, input_lengths=None): x = rearrange(inputs, 'b t c -> b c t') _, _, T = x.shape # Concat conv1d bank outputs x = rearrange([conv1d(x)[:, :, :T] for conv1d in self.conv1d_banks], 'bank b c t -> b (bank c) t', c=self.in_dim) x = self.max_pool1d(x)[:, :, :T] for conv1d in self.conv1d_projections: x = conv1d(x) x = rearrange(x, 'b c t -> b t c') if x.size(-1) != self.in_dim: x = self.pre_highway(x) # Residual connection x += inputs for highway in self.highways: x = highway(x) # (B, T_in, in_dim*2) outputs, _ = self.gru(self.highways(x)) return outputs

There is still a large room for improvements, but in this example only forward function was changed

Simple attention

Good news: there is no more need to guess order of dimensions. Neither for inputs nor for outputs

class Attention(nn.Module): def __init__(self): super(Attention, self).__init__() def forward(self, K, V, Q): A = torch.bmm(K.transpose(1,2), Q) / np.sqrt(Q.shape[1]) A = F.softmax(A, 1) R = torch.bmm(V, A) return torch.cat((R, Q), dim=1) def attention(K, V, Q): _, n_channels, _ = K.shape A = torch.einsum('bct,bcl->btl', [K, Q]) A = F.softmax(A * n_channels ** (-0.5), 1) R = torch.einsum('bct,btl->bcl', [V, A]) return torch.cat((R, Q), dim=1) Transformer's attention needs more attention class ScaledDotProductAttention(nn.Module): ''' Scaled Dot-Product Attention ''' def __init__(self, temperature, attn_dropout=0.1): super().__init__() self.temperature = temperature self.dropout = nn.Dropout(attn_dropout) self.softmax = nn.Softmax(dim=2) def forward(self, q, k, v, mask=None): attn = torch.bmm(q, k.transpose(1, 2)) attn = attn / self.temperature if mask is not None: attn = attn.masked_fill(mask, -np.inf) attn = self.softmax(attn) attn = self.dropout(attn) output = torch.bmm(attn, v) return output, attn class MultiHeadAttentionOld(nn.Module): ''' Multi-Head Attention module ''' def __init__(self, n_head, d_model, d_k, d_v, dropout=0.1): super().__init__() self.n_head = n_head self.d_k = d_k self.d_v = d_v self.w_qs = nn.Linear(d_model, n_head * d_k) self.w_ks = nn.Linear(d_model, n_head * d_k) self.w_vs = nn.Linear(d_model, n_head * d_v) nn.init.normal_(self.w_qs.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_k))) nn.init.normal_(self.w_ks.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_k))) nn.init.normal_(self.w_vs.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_v))) self.attention = ScaledDotProductAttention(temperature=np.power(d_k, 0.5)) self.layer_norm = nn.LayerNorm(d_model) self.fc = nn.Linear(n_head * d_v, d_model) nn.init.xavier_normal_(self.fc.weight) self.dropout = nn.Dropout(dropout) def forward(self, q, k, v, mask=None): d_k, d_v, n_head = self.d_k, self.d_v, self.n_head sz_b, len_q, _ = q.size() sz_b, len_k, _ = k.size() sz_b, len_v, _ = v.size() residual = q q = self.w_qs(q).view(sz_b, len_q, n_head, d_k) k = self.w_ks(k).view(sz_b, len_k, n_head, d_k) v = self.w_vs(v).view(sz_b, len_v, n_head, d_v) q = q.permute(2, 0, 1, 3).contiguous().view(-1, len_q, d_k) # (n*b) x lq x dk k = k.permute(2, 0, 1, 3).contiguous().view(-1, len_k, d_k) # (n*b) x lk x dk v = v.permute(2, 0, 1, 3).contiguous().view(-1, len_v, d_v) # (n*b) x lv x dv mask = mask.repeat(n_head, 1, 1) # (n*b) x .. x .. output, attn = self.attention(q, k, v, mask=mask) output = output.view(n_head, sz_b, len_q, d_v) output = output.permute(1, 2, 0, 3).contiguous().view(sz_b, len_q, -1) # b x lq x (n*dv) output = self.dropout(self.fc(output)) output = self.layer_norm(output + residual) return output, attn class MultiHeadAttentionNew(nn.Module): def __init__(self, n_head, d_model, d_k, d_v, dropout=0.1): super().__init__() self.n_head = n_head self.w_qs = nn.Linear(d_model, n_head * d_k) self.w_ks = nn.Linear(d_model, n_head * d_k) self.w_vs = nn.Linear(d_model, n_head * d_v) nn.init.normal_(self.w_qs.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_k))) nn.init.normal_(self.w_ks.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_k))) nn.init.normal_(self.w_vs.weight, mean=0, std=np.sqrt(2.0 / (d_model + d_v))) self.fc = nn.Linear(n_head * d_v, d_model) nn.init.xavier_normal_(self.fc.weight) self.dropout = nn.Dropout(p=dropout) self.layer_norm = nn.LayerNorm(d_model) def forward(self, q, k, v, mask=None): residual = q q = rearrange(self.w_qs(q), 'b l (head k) -> head b l k', head=self.n_head) k = rearrange(self.w_ks(k), 'b t (head k) -> head b t k', head=self.n_head) v = rearrange(self.w_vs(v), 'b t (head v) -> head b t v', head=self.n_head) attn = torch.einsum('hblk,hbtk->hblt', [q, k]) / np.sqrt(q.shape[-1]) if mask is not None: attn = attn.masked_fill(mask[None], -np.inf) attn = torch.softmax(attn, dim=3) output = torch.einsum('hblt,hbtv->hblv', [attn, v]) output = rearrange(output, 'head b l v -> b l (head v)') output = self.dropout(self.fc(output)) output = self.layer_norm(output + residual) return output, attn

Benefits of new implementation

we have one module, not two now code does not fail for None mask the amount of caveats in the original code that we removed is huge. Try erasing comments and deciphering what happens there Self-attention GANs

SAGANs are currently SotA for image generation, and can be simplified using same tricks. If torch.einsum supported non-one letter axes, we could improve this solution further.

class Self_Attn_Old(nn.Module): """ Self attention Layer""" def __init__(self,in_dim,activation): super(Self_Attn_Old,self).__init__() self.chanel_in = in_dim self.activation = activation self.query_conv = nn.Conv2d(in_channels = in_dim , out_channels = in_dim//8 , kernel_size= 1) self.key_conv = nn.Conv2d(in_channels = in_dim , out_channels = in_dim//8 , kernel_size= 1) self.value_conv = nn.Conv2d(in_channels = in_dim , out_channels = in_dim , kernel_size= 1) self.gamma = nn.Parameter(torch.zeros(1)) self.softmax = nn.Softmax(dim=-1) # def forward(self, x): """ inputs : x : input feature maps( B X C X W X H) returns : out : self attention value + input feature attention: B X N X N (N is Width*Height) """ m_batchsize,C,width ,height = x.size() proj_query = self.query_conv(x).view(m_batchsize,-1,width*height).permute(0,2,1) # B X CX(N) proj_key = self.key_conv(x).view(m_batchsize,-1,width*height) # B X C x (*W*H) energy = torch.bmm(proj_query,proj_key) # transpose check attention = self.softmax(energy) # BX (N) X (N) proj_value = self.value_conv(x).view(m_batchsize,-1,width*height) # B X C X N out = torch.bmm(proj_value,attention.permute(0,2,1) ) out = out.view(m_batchsize,C,width,height) out = self.gamma*out + x return out,attention class Self_Attn_New(nn.Module): """ Self attention Layer""" def __init__(self, in_dim): super().__init__() self.query_conv = nn.Conv2d(in_dim, out_channels=in_dim//8, kernel_size=1) self.key_conv = nn.Conv2d(in_dim, out_channels=in_dim//8, kernel_size=1) self.value_conv = nn.Conv2d(in_dim, out_channels=in_dim, kernel_size=1) self.gamma = nn.Parameter(torch.zeros([1])) def forward(self, x): proj_query = rearrange(self.query_conv(x), 'b c h w -> b (h w) c') proj_key = rearrange(self.key_conv(x), 'b c h w -> b c (h w)') proj_value = rearrange(self.value_conv(x), 'b c h w -> b (h w) c') energy = torch.bmm(proj_query, proj_key) attention = F.softmax(energy, dim=2) out = torch.bmm(attention, proj_value) out = x + self.gamma * rearrange(out, 'b (h w) c -> b c h w', **parse_shape(x, 'b c h w')) return out, attention Improving time sequence prediction

While this example was considered to be simplistic, I had to analyze surrounding code to understand what kind of input was expected. You can try yourself.

One minor change done is now the code works with any dtype, not only double; and new code supports using GPU.

class SequencePredictionOld(nn.Module): def __init__(self): super(SequencePredictionOld, self).__init__() self.lstm1 = nn.LSTMCell(1, 51) self.lstm2 = nn.LSTMCell(51, 51) self.linear = nn.Linear(51, 1) def forward(self, input, future = 0): outputs = [] h_t = torch.zeros(input.size(0), 51, dtype=torch.double) c_t = torch.zeros(input.size(0), 51, dtype=torch.double) h_t2 = torch.zeros(input.size(0), 51, dtype=torch.double) c_t2 = torch.zeros(input.size(0), 51, dtype=torch.double) for i, input_t in enumerate(input.chunk(input.size(1), dim=1)): h_t, c_t = self.lstm1(input_t, (h_t, c_t)) h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2)) output = self.linear(h_t2) outputs += [output] for i in range(future):# if we should predict the future h_t, c_t = self.lstm1(output, (h_t, c_t)) h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2)) output = self.linear(h_t2) outputs += [output] outputs = torch.stack(outputs, 1).squeeze(2) return outputs class SequencePredictionNew(nn.Module): def __init__(self): super(SequencePredictionNew, self).__init__() self.lstm1 = nn.LSTMCell(1, 51) self.lstm2 = nn.LSTMCell(51, 51) self.linear = nn.Linear(51, 1) def forward(self, input, future=0): b, t = input.shape h_t, c_t, h_t2, c_t2 = torch.zeros(4, b, 51, dtype=self.linear.weight.dtype, device=self.linear.weight.device) outputs = [] for input_t in rearrange(input, 'b t -> t b ()'): h_t, c_t = self.lstm1(input_t, (h_t, c_t)) h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2)) output = self.linear(h_t2) outputs += [output] for i in range(future): # if we should predict the future h_t, c_t = self.lstm1(output, (h_t, c_t)) h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2)) output = self.linear(h_t2) outputs += [output] return rearrange(outputs, 't b () -> b t') Transforming spacial transformer network (STN) class SpacialTransformOld(nn.Module): def __init__(self): super(Net, self).__init__() # Spatial transformer localization-network self.localization = nn.Sequential( nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2, stride=2), nn.ReLU(True), nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2, stride=2), nn.ReLU(True) ) # Regressor for the 3 * 2 affine matrix self.fc_loc = nn.Sequential( nn.Linear(10 * 3 * 3, 32), nn.ReLU(True), nn.Linear(32, 3 * 2) ) # Initialize the weights/bias with identity transformation self.fc_loc[2].weight.data.zero_() self.fc_loc[2].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float)) # Spatial transformer network forward function def stn(self, x): xs = self.localization(x) xs = xs.view(-1, 10 * 3 * 3) theta = self.fc_loc(xs) theta = theta.view(-1, 2, 3) grid = F.affine_grid(theta, x.size()) x = F.grid_sample(x, grid) return x class SpacialTransformNew(nn.Module): def __init__(self): super(Net, self).__init__() # Spatial transformer localization-network linear = nn.Linear(32, 3 * 2) # Initialize the weights/bias with identity transformation linear.weight.data.zero_() linear.bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float)) self.compute_theta = nn.Sequential( nn.Conv2d(1, 8, kernel_size=7), nn.MaxPool2d(2, stride=2), nn.ReLU(True), nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2, stride=2), nn.ReLU(True), Rearrange('b c h w -> b (c h w)', h=3, w=3), nn.Linear(10 * 3 * 3, 32), nn.ReLU(True), linear, Rearrange('b (row col) -> b row col', row=2, col=3), ) # Spatial transformer network forward function def stn(self, x): grid = F.affine_grid(self.compute_theta(x), x.size()) return F.grid_sample(x, grid) new code will give reasonable errors when passed image size is different from expected if batch size is divisible by 18, whatever you input in the old code, it'll fail no sooner than affine_grid. Improving GLOW

That's a good old depth-to-space written manually!

Since GLOW is revertible, it will frequently rely on rearrange -like operations.

def unsqueeze2d_old(input, factor=2): assert factor >= 1 and isinstance(factor, int) factor2 = factor ** 2 if factor == 1: return input size = input.size() B = size[0] C = size[1] H = size[2] W = size[3] assert C % (factor2) == 0, "{}".format(C) x = input.view(B, C // factor2, factor, factor, H, W) x = x.permute(0, 1, 4, 2, 5, 3).contiguous() x = x.view(B, C // (factor2), H * factor, W * factor) return x def squeeze2d_old(input, factor=2): assert factor >= 1 and isinstance(factor, int) if factor == 1: return input size = input.size() B = size[0] C = size[1] H = size[2] W = size[3] assert H % factor == 0 and W % factor == 0, "{}".format((H, W)) x = input.view(B, C, H // factor, factor, W // factor, factor) x = x.permute(0, 1, 3, 5, 2, 4).contiguous() x = x.view(B, C * factor * factor, H // factor, W // factor) return x

def unsqueeze2d_new(input, factor=2): return rearrange(input, 'b (c h2 w2) h w -> b c (h h2) (w w2)', h2=factor, w2=factor) def squeeze2d_new(input, factor=2): return rearrange(input, 'b c (h h2) (w w2) -> b (c h2 w2) h w', h2=factor, w2=factor)

term squeeze isn't very helpful: which dimension is squeezed? There is torch.squeeze , but it's very different. in fact, we could skip creating functions completely Detecting problems in YOLO detection def YOLO_prediction_old(input, num_classes, num_anchors, anchors, stride_h, stride_w): bs = input.size(0) in_h = input.size(2) in_w = input.size(3) scaled_anchors = [(a_w / stride_w, a_h / stride_h) for a_w, a_h in anchors] prediction = input.view(bs, num_anchors, 5 + num_classes, in_h, in_w).permute(0, 1, 3, 4, 2).contiguous() # Get outputs x = torch.sigmoid(prediction[..., 0]) # Center x y = torch.sigmoid(prediction[..., 1]) # Center y w = prediction[..., 2] # Width h = prediction[..., 3] # Height conf = torch.sigmoid(prediction[..., 4]) # Conf pred_cls = torch.sigmoid(prediction[..., 5:]) # Cls pred. FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor # Calculate offsets for each grid grid_x = torch.linspace(0, in_w - 1, in_w).repeat(in_w, 1).repeat( bs * num_anchors, 1, 1).view(x.shape).type(FloatTensor) grid_y = torch.linspace(0, in_h - 1, in_h).repeat(in_h, 1).t().repeat( bs * num_anchors, 1, 1).view(y.shape).type(FloatTensor) # Calculate anchor w, h anchor_w = FloatTensor(scaled_anchors).index_select(1, LongTensor([0])) anchor_h = FloatTensor(scaled_anchors).index_select(1, LongTensor([1])) anchor_w = anchor_w.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(w.shape) anchor_h = anchor_h.repeat(bs, 1).repeat(1, 1, in_h * in_w).view(h.shape) # Add offset and scale with anchors pred_boxes = FloatTensor(prediction[..., :4].shape) pred_boxes[..., 0] = x.data + grid_x pred_boxes[..., 1] = y.data + grid_y pred_boxes[..., 2] = torch.exp(w.data) * anchor_w pred_boxes[..., 3] = torch.exp(h.data) * anchor_h # Results _scale = torch.Tensor([stride_w, stride_h] * 2).type(FloatTensor) output = torch.cat((pred_boxes.view(bs, -1, 4) * _scale, conf.view(bs, -1, 1), pred_cls.view(bs, -1, num_classes)), -1) return output def YOLO_prediction_new(input, num_classes, num_anchors, anchors, stride_h, stride_w): raw_predictions = rearrange(input, 'b (anchor prediction) h w -> prediction b anchor h w', anchor=num_anchors, prediction=5 + num_classes) anchors = torch.FloatTensor(anchors).to(input.device) anchor_sizes = rearrange(anchors, 'anchor dim -> dim () anchor () ()') _, _, _, in_h, in_w = raw_predictions.shape grid_h = rearrange(torch.arange(in_h).float(), 'h -> () () h ()').to(input.device) grid_w = rearrange(torch.arange(in_w).float(), 'w -> () () () w').to(input.device) predicted_bboxes = torch.zeros_like(raw_predictions) predicted_bboxes[0] = (raw_predictions[0].sigmoid() + grid_w) * stride_w # center x predicted_bboxes[1] = (raw_predictions[1].sigmoid() + grid_h) * stride_h # center y predicted_bboxes[2:4] = (raw_predictions[2:4].exp()) * anchor_sizes # bbox width and height predicted_bboxes[4] = raw_predictions[4].sigmoid() # confidence predicted_bboxes[5:] = raw_predictions[5:].sigmoid() # class predictions # merging all predicted bboxes for each image return rearrange(predicted_bboxes, 'prediction b anchor h w -> b (anchor h w) prediction')

We changed and fixed a lot:

new code won't fail if input is not on the first GPU old code has wrong grid_x and grid_y for non-square images new code doesn't use replication when broadcasting is sufficient old code strangely sometimes takes .data , but this has no real effect, as some branches preserve gradient till the end if gradients not needed, torch.no_grad should be used, so it's redundant Simpler output for a bunch of pictures

Next time you need to output drawings of you generative models, you can use this trick

device = 'cpu' plt.imshow(np.transpose(vutils.make_grid(fake_batch.to(device)[:64], padding=2, normalize=True).cpu(),(1,2,0))) padded = F.pad(fake_batch[:64], [1, 1, 1, 1]) plt.imshow(rearrange(padded, '(b1 b2) c h w -> (b1 h) (b2 w) c', b1=8).cpu()) Instead of conclusion

Better code is a vague term; to be specific, things that are expected from code are:

reliable: does what expected and does not fail. Explicitly fails for wrong inputs readaility counts maintainable and modifiable reusable: understanding and modifying code should be easier than writing from scratch fast: in my measurements, proposed versions have speed similar to the original code

I've tried to demonstrate how you can improve these criteria for deep learning code. And einops helps you a lot.

Links pytorch and einops significant part of the code was taken from official examples and tutorials (references for other code are given in source of this html, if you're really curious) einops has a tutorial if you want a gentle introduction

↧

Scrapy - Scraping Different Web Pages into a Scrapy Script

December 5, 2018, 1:50 pm

≫ Next: Fibonacci Series using Python Programming

≪ Previous: Writing better code with pytorch and einops

I'm creating a web app that scrapes a long list of shoes from different websites. Here are my two individual scrapy scripts:

http://store.nike.com/us/en_us/pw/mens-clearance-soccer-shoes/47Z7puZ896Zoi3

from scrapy import Spider from scrapy.http import Request class ShoesSpider(Spider): name = "shoes" allowed_domains = ["store.nike.com"] start_urls = ['http://store.nike.com/us/en_us/pw/mens-clearance-soccer-shoes/47Z7puZ896Zoi3'] def parse(self, response): shoes = response.xpath('//*[@class="grid-item-image-wrapper sprite-sheet sprite-index-0"]/a/@href').extract() for shoe in shoes: yield Request(shoe, callback=self.parse_shoes) def parse_shoes(self, response): url = response.url name = response.xpath('//*[@itemprop="name"]/text()').extract_first() price = response.xpath('//*[@itemprop="price"]/text()').extract_first() price = price.replace('$','') shoe_type = response.css('.exp-product-subtitle::text').extract_first() sizes = response.xpath('//*[@class="nsg-form--drop-down exp-pdp-size-dropdown exp-pdp-dropdown two-column-dropdown"]/option') sizes = sizes.xpath('text()[not(parent::option/@class="exp-pdp-size-not-in-stock selectBox-disabled")]').extract() sizes = [s.strip() for s in sizes] yield { 'url': url, 'name' : name, 'price' : price, 'sizes' : sizes, 'shoe_type': shoe_type }

http://www.dickssportinggoods.com/products/clearance-soccer-cleats.jsp

from scrapy import Spider from scrapy.http import Request class ShoesSpider(Spider): name = "shoes" allowed_domains = ["dickssportinggoods.com"] start_urls = ['http://www.dickssportinggoods.com/products/clearance-soccer-cleats.jsp'] def parse(self, response): shoes = response.xpath('//*[@class="fplpTitle header4"]/a/@href').extract() for shoe in shoes: yield Request(shoe, callback=self.parse_shoes) def parse_shoes(self, response): sizes = response.xpath('//*[@class="swatches clearfix"]/input/@value').extract() if sizes == []: pass url = response.url name = response.xpath('.//*[@id="PageHeading_3074457345618261107"]/h1/text()').extract_first() price = response.xpath('.//*[@itemprop="price"]/text()').extract_first() #shoe_type = response.css('.exp-product-subtitle::text').extract_first() yield { 'url': url, 'name' : name, 'price' : price, 'sizes' : sizes, 'shoe_type': '' }

How can I manage to put both of them together? I already went through the scrapy documentation and I haven't seen them mentioning this, it just mentions how to scrape two addresses from a root address. Thanks

Put your both domains in allowed_domains and put your both URLs in start_urls and then use simple if-else to determine what part of code to execute.

from scrapy import Spider from scrapy.http import Request class ShoesSpider(Spider): name = "shoes" allowed_domains = ["store.nike.com", "dickssportinggoods.com"] start_urls = ['http://store.nike.com/us/en_us/pw/mens-clearance-soccer-shoes/47Z7puZ896Zoi3', 'http://www.dickssportinggoods.com/products/clearance-soccer-cleats.jsp'] def parse(self, response): if "store.nike.com" in response.url: shoes = response.xpath('//*[@class="grid-item-image-wrapper sprite-sheet sprite-index-0"]/a/@href').extract() elif "dickssportinggoods.com" in response.url: shoes = response.xpath('//*[@class="fplpTitle header4"]/a/@href').extract() for shoe in shoes: yield Request(shoe, callback=self.parse_shoes) def parse_shoes(self, response): url = response.url if "store.nike.com" in response.url: name = response.xpath('//*[@itemprop="name"]/text()').extract_first() price = response.xpath('//*[@itemprop="price"]/text()').extract_first() price = price.replace('$','') shoe_type = response.css('.exp-product-subtitle::text').extract_first() sizes = response.xpath('//*[@class="nsg-form--drop-down exp-pdp-size-dropdown exp-pdp-dropdown two-column-dropdown"]/option') sizes = sizes.xpath('text()[not(parent::option/@class="exp-pdp-size-not-in-stock selectBox-disabled")]').extract() sizes = [s.strip() for s in sizes] yield { 'url': url, 'name' : name, 'price' : price, 'sizes' : sizes, 'shoe_type': shoe_type } elif "dickssportinggoods.com" in response.url: sizes = response.xpath('//*[@class="swatches clearfix"]/input/@value').extract() if sizes == []: pass url = response.url name = response.xpath('.//*[@id="PageHeading_3074457345618261107"]/h1/text()').extract_first() price = response.xpath('.//*[@itemprop="price"]/text()').extract_first() #shoe_type = response.css('.exp-product-subtitle::text').extract_first() yield { 'url': url, 'name' : name, 'price' : price, 'sizes' : sizes, 'shoe_type': '' }

↧

Fibonacci Series using Python Programming

December 5, 2018, 1:48 pm

≫ Next: 2.CNN图片多标签分类（基于TensorFlow实现验证码识别OCR）

≪ Previous: Scrapy - Scraping Different Web Pages into a Scrapy Script

python

Here we will see fibonacci series using Python programming. A fibonacci series isa series of numbers in which each number ( Fibonacci number ) is the sum of the two preceding numbers. For example, the series is 1, 1, 2, 3, 5, 8, etc.

By definition, the first two numbers in the Fibonacci sequence are either 1 and 1, or 0 and 1, depending on the chosen starting point of the sequence, and each subsequent number is the sum of the previous two.

The Fibonacci sequence is named after Italian mathematician Leonardo of Pisa, known as Fibonacci.

We will write function to print fibonacci series using Python programming.

The below function prints fibonacci series upto n, where n is an integer value passed as an argument to the function:

def fib(n):
a, b = 0, 1
while a < n:
print(a, end=' ')
a, b = b, a+b
print()

In the above function definition:

The keyword def introduces a function definition. It must be followed by the function name and the parenthesized list of formal parameters. The statements that form the body of the function start at the next line, and must be indented.

The first line contains a multiple assignment: the variables a and b simultaneously get the new values 0 and 1. On the second line from the last this is used again, demonstrating that the expressions on the right-hand side are all evaluated first before any of the assignments take place. The right-hand side expressions are evaluated from the left to the right.

The while loop executes as long as the condition (here: a < n) remains true.

The body of the loop is indented: indentation is Python’s way of grouping statements. Note that each line within a basic block must be indented by the same amount.

The print() function writes the value of the argument(s) it is given.

Usage:

fib(100)

Output:

0 1 1 2 3 5 8 13 21 34 55 89

We write another function that will return the fibonacci series instead of just printing:

def fib(n):
result = []
a, b = 0, 1
while a < n:
result.append(a)
a, b = b, a+b
return result

In the above function:

The return statement returns with a value from a function. return without an expression argument returns None. Falling off the end of a function also returns None.

The statement result.append(a) calls a method of the list object result. The method append() shown in the example is defined for list objects; it adds a new element at the end of the list. In this example it is equivalent to result = result + [a], but more efficient.

Usage:

f = fib(100)

Then you need to write the result of f .

Thanks for reading.

↧

2.CNN图片多标签分类（基于TensorFlow实现验证码识别OCR）

December 5, 2018, 1:46 pm

≫ Next: Python’s Frameworks Comparison: Django, Pyramid, Flask, Sanic, Tornado, BottleP ...

≪ Previous: Fibonacci Series using Python Programming

上一篇实现了图片CNN单标签分类（猫狗图片分类任务）

地址： juejin.im/post/5c0739…

预告：下一篇用LSTM+CTC实现不定长文本的OCR，本质上是一种不固定标签个数的多标签分类问题

本文所用到的10w验证码数据集百度网盘下载地址（也可使用下文代码自行生成）：

pan.baidu.com/s/1N7bDHxIM…

利用本文代码训练并生成的模型（对应项目中的model文件夹）：

pan.baidu.com/s/1GyEpLdM5…

项目简介：

（需要预先安装pip install captcha==0.1.1,pip install opencv-python,pip install flask, pip install tensorflow/pip install tensorflow-gpu）本文采用CNN实现4位定长验证码图片OCR（生成的验证码固定由随机的4位大写字母组成），本质上是一张图片多个标签的分类问题（数据如下图所示）

整体训练逻辑：

1，将图像传入到CNN中提取特征

2，将特征图拉伸输入到FC layer中得出分类预测向量

3，通过sigmoid交叉熵函数对预测向量和标签向量进行训练，得出最终模型（注意：多标签分类任务采用sigmoid，单标签分类采用softmax）

整体预测逻辑：

1，将图像传入到CNN（VGG16）中提取特征

2，将特征图拉伸输入到FC layer中得出分类预测向量

3，将预测向量做sigmoid操作，由于验证码固定是4位，所以将向量切分成4条，从每条中找到最大值，并映射到对应的字母上

制作成web服务：

利用flask框架将整个项目启动成web服务，使得项目支持http方式调用启动服务后调用以下地址测试

http://127.0.0.1:5050/captchaOcr?img_path=./dataset/test/0_HZDZ.png

http://127.0.0.1:5050/captchaOcr?img_path=./dataset/test/1_CKAN.png

后续优化逻辑：

提取特征部分的CNN可以用RNN取代

本方案只能OCR固定长度文本，后续采用LSTM+CTC的方式来OCR非定长文本

运行命令：

自行生成验证码训练寄（本文生成了10w张，修改self.im_total_num变量）： pythonCnnOcr.py create_dataset

对数据集进行训练：pythonCnnOcr.py train

对新的图片进行测试：pythonCnnOcr.py test

启动成http服务：pythonCnnOcr.py start

项目目录结构：

训练过程：

整体代码如下：

# coding:utf-8 from captcha.image import ImageCaptcha import numpy as np import cv2 import tensorflow as tf import random, os, sys from flask import request from flask import Flask import json app = Flask(__name__) class CnnOcr: def __init__(self): self.epoch_max = 6 # 最大迭代epoch次数 self.batch_size = 64 # 训练时每个批次参与训练的图像数目，显存不足的可以调小 self.lr = 1e-3 # 初始学习率 self.save_epoch = 1 # 每相隔多少个epoch保存一次模型 self.im_width = 128 self.im_height = 64 self.im_total_num = 100000 # 总共生成的验证码图片数量 self.train_max_num = self.im_total_num # 训练时读取的最大图片数目 self.val_num = 50 * self.batch_size # 不能大于self.train_max_num 做验证集用 self.words_num = 4 # 每张验证码图片上的数字个数 self.words = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' self.label_num = self.words_num * len(self.words) self.keep_drop = tf.placeholder(tf.float32) self.x = None self.y = None def captchaOcr(self, img_path): """ 验证码识别 :param img_path: :return: """ im = cv2.imread(img_path) im = cv2.resize(im, (self.im_width, self.im_height)) im = [im] im = np.array(im, dtype=np.float32) im -= 147 output = self.sess.run(self.max_idx_p, feed_dict={self.x: im, self.keep_drop: 1.}) ret = '' for i in output.tolist()[0]: ret = ret + self.words[int(i)] return ret def test(self, img_path): """ 测试接口 :param img_path: :return: """ self.x = tf.placeholder(tf.float32, [None, self.im_height, self.im_width, 3]) # 输入数据 self.pred = self.cnnNet() self.output = tf.nn.sigmoid(self.pred) self.predict = tf.reshape(self.pred, [-1, self.words_num, len(self.words)]) self.max_idx_p = tf.argmax(self.predict, 2) saver = tf.train.Saver() # tfconfig = tf.ConfigProto(allow_soft_placement=True) # tfconfig.gpu_options.per_process_gpu_memory_fraction = 0.3 # 占用显存的比例 # self.ses = tf.Session(config=tfconfig) self.sess = tf.Session() self.sess.run(tf.global_variables_initializer()) # 全局tf变量初始化 # 加载w,b参数 saver.restore(self.sess, './model/CnnOcr-6') im = cv2.imread(img_path) im = cv2.resize(im, (self.im_width, self.im_height)) im = [im] im = np.array(im, dtype=np.float32) im -= 147 output = self.sess.run(self.max_idx_p, feed_dict={self.x: im, self.keep_drop: 1.}) ret = '' for i in output.tolist()[0]: ret = ret + self.words[int(i)] print(ret) def train(self): x_train_list, y_train_list, x_val_list, y_val_list = self.getTrainDataset() print('开始转换tensor队列') x_train_list_tensor = tf.convert_to_tensor(x_train_list, dtype=tf.string) y_train_list_tensor = tf.convert_to_tensor(y_train_list, dtype=tf.float32) x_val_list_tensor = tf.convert_to_tensor(x_val_list, dtype=tf.string) y_val_list_tensor = tf.convert_to_tensor(y_val_list, dtype=tf.float32) x_train_queue = tf.train.slice_input_producer(tensor_list=[x_train_list_tensor], shuffle=False) y_train_queue = tf.train.slice_input_producer(tensor_list=[y_train_list_tensor], shuffle=False) x_val_queue = tf.train.slice_input_producer(tensor_list=[x_val_list_tensor], shuffle=False) y_val_queue = tf.train.slice_input_producer(tensor_list=[y_val_list_tensor], shuffle=False) train_im, train_label = self.dataset_opt(x_train_queue, y_train_queue) train_batch = tf.train.batch(tensors=[train_im, train_label], batch_size=self.batch_size, num_threads=2) val_im, val_label = self.dataset_opt(x_val_queue, y_val_queue) val_batch = tf.train.batch(tensors=[val_im, val_label], batch_size=self.batch_size, num_threads=2) print('开启训练') self.learning_rate = tf.placeholder(dtype=tf.float32) # 动态学习率 self.x = tf.placeholder(tf.float32, [None, self.im_height, self.im_width, 3]) # 训练数据 self.y = tf.placeholder(tf.float32, [None, self.label_num]) # 标签 self.pred = self.cnnNet() self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=self.pred, labels=self.y)) self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate).minimize(self.loss) self.predict = tf.reshape(self.pred, [-1, self.words_num, len(self.words)]) self.max_idx_p = tf.argmax(self.predict, 2) self.y_predict = tf.reshape(self.y, [-1, self.words_num, len(self.words)]) self.max_idx_l = tf.argmax(self.y_predict, 2) self.correct_pred = tf.equal(self.max_idx_p, self.max_idx_l) self.accuracy = tf.reduce_mean(tf.cast(self.correct_pred, tf.float32)) with tf.Session() as self.sess: # 全局tf变量初始化 self.sess.run(tf.global_variables_initializer()) coordinator = tf.train.Coordinator() threads = tf.train.start_queue_runners(sess=self.sess, coord=coordinator) # 模型保存 saver = tf.train.Saver() batch_max = len(x_train_list) // self.batch_size total_step = 1 for epoch_num in range(self.epoch_max): lr = self.lr * (1 - (epoch_num/self.epoch_max) ** 2) # 动态学习率 for batch_num in range(batch_max): x_train_tmp, y_train_tmp = self.sess.run(train_batch) # print(x_train_tmp.shape, y_train_tmp.shape) # sys.exit() self.sess.run(self.optimizer, feed_dict={self.x: x_train_tmp, self.y: y_train_tmp, self.learning_rate: lr, self.keep_drop: .5}) # 输出评价标准 if total_step % 50 == 0 or total_step == 1: print() print('epoch:%d/%d batch:%d/%d step:%d lr:%.10f' % ((epoch_num + 1), self.epoch_max, (batch_num + 1), batch_max, total_step, lr)) # 输出训练集评价 train_loss, train_acc = self.sess.run([self.loss, self.accuracy], feed_dict={self.x: x_train_tmp, self.y: y_train_tmp, self.keep_drop: 1.}) print('train_loss:%.10f train_acc:%.10f' % (np.mean(train_loss), train_acc)) # 输出验证集评价 val_loss_list, val_acc_list = [], [] for i in range(int(self.val_num/self.batch_size)): x_val_tmp, y_val_tmp = self.sess.run(val_batch) val_loss, val_acc = self.sess.run([self.loss, self.accuracy], feed_dict={self.x: x_val_tmp, self.y: y_val_tmp, self.keep_drop: 1.}) val_loss_list.append(np.mean(val_loss)) val_acc_list.append(np.mean(val_acc)) print(' val_loss:%.10f val_acc:%.10f' % (np.mean(val_loss), np.mean(val_acc))) total_step += 1 # 保存模型 if (epoch_num + 1) % self.save_epoch == 0: print('正在保存模型：') saver.save(self.sess, './model/CnnOcr', global_step=(epoch_num + 1)) coordinator.request_stop() coordinator.join(threads) def cnnNet(self): """ cnn网络 :return: """ weight = { # 输入 128*64*3 # 第一层 'wc1_1': tf.get_variable('wc1_1', [5, 5, 3, 32]), # 卷积输出：128*64*32 'wc1_2': tf.get_variable('wc1_2', [5, 5, 32, 32]), # 卷积输出：128*64*32 # 池化输出：64*32*32 # 第二层 'wc2_1': tf.get_variable('wc2_1', [5, 5, 32, 64]), # 卷积输出：64*32*64 'wc2_2': tf.get_variable('wc2_2', [5, 5, 64, 64]), # 卷积输出：64*32*64 # 池化输出：32*16*64 # 第三层 'wc3_1': tf.get_variable('wc3_1', [3, 3, 64, 64]), # 卷积输出：32*16*256 'wc3_2': tf.get_variable('wc3_2', [3, 3, 64, 64]), # 卷积输出：32*16*256 # 池化输出：16*8*256 # 第四层 'wc4_1': tf.get_variable('wc4_1', [3, 3, 64, 64]), # 卷积输出：16*8*64 'wc4_2': tf.get_variable('wc4_2', [3, 3, 64, 64]), # 卷积输出：16*8*64 # 池化输出：8*4*64 # 全链接第一层 'wfc_1': tf.get_variable('wfc_1', [8*4*64, 2048]), # 全链接第二层 'wfc_2': tf.get_variable('wfc_2', [2048, 2048]), # 全链接第三层 'wfc_3': tf.get_variable('wfc_3', [2048, self.label_num]), } biase = { # 第一层 'bc1_1': tf.get_variable('bc1_1', [32]), 'bc1_2': tf.get_variable('bc1_2', [32]), # 第二层 'bc2_1': tf.get_variable('bc2_1', [64]), 'bc2_2': tf.get_variable('bc2_2', [64]), # 第三层 'bc3_1': tf.get_variable('bc3_1', [64]), 'bc3_2': tf.get_variable('bc3_2', [64]), # 第四层 'bc4_1': tf.get_variable('bc4_1', [64]), 'bc4_2': tf.get_variable('bc4_2', [64]), # 全链接第一层 'bfc_1': tf.get_variable('bfc_1', [2048]), # 全链接第二层 'bfc_2': tf.get_variable('bfc_2', [2048]), # 全链接第三层 'bfc_3': tf.get_variable('bfc_3', [self.label_num]), } # 第一层 net = tf.nn.conv2d(self.x, weight['wc1_1'], [1, 1, 1, 1], 'SAME') # 卷积 net = tf.nn.bias_add(net, biase['bc1_1']) net = tf.nn.relu(net) # 加b 然后激活 print('conv1', net) net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID') # 池化 print('pool1', net) # 第二层 net = tf.nn.conv2d(net, weight['wc2_1'], [1, 1, 1, 1], padding='SAME') # 卷积 net = tf.nn.bias_add(net, biase['bc2_1']) net = tf.nn.relu(net) # 加b 然后激活 print('conv2', net) net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID') # 池化 print('pool2', net) # 第三层 net = tf.nn.conv2d(net, weight['wc3_1'], [1, 1, 1, 1], padding='SAME') # 卷积 net = tf.nn.bias_add(net, biase['bc3_1']) net = tf.nn.relu(net) # 加b 然后激活 print('conv3', net) net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID') # 池化 print('pool3', net) # 第四层 net = tf.nn.conv2d(net, weight['wc4_1'], [1, 1, 1, 1], padding='SAME') # 卷积 net = tf.nn.bias_add(net, biase['bc4_1']) net = tf.nn.relu(net) # 加b 然后激活 print('conv4', net) net = tf.nn.max_pool(net, [1, 2, 2, 1], [1, 2, 2, 1], padding='VALID') # 池化 print('pool4', net) # 拉伸flatten，把多个图片同时分别拉伸成一条向量 net = tf.reshape(net, shape=[-1, weight['wfc_1'].get_shape()[0]]) print('拉伸flatten', net) # 全链接层 # fc第一层 net = tf.matmul(net, weight['wfc_1']) + biase['bfc_1'] net = tf.nn.dropout(net, self.keep_drop) net = tf.nn.relu(net) print('fc第一层', net) # fc第二层 net = tf.matmul(net, weight['wfc_2']) + biase['bfc_2'] net = tf.nn.dropout(net, self.keep_drop) net = tf.nn.relu(net) print('fc第二层', net) # fc第三层 net = tf.matmul(net, weight['wfc_3']) + biase['bfc_3'] print('fc第三层', net) return net def getTrainDataset(self): """ 整理数据集，把图像resize为128*64*3，训练集做成self.im_total_num*128*64*3，把label做成0,1向量形式 :return: """ train_data_list = os.listdir('./dataset/train/') print('共有%d张训练图片，读取%d张：' % (len(train_data_list), self.train_max_num)) random.shuffle(train_data_list) # 打乱顺序 y_val_list, y_train_list = [], [] x_val_list = train_data_list[:self.val_num] for x_val in x_val_list: words_tmp = x_val.split('.')[0].split('_')[1] y_val_list.append([1 if _w == w else 0 for w in words_tmp for _w in self.words]) x_train_list = train_data_list[self.val_num:self.train_max_num] for x_train in x_train_list: words_tmp = x_train.split('.')[0].split('_')[1] y_train_list.append([1 if _w == w else 0 for w in words_tmp for _w in self.words]) return x_train_list, y_train_list, x_val_list, y_val_list def createCaptchaDataset(self): """ 生成训练用图片数据集 :return: """ image = ImageCaptcha(width=self.im_width, height=self.im_height, font_sizes=(56,)) for i in range(self.im_total_num): words_tmp = '' for j in range(self.words_num): words_tmp = words_tmp + random.choice(self.words) print(words_tmp, type(words_tmp)) im_path = './dataset/train/%d_%s.png' % (i, words_tmp) print(im_path) image.write(words_tmp, im_path) return True def dataset_opt(self, x_train_queue, y_train_queue): """ 处理图片和标签 :param queue: :return: """ queue = x_train_queue[0] contents = tf.read_file('./dataset/train/' + queue) im = tf.image.decode_jpeg(contents) im = tf.image.resize_images(images=im, size=[self.im_height, self.im_width]) im = tf.reshape(im, tf.stack([self.im_height, self.im_width, 3])) im -= 147 # 去均值化 # im /= 255 # 将像素处理在0~1之间，加速收敛 # im -= 0.5 # 将像素处理在-0.5~0.5之间 return im, y_train_queue[0] if __name__ == '__main__': opt_type = sys.argv[1:][0] instance = CnnOcr() if opt_type == 'create_dataset': instance.createCaptchaDataset() elif opt_type == 'train': instance.train() elif opt_type == 'test': instance.test('./dataset/test/0_HZDZ.png') elif opt_type == 'start': # 将session持久化到内存中 instance.test('./dataset/test/0_HZDZ.png') # 启动web服务 # http://127.0.0.1:5050/captchaOcr?img_path=./dataset/test/2_SYVD.png @app.route('/captchaOcr', methods=['GET']) def captchaOcr(): img_path = request.args.to_dict().get('img_path') print(img_path) ret = instance.captchaOcr(img_path) print(ret) return json.dumps({'img_path': img_path, 'ocr_ret': ret}) app.run(host='0.0.0.0', port=5050, debug=False) 复制代码

↧

Python’s Frameworks Comparison: Django, Pyramid, Flask, Sanic, Tornado, BottleP ...

December 5, 2018, 1:44 pm

≫ Next: Install OpenCV 4 on Red Hat (C++ and Python)

≪ Previous: 2.CNN图片多标签分类（基于TensorFlow实现验证码识别OCR）

python frameworks can be divided into a few areas as Python is a very diverse language and can be used in various fields. Each of these fields has its own frameworks, some of which are more popular than others. One of the most popular fields Python is applied in is web development, which we will focus on today.

The presented frameworks can be broken down into three categories: full stack frameworks, which offer a lot of out of the box features for the server and client side; microframeworks, which offer server-side support (sometimes, they can be extended to the client side) and allow for creating a web application just by using a single Python file; and, finally, asynchronous frameworks, which handle requests asynchronously .

Full-stack web frameworks Django

Django is one of the most popular Python frameworks . It offers a lot of out-of-the-box functionalities like Admin Panel or Generic Views and Forms. Django’s main characteristics are:

one managing script (“manage.py”) that can be used for performing most of the framework specific actions (like starting the development server, creating an admin user, collecting static files etc.),

synchronous request processing,

MTV (model-template-view) architecture pattern (which is variation of the model-view-controller pattern),

custom object-relational mapping (ORM) for communicating with the database,

usage of functions and classes for view context creation and action handling,

Django is strict and forces its own coding style on the developer a lot of meta programming,

very good, extensive documentation with examples,

custom HTML templates rendering engine,

custom URL routing system,

compliance with the WSGI standard,

support for static files - URL routing as well as detection and collection,

a large number of external modules, e.g. Django REST Framework, Django CMS, Django Channels ( websockets ).

Django is a good fit for bigger projects, where extensive backend and frontend support is required or in cases where time plays a crucial role, as Django offers a large number of ready components. Coding in Django mostly relies on customizing generic parts of code. The developer must follow a set of rules that come with given element. For projects where a lot of code flexibility is desired, Django might not be the best choice.

37,514 Github stars / 183,588 StackOverflow questions

Web2py

Web2py focuses on security, development speed and ease of use. It offers a lot of features out of the box: a web server, database, admin panel, wiki or grid widgets. Framework main characteristics are:

synchronous request processing,

custom Database Abstraction Layer (DAL) that acts as ORM,

forces an MVC structure,

functions and classes can be used for creating Controllers,

strict “ There should be only one way of doing things” philosophy,

rich documentation with a lot of examples,

custom HTML engine that allows Python code to be used in the templates,

custom routing - URL function that generates internal paths for the actions and static files,

support for the WSGI standard, but it’s possible to use CGI (Common Gateway Interface), FastCGI, GAE (Google App Engine) or other,

offers static file routing and streaming during development,

has built-in REST services but requires the Tornado framework for Web Socket usage.

Web2py was highly inspired by Ruby on Rails and Django frameworks and takes what’s the best from both of them. It can be a good choice for programmers who want to migrate from Ruby or for ones that are bored with Django but are looking for another big and feature-rich framework. It offers an “admin” app, which acts as a web-based IDE for application development and management (e.g. app creations, code editor). It is also supported by PyCharm. In general, Web2Py doesn’t lack any functionalities that Django has. These two frameworks can be used to fulfil the same tasks. Web2Py is younger, and it has a smaller community than Django, so it might be a little harder to find help in case of trouble.

1,665 GitHub stars / 2,004 StackOverflow questions

TurboGears

TurboGears connects a lot of external services to create one functional framework:

synchronous request processing,

model-view-controller (MVC) pattern,

uses SQLAlchemy ORM,

↧

Install OpenCV 4 on Red Hat (C++ and Python)

December 5, 2018, 1:42 pm

≫ Next: Install OpenCV 3.4.4 on Red Hat (C++ and Python)

≪ Previous: Python’s Frameworks Comparison: Django, Pyramid, Flask, Sanic, Tornado, BottleP ...

Install OpenCV 4 on Red Hat (C++ and Python)

OpenCV released OpenCV-3.4.4 and OpenCV-4.0.0 on 20th November. There have been a lot of bug fixes and other changes in these versions. The release highlights are as follows:

OpenCV is now C++11 library and requires C++11-compliant compiler. Minimum required CMake version has been raised to 3.5.1. A lot of C API from OpenCV 1.x has been removed. Persistence (storing and loading structured data to/from XML, YAML or JSON) in the core module has been completely reimplemented in C++ and lost the C API as well. New module G-API has been added, it acts as an engine for very efficient graph-based image procesing pipelines. dnn module now includes experimental Vulkan backend and supports networks in ONNX format. The popular Kinect Fusion algorithm has been implemented and optimized for CPU and GPU (OpenCL)
QR code detector and decoder have been added to the objdetect module. Very efficient and yet high-quality DIS dense optical flow algorithm has been moved from opencv_contrib to the video module.

In this post, we will provide a bash script for installing OpenCV-4.0 (C++, python 2.7 and Python 3.4) on Red Hat Enterprise linux 7.6 . We will also briefly study the script to understand what’s going in it. Note that this script will install OpenCV in a local directory and not on the entire system. Let’s jump in

If you are still not able to install OpenCV on your system, but want to get started with it, we suggest using our docker images with pre-installed OpenCV, Dlib, miniconda and jupyter notebooks along with other dependencies as described in this blog .

1. Select OpenCV Version to install echo "OpenCV installation by learnOpenCV.com" echo "Installing OpenCV - 4.0" #Specify OpenCV version cvVersion="master"

We are also going to clean build directories and create installation directory.

# Clean build directories rm -rf opencv rm -rf opencv_contrib mkdir installation mkdir installation/OpenCV-"$cvVersion"

Finally, we will be storing the current working directory in cwd variable. We are also going to refer to this directory as OpenCV_Home_Dir throughout this blog.

# Save current working directory cwd=$(pwd) 2. Install packages

Next we are going to install Python 3.4, and other libraries and packages that will be required for OpenCV installation.

sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm sudo yum -y install epel-release sudo yum -y install git gcc gcc-c++ cmake3 sudo yum -y install qt5-qtbase-devel sudo yum install -y python34 python34-devel python34-pip sudo yum install -y python python-devel python-pip sudo yum -y install python-devel numpy python34-numpy sudo yum -y install gtk2-devel sudo rpm --import http://li.nux.ro/download/nux/RPM-GPG-KEY-nux.ro sudo rpm -Uvh http://li.nux.ro/download/nux/dextop/el7/x86_64/nux-dextop-release-0-5.el7.nux.noarch.rpm sudo yum install -y ffmpeg sudo yum install -y ffmpeg-devel sudo yum install -y libpng-devel sudo yum install -y openexr-devel sudo yum install -y libwebp-devel sudo yum -y install libjpeg-turbo-devel sudo yum install -y freeglut-devel mesa-libGL mesa-libGL-devel sudo yum -y install libtiff-devel sudo yum -y install libdc1394-devel --skip-broken sudo yum -y install tbb-devel eigen3-devel sudo yum -y install boost boost-thread boost-devel

Download CodeTo easily follow along this tutorial, please download code by clicking on the button below. It's FREE!

Download Code

3. Create Python Virtual Environments

We will create Python virtual environments to properly differentiate between different OpenCV versions.

We are going to install virtualenv and virtualenvwrapper modules to create Python virtual environments

sudo pip3 install virtualenv virtualenvwrapper echo "export WORKON_HOME=$HOME/.virtualenvs" >> ~/.bashrc echo "export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3" >> ~/.bashrc echo "source /usr/bin/virtualenvwrapper.sh" >> ~/.bashrc export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3 source /usr/bin/virtualenvwrapper.sh

Now let’s create the virtual environments and install some modules.

3.1. Python 3.4 Virtual Environment mkvirtualenv OpenCV-"$cvVersion"-py3 -p python3 workon OpenCV-"$cvVersion"-py3 pip install cmake pip install numpy scipy matplotlib scikit-image scikit-learn ipython dlib # quit virtual environment deactivate 3.2. Python 2.7 Virtual Environment mkvirtualenv OpenCV-"$cvVersion"-py2 -p python2 workon OpenCV-"$cvVersion"-py2 pip install cmake pip install numpy scipy matplotlib scikit-image scikit-learn ipython dlib # quit virtual environment deactivate 4. Clone GitHub Repositories

We will next clone opencv and opencv_contrib GitHub repositories.

git clone https://github.com/opencv/opencv.git cd opencv git checkout $cvVersion cd .. git clone https://github.com/opencv/opencv_contrib.git cd opencv_contrib git checkout $cvVersion cd .. 5. Build OpenCV

Now comes the part we have been so eagerly waiting for building OpenCV. First, we will create build directory.

December 4 2018: There is an issue with CMakeLists.txt file in opencv/samples/cpp so we will edit the file (temporary fix).

cd opencv echo "find_package(OpenGL REQUIRED)" >>./samples/cpp/CMakeLists.txt echo "find_package(GLUT REQUIRED)" >> ./samples/cpp/CMakeLists.txt sed -i '38s/.*/ ocv_target_link_libraries(${tgt} ${OPENCV_LINKER_LIBS} ${OPENCV_CPP_SAMPLES_REQUIRED_DEPS} ${OPENGL_LIBRARIES} ${GLUT_LIBRARY})/' ./samples/cpp/CMakeLists.txt mkdir build cd build

Next, we will use CMake and make to build OpenCV.

cmake3 -D CMAKE_BUILD_TYPE=RELEASE \ -D CMAKE_INSTALL_PREFIX=$cwd/installation/OpenCV-"$cvVersion" \ -D INSTALL_C_EXAMPLES=ON \ -D INSTALL_PYTHON_EXAMPLES=ON \ -D WITH_TBB=ON \ -D WITH_V4L=ON \ -D OPENCV_SKIP_PYTHON_LOADER=ON \ -D OPENCV_GENERATE_PKGCONFIG=ON \ -D OPENCV_PYTHON3_INSTALL_PATH=$HOME/.virtualenvs/OpenCV-$cvVersion-py3/lib/python3.4/site-packages \ -D OPENCV_PYTHON2_INSTALL_PATH=$HOME/.virtualenvs/OpenCV-$cvVersion-py2/lib/python2.7/site-packages \ -D WITH_QT=ON \ -D WITH_OPENGL=ON \ -D PYTHON_DEFAULT_EXECUTABLE=/usr/bin/python3 \ -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules \ -D ENABLE_CXX11=ON \ -D BUILD_EXAMPLES=ON .. make -j$(nproc) make install cd $cwd

And voila! We have installed OpenCV!

Become an expert in Computer Vision , Machine Learning , and AI in 12-weeks! Check out our course

Computer Vision Course

6. How to use OpenCV in C++

There are two ways to use OpenCV in C++, the preferred way is to use CMake , the other one being command line compilation using g++ . We will have a look at both ways.

6.1. Using CMakeLists.txt

The basic structure of your CMakeLists.txt will stay the same. Only difference being, that you will have to set OpenCV_DIR as shown below.

SET(OpenCV_DIR <OpenCV_Home_Dir>/installation/OpenCV-master/share/OpenCV/)

Make sure that you replace OpenCV_Home_Dir with correct path. For example, in my case:

SET(OpenCV_DIR /home/hp/OpenCV_installation/installation/OpenCV-master/share/OpenCV/)

Once you have made your CMakeLists.txt, follow the steps given below.

mkdir build && cd build cmake .. cmake --build . --config Release

This will generate your executable file in build directory.

6.2. Using g++

To compile a sample file (let’s say my_sample_file.cpp ), use the following command.

g++ `pkg-config --cflags --libs <OpenCV_Home_Dir>/installation/OpenCV-master/lib/pkgconfig/opencv.pc` my_sample_file.cpp -o my_sample_file 7. How to use OpenCV in Python

To use the OpenCV version installed using Python script, first we activate the Python Virtual Environment.

For OpenCV-4.0 : Python 3 workon OpenCV-master-py3 For OpenCV-4.0 : Python 2 workon OpenCV-master-py2

Once you have activated the virtual environment, you can enter Python shell and test OpenCV version.

ipython import cv2 print(cv2.__version__)

Hope this script proves to be useful for you :). Stay tuned for more interesting stuff. In case of any queries, feel free to comment below and we will get back to you as soon as possible.

Subscribe & Download Code

If you liked this article and would like to download code (C++ and Python) and example images used in this post, please subscribe to our newsletter. You will also receive a free Computer Vision Resource Guide. In our newsletter, we share OpenCV tutorials and examples written in C++/Python, and Computer Vision and Machine Learning algorithms and news.

Subscribe Now

↧

Install OpenCV 3.4.4 on Red Hat (C++ and Python)

December 5, 2018, 1:40 pm

≫ Next: 一个非常牛逼的仓库

≪ Previous: Install OpenCV 4 on Red Hat (C++ and Python)

Install OpenCV 3.4.4 on Red Hat (C++ and Python)

OpenCV released OpenCV-3.4.4 and OpenCV-4.0.0 on 20th November. There have been a lot of bug fixes and other changes in these versions. The release highlights are as follows:

In this post, we will provide a bash script for installing OpenCV-3.4.4 (C++, python 2.7 and Python 3.4) on Red Hat Enterprise linux 7.6 . We will also briefly study the script to understand what’s going in it. Note that this script will install OpenCV in a local directory and not on the entire system. Let’s jump in

1. Select OpenCV Version to install echo "OpenCV installation by learnOpenCV.com" echo "Installing OpenCV - 3.4.4" #Specify OpenCV version cvVersion="3.4.4"

We are also going to clean build directories and create installation directory.

# Clean build directories rm -rf opencv rm -rf opencv_contrib mkdir installation mkdir installation/OpenCV-"$cvVersion"

Finally, we will be storing the current working directory in cwd variable. We are also going to refer to this directory as OpenCV_Home_Dir throughout this blog.

# Save current working directory cwd=$(pwd) 2. Install packages

Next we are going to install Python 3.4, and other libraries and packages that will be required for OpenCV installation.

Download CodeTo easily follow along this tutorial, please download code by clicking on the button below. It's FREE!

Download Code

3. Create Python Virtual Environments

We will create Python virtual environments to properly differentiate between different OpenCV versions.

We are going to install virtualenv and virtualenvwrapper modules to create Python virtual environments

Now let’s create the virtual environments and install some modules.

We will next clone opencv and opencv_contrib GitHub repositories.

git clone https://github.com/opencv/opencv.git cd opencv git checkout 3.4 cd .. git clone https://github.com/opencv/opencv_contrib.git cd opencv_contrib git checkout 3.4 cd .. 5. Build OpenCV

Now comes the part we have been so eagerly waiting for building OpenCV. First, we will create build directory.

cd opencv mkdir build cd build

Next, we will use CMake and make to build OpenCV.

And voila! We have installed OpenCV!

Become an expert in Computer Vision , Machine Learning , and AI in 12-weeks! Check out our course

Computer Vision Course

6. How to use OpenCV in C++

There are two ways to use OpenCV in C++, the preferred way is to use CMake , the other one being command line compilation using g++ . We will have a look at both ways.

6.1. Using CMakeLists.txt

The basic structure of your CMakeLists.txt will stay the same. Only difference being, that you will have to set OpenCV_DIR as shown below.

SET(OpenCV_DIR <OpenCV_Home_Dir>/installation/OpenCV-3.4.4/share/OpenCV/)

Make sure that you replace OpenCV_Home_Dir with correct path. For example, in my case:

SET(OpenCV_DIR /home/hp/OpenCV_installation/installation/OpenCV-3.4.4/share/OpenCV/)

Once you have made your CMakeLists.txt, follow the steps given below.

mkdir build && cd build cmake .. cmake --build . --config Release

This will generate your executable file in build directory.

6.2. Using g++

To compile a sample file (let’s say my_sample_file.cpp ), use the following command.

g++ `pkg-config --cflags --libs <OpenCV_Home_Dir>/installation/OpenCV-3.4.4/lib/pkgconfig/opencv.pc` my_sample_file.cpp -o my_sample_file 7. How to use OpenCV in Python

To use the OpenCV version installed using Python script, first we activate the Python Virtual Environment.

For OpenCV-3.4.4 : Python 3 workon OpenCV-3.4.4-py3 For OpenCV-3.4.4 : Python 2 workon OpenCV-3.4.4-py2

Once you have activated the virtual environment, you can enter Python shell and test OpenCV version.

ipython import cv2 print(cv2.__version__)

Hope this script proves to be useful for you :). Stay tuned for more interesting stuff. In case of any queries, feel free to comment below and we will get back to you as soon as possible.

Subscribe & Download Code

Subscribe Now

↧

一个非常牛逼的仓库

December 5, 2018, 12:12 pm

≫ Next: 深入理解Python[3] 当我们在讨论变量时，我们在讨论什么

≪ Previous: Install OpenCV 3.4.4 on Red Hat (C++ and Python)

GitHub 上有个非常牛逼的仓库，作者收集了很多有趣且鲜为人知的 python “特性”。这些特性有些是真正的特性，也有些是陷阱，因为光从表面看起来就觉得反人类直觉，而所有的一切，其实都是 Cpython 内部实现中利弊之间的的权衡，有优点就有缺点。举个和字符串相关的特别例子：

>>> a = "wtf" >>> b = "wtf" >>> a is b True 复制代码

这个比较好理解，a 和 b 指向同一个对象。

>>> a = "wtf!" >>> b = "wtf!" >>> a is b False 复制代码

多了一个感叹号:exclamation:，两个变量指向的就不是同一个对象了。

再来看：

>>> a, b = "wtf!", "wtf!" >>> a is b True 复制代码

如果把两个变量写在一行，又指向同一个对象了。初学者看了一脸懵逼，即使你是有经验的 Python 开发者，第一次见到这种情况也会大为惊讶，原来 Python 中还有如此骚操作。如果把代码放在IDE中执行，结果又不一样：

a = "wtf!" b = "wtf!" print(a is b) # True 复制代码

输出结果是 True，是不是觉得太匪夷所思了。其实，这里面涉及到的知识点是CPython 为了提高性能对字符串对象做的优化，涉及到字符串的 intern 机制，还有代码块的知识等概念。

当然这个库还总结了很多很多类似让你惊讶的操作，比如关于哈希操作，有关字典特性，默认可变参数陷阱等等。总共有近100来个知识点总结。强烈建议大家好好研读，有利于对 Pytho 的进一步提高。目前这个库已经有将近10000人关注，而且现在有网友提供了中文版了，这无疑给了不习惯阅读英语阅的人有一个利好消息。

英文地址： github.com/satwikkansa…

中文地址： github.com/leisurelich…

方便你阅读，我将其制作了一份PDF格式了，方便你离线阅读。你可以在微信公众号「Python之禅」后台回复关键词「2」获取这本电子书！

↧

深入理解Python[3] 当我们在讨论变量时，我们在讨论什么

December 5, 2018, 3:28 pm

≫ Next: codingdirectional: Search for any duplicate file with python

≪ Previous: 一个非常牛逼的仓库

这个概念其实大多数人都晓得了，就简单提一下

当我们给一个变量赋值时，python会在内存地址中新建一个对象，然后变量名只是指向了这个对象的内存地址，变量名只是一个引用而已

引用计数(Reference Counting) 定义

我们先看看维基百科对引用计数的定义

引用计数是计算机编程语言中的一种内存管理技术，是指将资源（可以是对象、内存或磁盘空间等等）的被引用次数保存起来，当被引用次数变为零时就将其释放的过程。使用引用计数技术可以实现自动资源管理的目的。同时引用计数还可以指使用引用计数技术回收未使用资源的垃圾回收算法。

字面意思还是很好理解这个概念的

my_var = 10 #这时候指向存储“10”这个对象的引用为1 my_var_2 = my_var #此时引用数目变成了2 my_var = "hello" #引用数目变成1 my_var_2 = "goodbye" #引用数目变成0，内存得以释放复制代码获取引用数

Python中有两种方法获取对象的应用数目

一个是 sys.getrefcount(spam)

还有一个是 ctypes.c_long.from_address(address).value

import sys, ctypes spam = [4,5,6] r1 = sys.getrefcount(spam) r2 = ctypes.c_long.from_address(id(spam)).value print(r1,r2) 复制代码

输出

2 1

这里要注意下，当我们使用函数时，有一个实参传给形参的过程，在这个过程中，形参也是指向了同样的地址，所以在函数终结之前，会有两个引用数

而第二个虽然也是用了id函数，但是id函数调用结束后，行参就被“销毁”了，所以这时候再检测内存地址时，就只有一个引用数了

垃圾回收(Garbage Collector) 循环引用(Circular References)

我们先来看一个简单的例子

定义一个my_var变量，指向ObjectA的内存地址，而ObjectA里面又有一个变量var_1指向ObjectB

如果我们将my_var 指向其他内存地址，比如说 my_var = None ，这时候ObjectA的内存地址没有被引用，因此被销毁，这导致var_1也被跟着一起销毁，ObjectB的内存地址也在此时没被引用，被销毁

上面一切都没有问题，但是我们看看下面这个情况

与之前不同的是，ObjectB内的变量var_2如果指向ObjectA的话，如果取消my_var 对ObjectA的引用，按照内存管理的规则，最终会导致一个内部循环

这些对象无法被外界获取，但是又实实在在地存在内存中，大量的积累就会导致内存泄漏的情况，这并不是我们希望的，所以就有了垃圾回收的机制

垃圾回收(Garbage Collector) 可以通过导入gc模块来控制默认情况下是开启的如果能确保自己不会写出循环引用的代码可以选择将其关闭，以此来释放资源(其实这释放的量很小啦)

下面我们通过代码来看看这个机制

import ctypes, gc gc.disable() # 关闭垃圾回收 def ref_count(address): return ctypes.c_long.from_address(address).value def object_by_id(object_id): for obj in gc.get_objects(): if id(obj) == object_id: return "Object exists" return "Not Found" class A(): def __init__(self): self.b = B(self) print('A: self: {0} , b: {1}'.format(hex(id(self)),hex(id(self.b)))) class B(): def __init__(self, a): self.a = a print('B: self: {0} , a: {1}'.format(hex(id(self)),hex(id(self.a)))) my_var = A() 复制代码

我们建立来两个类，A中的b变量指向B的实例，而B中的a变量又指向A本身，最后建立A的实例，正好符合上面的循环

我们现在看下A，B的实例个有多少个引用

a_id = id(my_var) b_id = id(my_var.b) print("ref_count_of_a:{0}, ref_count_of_b:{1}".format(ref_count(a_id),ref_count(b_id))) 复制代码

输出

B: self: 0x1068480f0 , a: 0x1068480b8 A: self: 0x1068480b8 , b: 0x1068480f0 ref_count_of_a:2, ref_count_of_b:1 复制代码

没问题

现在我们令 my_var = None

再来打印输出结果输出

B: self: 0x10b0ec128 , a: 0x10b0ec0f0 A: self: 0x10b0ec0f0 , b: 0x10b0ec128 ref_count_of_a:1, ref_count_of_b:1 复制代码

由于之前关闭了垃圾回收，所以这里的内部循环就没有被销毁

这时我们即时执行下垃圾回收 gc.collect() ,就会输出我们想要的结果了

ref_count_of_a:0, ref_count_of_b:0 复制代码动态类型与静态类型(Dynamic Typing and Static Typing)

Python的定义变量时是动态的，也就是说一个变量名可以直接从一个字符串对象重新指向数字对象，像其它的静态类型的语言，比如Java，如果重新给一个字符串变量赋值数字，就会出错

String my_var = "hello" mv_var = 123 //(会报错，因为Java的变量是静态类型) 复制代码

有个点提一下，当我们给变量重新赋值时，Python做的事情是在内存中新建一个对象，而不是在原有内存地址上改变对象内容

可变性(Mutability)

对象内部的数据(state)如果可以改变,就叫可变(Mutable)的；否则就是不可变的(Immutable)

Python中不可变的数据类型

Numbers(int,float,Booleans,etx) Strings Tuples Frozen Sets User-Defined Classes

可变的数据类型

Lists Sets Dictionaries User-Defined Classes

:warning: 元组是不可变的，但是却可以包含可变的元素，例如如下例子

a = [1,2] b = [3,4] t = (a,b) # 此时t = ([1,2],[3,4]) a.append(3) b.append(5) # 此时t = ([1,2,3],[3,4,5]) 复制代码共享引用(Shared References)和可变性(Mutablility)

如果你在Python中这样定义

a = 10 b = 10 # id(a) 等于 id(b) 复制代码

Python的内存管理会让两个变量名自动共享同一个内存地址(相当于执行来b)，这是对于部分简单的不可变的对象而言

而如果定义任何两个可变的对象，Python则不会这么做

a = [1,2,3] b = [1,2,3] # id(a) 不等于 id(b) 复制代码一切皆为对象

“一切都是对象”在别的编程语言中可能还有点牵强；但是在Python中，这是真真切切的道理

除了简单的数据类型，像一些运算符(+,-,*,/,is,...)这些也是某个类的实例

我们平常接触的function，class(类本身，不是实例)也都是Function Type, Class Type的实例

所以可以得到一个这样的结论

所有的对象(包括函数)都给可以赋值给一个变量

而所有的对象(包括函数)有可以作为参数传递给函数

函数又可以返回一个对象(包括函数)

可以参考如下例子

def square(a): return a ** 2 def cube(a): return a ** 3 def select_function(fn_id): return square if fn_id == 1 else cube f = select_function(1) print(f(3)) f = select_function(3) print(f(3)) def exec_function(fn, n): return fn(n) print(exec_function(cube,5)) 复制代码

输出

9 27 125 复制代码驻留(Interning) 概念

按需复用对象(reusing objects on-demand)

整数驻留在启动时，Python(CPython)会预载一定范围的整数类型([-5,256])

任何这个范围内的整数在创建时，都会产生一个共享引用,从而减少内存的占用

字符串驻留

当Python代码被编译的时候，有写标识符(identifier)会被驻留，其中包括变量名称，函数名称，类名称等

如果一个字符串长得像标识符(identifier)，哪怕是一个无效的的标识符，例如 1spam 这样的字符串，也有可能会被驻留

通过 sys.intern() 方法可以强制驻留特定的字符串

为什么需要驻留

速度优化！

如果让Python比较两个字符串是否相等，python是要从字符串的第一个字符开始进行逐个比较的，如果字符串相当长，那么基于逐个比较的方法速度就会特别慢，而驻留之后只需要比较内存地址，可以极大地优化速度

我们来做一个测试

import time, sys def compare_using_equals(n): a = 'a long string that is not interned' * 500 b = 'a long string that is not interned' * 500 for i in range(n): if a == b: pass def compare_using_interning(n): a = sys.intern('a long string that is not interned' * 500) b = sys.intern('a long string that is not interned' * 500) for i in range(n): if a is b: pass e_start = time.perf_counter() compare_using_equals(10000000) e_end = time.perf_counter() i_start = time.perf_counter() compare_using_interning(10000000) i_end = time.perf_counter() print('Compare using equals finished test in {0} seconds \n Compare using interning finished test in {1} seconds'.format(e_end-e_start,i_end-i_start)) 复制代码

我们看下输出

Compare using equals finished test in 8.94854034400123 seconds Compare using interning finished test in 0.4564070480009832 seconds 复制代码

我们可以看出，驻留的效率比逐个比较的效率快了近20倍

一些其它的优化常量表达式

当我们输入 a = 24*60 时，python会提前计算数值，并在编译的时候直接替换该数值

当常量序列表达式的结果的长度小于20时，也会被提前计算

def my_func(): a = 24*60 b = (1,2) * 5 c = 'abc' * 3 d = 'ab' * 11 e = 'the quick brown fox' * 5 f = ['a', 'b'] * 3 print(my_func.__code__.co_consts) 复制代码

输出

(None, 24, 60, 1, 2, 5, 'abc', 3, 'ab', 11, 'the quick brown fox', 'a', 'b', 1440, (1, 2), (1, 2, 1, 2, 1, 2, 1, 2, 1, 2), 'abcabcabc') 复制代码

从上面的结果看到，a，b，c的值被计算后放在了co_consts里面，而d，e的值大于20了，f的值是可变的,所以这三个并没有放到co_consts里面

成员测验

当我们写

def my_func(e): if e in [1,2,3]: pass 复制代码

这样的代码时，python会将阵列转为元组。

同样的，如果是数组,python也会自动将其转换为冷冻数组(frozenset)

def my_func(e): if e in [1,2,3]: pass print(my_func.__code__.co_consts) 复制代码

输出 (None, 1, 2, 3, (1, 2, 3))

def my_func(e): if e in {1,2,3}: pass print(my_func.__code__.co_consts) 复制代码

输出 (None, 1, 2, 3, frozenset({1, 2, 3}))

在成员测试中,set的效率要远远高于阵列或者元组，来做一个测验

import string, time def membership_test(n,container): start = time.perf_counter() for i in range(n): if 'z' in container: pass end = time.perf_counter() return(end-start) print('list: %s' % membership_test(10000000,list(string.ascii_letters)), 'tuple: %s' % membership_test(10000000,tuple(string.ascii_letters)), 'set: %s' % membership_test(10000000,set(string.ascii_letters)), sep='\n') 复制代码

输出

list: 6.4466956019969075 tuple: 6.477438930000062 set: 0.6009954499968444 [Finished in 13.6s] 复制代码

从上面看出，set的速度明显要快很多

↧

codingdirectional: Search for any duplicate file with python

December 5, 2018, 3:26 pm

≫ Next: [译] 鲜为人知的数据科学 Python 库

≪ Previous: 深入理解Python[3] 当我们在讨论变量时，我们在讨论什么

Welcome to the new chapter of this remove duplicate file project. After creating a thread class, open a file and open a folder in the previous few chapters, we can now start to move toward our main objective which is to search for a duplicate file in another folder.

What we will do in this chapter is to write a program to select a file and select a folder which we want to look for the file with the same filename as the one which we have just selected earlier. If the file with the same name does exists in another folder then we will print the file found statement on the application’s label or else we will print out no file with the same name found statement on that same application’s label.

Lets get started. First of all we will edit the main program file to select a file and the folder, the open folder dialog will only open if and only if a file has been selected.

from tkinter import *
from tkinter import filedialog
from Remove import Remove
win = Tk() # 1 Create instance
win.title("Multitas") # 2 Add a title
win.resizable(0, 0) # 3 Disable resizing the GUI
win.configure(background='black') # 4 change background color
# 5 Create a label
aLabel = Label(win, text="Remove duplicate file", anchor="center")
aLabel.grid(column=0, row=1)
aLabel.configure(foreground="white")
aLabel.configure(background="black")
# 6 Create a selectFile function to be used by button
def selectFile():
filename = filedialog.askopenfilename(initialdir="/", title="Select file")
if(filename != ''):
filename = filename.split('/')[-1] # this is for the windows separator only
folder = filedialog.askdirectory() # 7 open a folder then create and start a new thread to print those filenames from the selected folder
remove = Remove(folder, aLabel, filename)
remove.start()
# 8 Adding a Button
action = Button(win, text="Open Folder", command=selectFile)
action.grid(column=0, row=0) # 9 Position the button
action.configure(background='brown')
action.configure(foreground='white')
win.mainloop() # 10 start GUI

Now we will modify our Remove thread class to search for a file inside another folder.

import threading
import os
class Remove(threading.Thread):
def __init__(self, massage, aLabel, filename):
threading.Thread.__init__(self)
self.massage = massage
self.label = aLabel
self.filename = filename
def run(self):
text_filename = 'There is no duplicate item'
filepaths = os.listdir(self.massage)
for filepath in filepaths:
if(filepath == self.filename):
text_filename = 'Found duplicate item'
self.label.config(text=text_filename)
return

If you run the above program you will see this statement appears if the duplicate file has been found!

codingdirectional: Search for any duplicate file with python

Found the duplicate item

In the next chapter we will start to remove the duplicate file from another folder!

↧

[译] 鲜为人知的数据科学 Python 库

December 5, 2018, 3:24 pm

≫ Next: Write a data frame on &period;xlsx too slow

≪ Previous: codingdirectional: Search for any duplicate file with python

PC：Hitesh Choudhary 来自于Unsplash

python 是一个很棒的语言。它是世界上发展最快的编程语言之一。它一次又一次地证明了在开发人员职位中和跨行业的数据科学职位中的实用性。整个 Python 及其库的生态系统使它成为全世界用户（初学者和高级用户）的合适选择。它的成功和流行的原因之一是它强大的第三方库的集合，这些库使它可以保持活力和高效。

在本文中，我们会研究一些用于数据科学任务的 Python 库，而不是常见的比如 panda、scikit-learn 和 matplotlib 等的库。尽管像 panda 和 scikit-learn 这样的库，是在机器学习任务中经常出现的，但是了解这个领域中的其它 Python 产品总是很有好处的。

Wget

从网络上提取数据是数据科学家的重要任务之一。 Wget 是一个免费的实用程序，可以用于从网络上下载非交互式的文件。它支持 HTTP、HTTPS 和 FTP 协议，以及通过 HTTP 的代理进行文件检索。由于它是非交互式的，即使用户没有登录，它也可以在后台工作。所以下次当你想要下载一个网站或者一个页面上的所有图片时， wget 可以帮助你。

安装： $ pip install wget 复制代码例子： import wget url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3' filename = wget.download(url) 100% [................................................] 3841532 / 3841532 filename 'razorback.mp3' 复制代码 Pendulum

对于那些在 python 中处理日期时间时会感到沮丧的人来说，Pendulum 很适合你。它是一个简化日期时间操作的 Python 包。它是 Python 原生类的简易替代。请参阅文档深入学习。

安装： $ pip install pendulum 复制代码例子： import pendulum dt_toronto = pendulum.datetime(2012, 1, 1, tz='America/Toronto') dt_vancouver = pendulum.datetime(2012, 1, 1, tz='America/Vancouver') print(dt_vancouver.diff(dt_toronto).in_hours()) 3 复制代码 imbalanced-learn

可以看出，当每个类的样本数量基本相同时，大多数分类算法的效果是最好的，即需要保持数据平衡。但现实案例中大多是不平衡的数据集，这些数据集对机器学习算法的学习阶段和后续预测都有很大影响。幸运的是，这个库就是用来解决此问题的。它与 scikit-learn 兼容，是 scikit-lear-contrib 项目的一部分。下次当你遇到不平衡的数据集时，请尝试使用它。

安装： pip install -U imbalanced-learn # 或者 conda install -c conda-forge imbalanced-learn 复制代码例子：

使用方法和例子请参考文档。

FlashText

在 NLP 任务中，清理文本数据往往需要替换句子中的关键字或从句子中提取关键字。通常，这种操作可以使用正则表达式来完成，但是如果要搜索的术语数量达到数千个，这就会变得很麻烦。Python 的 FlashText 模块是基于FlashText 算法为这种情况提供了一个合适的替代方案。FlashText 最棒的一点是，不管搜索词的数量如何，运行时间都是相同的。你可以在这里了解更多内容。

安装： $ pip install flashtext 复制代码例子：提取关键字 from flashtext import KeywordProcessor keyword_processor = KeywordProcessor() # keyword_processor.add_keyword(<unclean name>, <standardised name>) keyword_processor.add_keyword('Big Apple', 'New York') keyword_processor.add_keyword('Bay Area') keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.') keywords_found ['New York', 'Bay Area'] 复制代码替换关键字 keyword_processor.add_keyword('New Delhi', 'NCR region') new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.') new_sentence 'I love New York and NCR region.' 复制代码

更多实用案例，请参考官方文档。

Fuzzywuzzy

这个库的名字听起来很奇怪，但是在字符串匹配方面，fuzzywuzzy 是一个非常有用的库。可以很方便地实现计算字符串匹配度、令牌匹配度等操作，也可以很方便地匹配保存在不同数据库中的记录。

安装： $ pip install fuzzywuzzy 复制代码例子： from fuzzywuzzy import fuzz from fuzzywuzzy import process # 简单匹配度 fuzz.ratio("this is a test", "this is a test!") 97 # 模糊匹配度 fuzz.partial_ratio("this is a test", "this is a test!") 100 复制代码

更多有趣例子可以在 GitHub 仓库找到。

PyFlux

时间序列分析是机器学习领域中最常见的问题之一。 PyFlux 是 Python 中的一个开源库，它是为处理时间序列问题而构建的。该库拥有一系列优秀的现代时间序列模型，包括但不限于 ARIMA、GARCH 和 VAR 模型。简而言之，PyFlux 为时间序列建模提供了一种概率方法。值得尝试一下。

安装 pip install pyflux 复制代码例子

详细用法和例子请参考官方文档。

Ipyvolume

结果展示也是数据科学中的一个重要方面。能够将结果进行可视化将具有很大优势。IPyvolume 是一个可以在 Jupyter notebook 中可视化三维体和图形（例如三维散点图等）的 Python 库，并且只需要少量配置。但它目前还是 1.0 之前的版本阶段。用一个比较恰当的比喻来解释就是：IPyvolume 的 volshow 对于三维数组就像 matplotlib 的 imshow 对于二维数组一样好用。可以在这里获取更多。

使用 pip $ pip install ipyvolume 使用 Conda/Anaconda $ conda install -c conda-forge ipyvolume 复制代码例子动画
[译] 鲜为人知的数据科学 Python 库

体绘制

Dash

Dash 是一个高效的用于构建 web 应用程序的 Python 框架。它是在 Flask、Plotly.js 和 React.js 基础上设计而成的，绑定了很多比如下拉框、滑动条和图表的现代 UI 元素，你可以直接使用 Python 代码来写相关分析，而无需再使用 javascript。Dash 非常适合构建数据可视化应用程序。然后，这些应用程序可以在 web 浏览器中呈现。用户指南可以在这里获取。

安装 pip install dash==0.29.0 # 核心 dash 后端 pip install dash-html-components==0.13.2 # HTML 组件 pip install dash-core-components==0.36.0 # 增强组件 pip install dash-table==3.1.3 # 交互式 DataTable 组件（最新！）复制代码例子

下面的例子展示了一个具有下拉功能的高度交互式图表。当用户在下拉菜单中选择一个值时，应用程序代码将动态地将数据从 Google Finance 导出到 panda DataFrame。源码在这里

Gym

OpenAI 的 Gym 是一款用于增强学习算法的开发和比较工具包。它兼容任何数值计算库，如 TensorFlow 或 Theano。Gym 库是测试问题集合的必备工具，这个集合也称为环境 ―― 你可以用它来开发你的强化学习算法。这些环境有一个共享接口，允许你进行通用算法的编写。

安装 pip install gym 复制代码例子

这个例子会运行 CartPole-v0 环境中的一个实例，它的时间步数为 1000，每一步都会渲染整个场景。

你可以在这里获取其它环境的相关资料。

↧

Write a data frame on &period;xlsx too slow

December 5, 2018, 3:22 pm

≫ Next: gamingdirectional: Lets move on to the next game level!

≪ Previous: [译] 鲜为人知的数据科学 Python 库

I have a 40MB dataframe 'dfScore' I am writing to .xlsx。
Write a data frame on &period;xlsx too slow

the code is as follow, writer = pandas.ExcelWriter('test.xlsx', engine='xlsxwriter') dfScore.to_excel(writer,sheet_name='Sheet1') writer.save()

the code dfScore.to_excel take almost an hour , the code writer.save() takes another hour. Is this normal? Is there a good way to take less than 10 min?

i already searched in stackoverflow ,but it seems some suggestions not working on my problem.

the code dfScore.to_excel take almost an hour ,the code writer.save() takes another hour. Is this normal?

That sounds a bit too high. I ran an XlsxWriter test writing 1,000,000 rows x 5 columns and it took ~ 100s. The time will vary based on the CPU and Memory of the test machine but 1 hour is 36 times slower which doesn't seem right.

Note, Excel, and thus XlsxWriter, only supports 1,048,576 rows per worksheet so you are effectively throwing away 3/4s of your data and wasting time doing it.

Is there a good way to take less than 10 min?

For pure XlsxWriter programs pypy gives a good speed up. For example rerunning my 1,000,000 rows x 5 columns testcase with pypy the time went from 99.15s to 16.49s. I don't know if Pandas works with pypy though.

↧

gamingdirectional: Lets move on to the next game level!

December 5, 2018, 5:14 pm

≫ Next: 我的2018年总结

≪ Previous: Write a data frame on &period;xlsx too slow

Welcome back, in this article we will create the next level setup scene for our pygame project. Yesterday I had a deep thought about the method which we should use to do the game objects setup for each game level and had finally decided to leave the game object managers to do their own game object setups. The reasons why I have selected to let game object managers to manage their own game object setup for each level are because:-

1) The level manager class is not suitable to do all the game objects setup for each new level and should always stay away from dealing directly with the game object.

2) The game manager class is also not a place where we will want to do the game objects setup because this will increase the complexity of this class and thus makes the game developer really hard to follow the code when he needs to modify the program in the future.

Thus leaves all the individual game object related topic to the object manager class is the best choice and that is also what the object manager class should do after all.

OK lets get started, we are not going to edit anything in the background manager, the enemy missile manager and the player manager because there is nothing changed each time a new level has been reached. The only class we will update is the enemy manger class which we will add in the new type of enemy spaceship on each level of the game. Here is the modify version of the enemy manager class.

from Enemy import Enemy
from GameSprite import GameSprite
from pygame.locals import *
from EnemyMissileManager import EnemyMissileManager
import random
from Objectpool import Objectpool
class EnemyManager(object):
def __init__(self, scene, player, game_level):
self.enemy_missile_manager = EnemyMissileManager()
self.scene = scene
self.player = player
self.enemy_count = 10
self.missile_count = 60
self.enemy_list = []
self.image = 'Asset/enemy0.png'
self.image1 = 'Asset/enemy1.png'
self.width = 30
self.height = 30
self.rect = Rect(0, 0, self.width, self.height)
self.more_enemy = 0
self.y = -50
self.boundary_width = 660
self.boundary_height = 660
self.object_pool = Objectpool(self.enemy_count)
self.next_enemy = 0
self.level = game_level
# initialize game sprite object
self.sprite = GameSprite(self.image, self.rect)
self.sprite1 = GameSprite(self.image1, self.rect)
def create_enemy(self, x, y):
if(self.enemy_count > 0):
if(self.object_pool.getSize() > 0): # get the ship from object pool if the pool is not empty
self.enemy_list.append(self.object_pool.obtain())
else: # objects setup based on the level of the game
if(self.level == 1):
self.enemy_surface = self.sprite.getImage()
elif(self.level == 2):
if(self.next_enemy == 0):
self.enemy_surface = self.sprite.getImage()
self.next_enemy += 1
elif(self.next_enemy == 1):
self.enemy_surface = self.sprite1.getImage()
self.next_enemy = 0
self.enemy_list.append(Enemy(self.enemy_surface, x, y))
self.enemy_count -= 1
def update(self):
if (self.more_enemy > 600):
self.more_enemy = 0
x = random.randint(30, self.boundary_width - 50)
self.create_enemy(x , self.y) # create more enemy
else:
self.more_enemy += 1 # increase time
self.enemy_update()
self.check_boundary()
self.create_enemy_missile()
def create_enemy_missile(self):
for item in list(self.enemy_list):
if(self.player.pos.y - item.y < 100 and abs(self.player.pos.x - item.x) < 60 ):
item.create_enemy_missile(self.enemy_missile_manager)
def enemy_update(self):
for item in list(self.enemy_list):
if(item.on == False):
self.enemy_list.remove(item)
self.enemy_count += 1
item.y = self.y
item.on = True
self.object_pool.recycle(item)
else:
item.update()
def check_boundary(self):
for i in range(len(self.enemy_list)):
if (self.enemy_list[i].y > self.boundary_height):
self.enemy_list[i].on = False
def draw(self):
# blit the enemy and enemy missiles on the scene
for i in range(len(self.enemy_list)):
self.scene.blit(self.enemy_list[i].enemy_surface, self.enemy_list[i].enemy_pos)
self.enemy_list[i].missile_draw(self.scene)

At the moment we are only dealing with two levels, the code will get more complicated when the level increases, at this moment when a player has reached level 2 the enemy manager will create two type of enemy ships and which type of enemy ship will the enemy manager generates will depend on the next enemy variable which acts like a switch to switch from one enemy type to another. The enemies that the enemy manager has created have the same type of game properties but they will act differently starting from level 3 onward, the most important thing here is to get the level logic correct first then we can further modify the game code if we need to in the future.

Next thing we need to do is to modify the game manager file.

from Player import Player
from Background import Background
from EnemyManager import EnemyManager
from Overlap import Overlap
from ExplosionManager import ExplosionManager
from Score import Score
from StartScene import StartScene
from pygame.locals import *
from LevelManager import LevelManager
import pygame
class GameManager(object):
def __init__(self, scene):
self.scene = scene
self.start_scene = StartScene(scene)
self.load_music()
self.play_music()
self.overlap_manager = Overlap()
self.level_manager = LevelManager(self)
self.setup(self.level_manager.get_level())
#game state
self.LOAD = 0
self.GAME = 1
self.OVER = 2
self.NEXT = 3
self.WIN = 4
self.state = self.LOAD
def setup(self, game_level):
self.game_level = game_level
self.score_manager = Score(self.scene)
self.background = Background(self.scene)
self.player = Player(self.scene)
self.enemy_manager = EnemyManager(self.scene, self.player, game_level)
self.explosion_manager = ExplosionManager(self.scene)
def loop(self):
if(self.state == self.LOAD or self.state == self.OVER or self.state == self.NEXT or self.state == self.WIN):
self.start_scene.draw(self.state)
elif(self.state == self.GAME):
self.update()
self.draw()
def isAreaClick(self, pos):
if (self.state == self.LOAD or self.state == self.OVER or self.state == self.NEXT or self.state == self.WIN):
self.rect = Rect(177, 274, 306, 112) # the position of the play button on the scene
x, y = pos
if(self.rect.collidepoint(x, y)):
self.state = self.GAME
def load_music(self):
pygame.mixer_music.load('Music/winternight.ogg')
def play_music(self):
pygame.mixer_music.play(-1) #play the music infinite time
def set_player_x(self, _x):
if (self.state == self.GAME):
self.player.setX(_x)
def set_player_y(self, _y):
if (self.state == self.GAME):
self.player.setY(_y)
def set_missile_strike(self, strike):
if (self.state == self.GAME):
self.player.setStrike(strike)
def update(self):
self.player.update()
self.enemy_manager.update()
self.isOverlap()
self.explosion_manager.explosion_update()
# check for player, enemy, missiles overlap
def isOverlap(self):
self.overlap_manager.isOverlap(self.player, self.enemy_manager, self.explosion_manager, self.score_manager, self)
def draw(self):
self.background.draw()
self.player.draw()
self.enemy_manager.draw()
self.explosion_manager.draw()
self.score_manager.draw()
pygame.display.flip()

The only thing you need to take note is the program has passed in the game level variable to the enemy manager class which will be used to decide what type of enemy we need for different level of the game.

The next thing we need to do is to modify the level manager class.

class LevelManager(object):
def __init__(self, gm):
self.game_manager = gm
self.level = 1
self.MAX_LEVEL = 2
def increase_level(self):
self.level += 1
if(self.level > self.MAX_LEVEL):
self.game_manager.state = self.game_manager.WIN
self.level = 1
self.game_manager.setup(self.level)
else:
self.game_manager.state = self.game_manager.NEXT
self.game_manager.setup(self.level)
def get_level(self):
return self.level

Nothing much here but just to tidy up the code and changes the start level to 1 instead of 0.

Last part is to modify the overlap class.

from pygame.locals import *
class Overlap(object):
def __init__(self):
pass # nothing here
# is player and enemy, player missile, enemy missile overlap
def isOverlap(self, player, em, ex, score, gm):
self.player_rect = Rect(player.pos.x, player.pos.y, player.width, player.height)
for i in range(len(em.enemy_list)): # is player collides with enemy
self.em_rect = Rect(em.enemy_list[i].x, em.enemy_list[i].y, em.width, em.height)
if (self.player_rect.colliderect(self.em_rect)):
em.enemy_list[i].on = False
if(em.enemy_list[i].hit == False):
ex.create_explosion(player.pos.x + 2, player.pos.y + 2)
em.enemy_list[i].hit = True
gm.state = gm.OVER
gm.setup(gm.level_manager.get_level())
for i in range(len(em.enemy_list)): # is enemy missile hits player
for j in range(len(em.enemy_list[i].missile_list)):
self.em_rect = Rect(em.enemy_list[i].missile_list[j].x, em.enemy_list[i].missile_list[j].y, em.enemy_missile_manager.width, em.enemy_missile_manager.height)
if (self.player_rect.colliderect(self.em_rect)):
em.enemy_list[i].missile_list[j].on = False
ex.create_explosion(player.pos.x + 2, player.pos.y + 2)
score.set_score(-1)
if(score.score < 0):
gm.state = gm.OVER
gm.setup(gm.level_manager.get_level())
for i in range(len(em.enemy_list)): # is player missile hits enemy
self.em_rect = Rect(em.enemy_list[i].x, em.enemy_list[i].y, em.width, em.height)
for j in range(len(player.getMissileManager().missile_list)):
self.mm_rect = Rect(player.getMissileManager().missile_list[j].x, player.getMissileManager().missile_list[j].y, player.getMissileManager().width, player.getMissileManager().height)
if (self.em_rect.colliderect(self.mm_rect)):
em.enemy_list[i].on = False
player.getMissileManager().missile_list[j].on = False
if (em.enemy_list[i].hit == False):
ex.create_explosion(em.enemy_list[i].x, em.enemy_list[i].y + 2)
em.enemy_list[i].hit = True
score.set_score(1)
if(score.score >= gm.level_manager.get_level() * 30):
gm.level_manager.increase_level()

As you can see, you will need each level * 30 points to reach the next game level.

Now we will play the game…

http://gamingdirectional.com/wp-content/uploads/2018/12/game_strike-3.mp4

↧

我的2018年总结

December 5, 2018, 4:36 pm

≫ Next: Python 的技巧和方法你了解多少？

≪ Previous: gamingdirectional: Lets move on to the next game level!

开源贡献

把最自豪的事写在前面。

在看过 cpython 源码的一年，终于成功地为它打了一个补丁：

当时也是兴奋极了，补丁也得到顶级开发者的认可，能为编程语言贡献源码实在是太棒了。我想短时间内我很难再达到这个高度了。

同样的贡献开源项目的还有 thinkphp ，这是我还是 php 开发者的时候很喜欢的一个开发框架，而且还是国产的：

第一次使用它时还是大一时候的寒假，那时还是 3.1 的版本，用它完成西二在线布置的选课系统。帮助我通过了考核。现在我也终于有能力来反馈它了。

其他的零零散散也有给其他项目提交 PR，如 xadmin ， webvirtcloud ， nagios-api ， typecho 等。

额外的收益

2018年1月份时候比较火的答题赢奖金，当时入的是《冲顶大会》，那段时间内真的挺花精力的，有时牺牲午休去答题，为了复活卡去买虚拟号注册，同时还给同事刷复活卡回本。另一方面，通过 fiddler 抓包知道了答题的接口与传递的参数，利用这些虚拟号和 django 开发一个批量答题的工具。可惜这种直播答题大过年后就凉凉了。

另一个有额外收益的是我的世界开发者。上传了两款皮肤，和小米主题一样，上架后基本什么都不用管了。mc 要满一千才允许体现。

技能成长

在威哥的帮助下，我基本上能熟练运用 kvm 了；对 python 的基础知识也开始从一个个小方面进行深入，比如 isinstance ，整型为何不溢出等。Django 源码也看了一些，大体已经知道了一个请求到中间件再到视图函数的一系列过程，接下来准备看 ORM 部分和路由部分。

↧