Deep Learning Paper Implementations: Spatial Transformer Networks

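The first three posts in my "Deep Learning Paper Implementations" series will cover Spatial Transformer Networks, introduced in 2016 by Max Jaderberg, Karen Simonyan, Andrew Zisserman and Koray Kavukcuoglu of Google Deepmind. A spatial transformer network is a learnable module that aims to increase the spatial invariance of convolutional neural networks in a computationally and parameter-efficient manner.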

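In the first part, we'll introduce two very important concepts that are crucial to understanding the inner workings of the spatial transformer layer. We'll start by examining a subset of image transformation techniques, those based on affine transformations, and then dive into the procedure that commonly follows such transformations: bilinear interpolation.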

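In the second part, we'll go over the spatial transformer layer in detail and summarize the paper. In the third and final part, we'll code it from scratch in TensorFlow and apply it to the GTSRB dataset (German Traffic Sign Recognition Benchmark).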

For the full code, see my GitHub repository.

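I've uploaded 2 cat images to the GitHub repository (mentioned at the start of the post). Download them and save them in a folder called data/ on your Desktop, or change the path accordingly.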

I also wrote a load_img() function that loads an image into a numpy array. I won't go over the details, but we'll need PIL and NumPy to reproduce the results.
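
Since load_img() itself is not shown in this post, here is a minimal sketch of what such a helper might look like. The signature, the data_dir parameter, the (height, width) reading of the size argument and the scaling to [0, 1] are all assumptions inferred from how the function is called in this post (it has to return an array of shape (1, H, W, 3) so that two images can be concatenated along axis 0). The actual version lives in the repository and may differ.

```python
import os

import numpy as np
from PIL import Image


def load_img(name, desired_size=None, view=False, data_dir="data"):
    """Hypothetical helper: load an image as a float array of shape (1, H, W, 3)."""
    img = Image.open(os.path.join(data_dir, name)).convert("RGB")
    if desired_size is not None:
        height, width = desired_size  # assumed order: (height, width)
        # PIL's resize expects (width, height)
        img = img.resize((width, height))
    if view:
        img.show()
    # scale pixel values to [0, 1] and add a leading batch dimension
    x = np.asarray(img, dtype=np.float32) / 255.0
    return np.expand_dims(x, axis=0)
```

The leading batch dimension is what allows the images to be stacked with np.concatenate along axis 0.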

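With the function in place, let's load the images and concatenate them into a single input array. To keep the code as simple as possible, we'll work with just two images.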

import numpy as np
from PIL import Image

# params
DIMS = (400, 400)
CAT1 = 'cat1.jpg'
CAT2 = 'cat2.jpg'

# load both cat images
img1 = load_img(CAT1, DIMS)
img2 = load_img(CAT2, DIMS, view=True)

# concat into tensor of shape (2, 400, 400, 3)
input_img = np.concatenate([img1, img2], axis=0)

# dimension sanity check
print("Input Img Shape: {}".format(input_img.shape))

Our batch size is 2, which means we need the same number of transformation matrices M, one for each image in the batch.

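Let's initialize two identity transformation matrices. If the bilinear sampler is implemented correctly, the output images should be nearly identical to the inputs.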

# grab shape
num_batch, H, W, C = input_img.shape

# initialize M to identity transform
M = np.array([[1., 0., 0.], [0., 1., 0.]])

# repeat num_batch times
M = np.resize(M, (num_batch, 2, 3))

(Recall that the general affine transformation matrix is 2×3 if we want to include translation.)
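
To make the 2×3 shape concrete, the sketch below applies a few such matrices to a single point written in homogeneous coordinates. The helper name apply_affine and the example matrices are illustrative assumptions and are not part of the original code.

```python
import numpy as np


def apply_affine(M, x, y):
    """Apply a 2x3 affine matrix M to the point (x, y)."""
    point = np.array([x, y, 1.0])  # homogeneous coordinates
    return M @ point


identity = np.array([[1., 0., 0.],
                     [0., 1., 0.]])

# the third column carries the translation (tx, ty)
translate = np.array([[1., 0., 0.5],
                      [0., 1., -0.25]])

# rotation by 90 degrees about the origin
# (counter-clockwise in the usual mathematical convention, y pointing up)
theta = np.pi / 2
rotate = np.array([[np.cos(theta), -np.sin(theta), 0.],
                   [np.sin(theta),  np.cos(theta), 0.]])

print(apply_affine(identity, 0.2, 0.4))
print(apply_affine(translate, 0.2, 0.4))
print(apply_affine(rotate, 1.0, 0.0))
```

Without the third column, only linear maps such as rotation, scaling and shear could be expressed; the extra column is what adds the shift.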

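Now we need to write a function that generates a meshgrid and outputs the sampling grid obtained by multiplying the meshgrid by the transformation matrix M.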

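We want to create a normalized meshgrid, i.e. one whose x and y values lie between -1 and 1, with width and height entries respectively. Recall that for images, x corresponds to the width of the image (the number of columns of the matrix) and y corresponds to the height (the number of rows).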

# create normalized 2D grid
x = np.linspace(-1, 1, W)
y = np.linspace(-1, 1, H)
x_t, y_t = np.meshgrid(x, y)
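
A tiny grid makes the layout easier to see. The sizes below (3 rows, 4 columns) are chosen purely for illustration and are not the 400×400 used in this post.

```python
import numpy as np

H_small, W_small = 3, 4  # illustrative sizes only
x = np.linspace(-1, 1, W_small)
y = np.linspace(-1, 1, H_small)
x_t, y_t = np.meshgrid(x, y)

# both outputs have shape (H, W): rows index y, columns index x
print(x_t.shape, y_t.shape)

# x_t varies along the columns, y_t varies along the rows
print(x_t[0])
print(y_t[:, 0])
```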

Then we need to add a dimension to build homogeneous coordinates.

# reshape to (xt, yt, 1)
ones = np.ones(np.prod(x_t.shape))
sampling_grid = np.vstack([x_t.flatten(), y_t.flatten(), ones])

Although we've built one grid, we need num_batch grids. As before, the next step repeats the array num_batch times.

# repeat grid num_batch times
sampling_grid = np.resize(sampling_grid, (num_batch, 3, H * W))
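
np.resize fills the requested shape by repeating the flattened input, so when the new shape is (num_batch,) followed by the original shape, the result is simply num_batch stacked copies. The check below compares it against np.tile, which states the repetition explicitly. The small stand-in grid is an assumption made for illustration.

```python
import numpy as np

num_batch = 2
M = np.array([[1., 0., 0.], [0., 1., 0.]])
# stand-in for a (3, H*W) sampling grid with H*W = 5
grid = np.arange(3 * 5, dtype=np.float64).reshape(3, 5)

M_resized = np.resize(M, (num_batch, 2, 3))
M_tiled = np.tile(M, (num_batch, 1, 1))

grid_resized = np.resize(grid, (num_batch, 3, 5))
grid_tiled = np.tile(grid, (num_batch, 1, 1))

print(np.array_equal(M_resized, M_tiled))
print(np.array_equal(grid_resized, grid_tiled))
```

Either call works here; np.tile may be the more readable choice because it cannot silently reshuffle values if the target shape is mistyped.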

Let's move on to step 2 of the image transformation process.

# transform the sampling grid i.e. batch multiply
batch_grids = np.matmul(M, sampling_grid)
# batch grid has shape (num_batch, 2, H*W)

# reshape to (num_batch, height, width, 2)
batch_grids = batch_grids.reshape(num_batch, 2, H, W)
batch_grids = np.moveaxis(batch_grids, 1, -1)
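
The grid-generation steps so far can be wrapped in a single function. This is only a sketch: the name affine_grid and its signature are assumptions, np.tile is swapped in for np.resize, and the small image size is used just to check quickly that an identity M leaves the normalized grid unchanged.

```python
import numpy as np


def affine_grid(M, H, W):
    """Return sampling grids of shape (num_batch, H, W, 2) for M of shape (num_batch, 2, 3)."""
    num_batch = M.shape[0]
    x_t, y_t = np.meshgrid(np.linspace(-1, 1, W), np.linspace(-1, 1, H))
    ones = np.ones(H * W)
    # homogeneous target coordinates, shape (3, H*W)
    grid = np.vstack([x_t.flatten(), y_t.flatten(), ones])
    # one copy per batch element, shape (num_batch, 3, H*W)
    grid = np.tile(grid, (num_batch, 1, 1))
    # batched matrix multiply: (num_batch, 2, 3) @ (num_batch, 3, H*W)
    out = np.matmul(M, grid)
    # move the coordinate axis last: (num_batch, H, W, 2)
    out = out.reshape(num_batch, 2, H, W)
    return np.moveaxis(out, 1, -1)


identity = np.array([[1., 0., 0.], [0., 1., 0.]])
M = np.tile(identity, (2, 1, 1))
grids = affine_grid(M, H=3, W=4)
print(grids.shape)
```

With the identity transform, grids[..., 0] reproduces the x meshgrid and grids[..., 1] the y meshgrid for every batch element.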

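Finally, let's write the bilinear sampler. Given the x and y from the sampling grid, we want to obtain interpolated pixel values from the original image.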

We start by separating the x and y dimensions and rescaling them to the width and height intervals.

x_s = batch_grids[:, :, :, 0:1].squeeze()
y_s = batch_grids[:, :, :, 1:2].squeeze()

# rescale x to [0, W] and y to [0, H]
x = ((x_s + 1.) * W) * 0.5
y = ((y_s + 1.) * H) * 0.5

For each coordinate (x_i, y_i), we want to grab the 4 corner coordinates.

# grab 4 nearest corner points for each (x_i, y_i)
x0 = np.floor(x).astype(np.int64)
x1 = x0 + 1
y0 = np.floor(y).astype(np.int64)
y1 = y0 + 1

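(Note: we could also use the ceiling function instead of incrementing by 1, although for integer-valued coordinates the two corners would then coincide.)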
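
Incrementing floor(x) by 1 and taking ceil(x) agree everywhere except at integer coordinates, as this small check illustrates. The sample values are arbitrary.

```python
import numpy as np

x = np.array([2.0, 2.3, 2.9])
x0 = np.floor(x).astype(np.int64)

x1_increment = x0 + 1                  # always one pixel to the right of x0
x1_ceil = np.ceil(x).astype(np.int64)  # equals x0 whenever x is an integer

print(x0, x1_increment, x1_ceil)
```

With the increment, x1 - x0 is always 1, which keeps the interpolation weights well defined even when x lands exactly on a pixel.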

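Now we must make sure that no value falls outside the image boundaries. For example, suppose x = 399: then x0 = 399 and x1 = x0 + 1 = 400, which would cause a numpy indexing error. We therefore clip the corner coordinates as follows.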

# make sure it's inside img range [0, H-1] or [0, W-1]
x0 = np.clip(x0, 0, W - 1)
x1 = np.clip(x1, 0, W - 1)
y0 = np.clip(y0, 0, H - 1)
y1 = np.clip(y1, 0, H - 1)
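
The effect of clipping on out-of-range corner indices can be seen on a few hand-picked values. The width W = 400 matches DIMS above; the sample coordinates are assumptions chosen to hit both borders.

```python
import numpy as np

W = 400
x = np.array([-0.5, 0.0, 123.4, 399.0, 400.0])

x0 = np.floor(x).astype(np.int64)
x1 = x0 + 1
print(x0, x1)  # contains -1, 400 and 401

# force every index into the valid range [0, W-1]
x0 = np.clip(x0, 0, W - 1)
x1 = np.clip(x1, 0, W - 1)
print(x0, x1)
```

Note that an index of -1 would not raise an error in numpy; it would silently wrap around to the last column, which is arguably worse than the IndexError caused by 400.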

We use advanced numpy indexing to grab the pixel values at each corner coordinate, corresponding to (x0, y0), (x0, y1), (x1, y0) and (x1, y1).

# look up pixel values at corner coords
Ia = input_img[np.arange(num_batch)[:, None, None], y0, x0]
Ib = input_img[np.arange(num_batch)[:, None, None], y1, x0]
Ic = input_img[np.a
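
After the corner lookups, a bilinear sampler computes interpolation weights and blends the four corner values. A self-contained sketch of these steps, following the standard bilinear interpolation formula, might look as follows. It is not the code from the repository: the function name, the choice to compute the weights before clipping (so that border pixels are replicated) and the random test image are all assumptions. x and y are expected in pixel coordinates.

```python
import numpy as np


def bilinear_sampler(input_img, x, y):
    """Sample input_img (B, H, W, C) at pixel coordinates x, y of shape (B, H, W)."""
    B, H, W, C = input_img.shape

    # 4 nearest corner points
    x0 = np.floor(x).astype(np.int64)
    x1 = x0 + 1
    y0 = np.floor(y).astype(np.int64)
    y1 = y0 + 1

    # interpolation weights from the unclipped corners (they sum to 1)
    wa = (x1 - x) * (y1 - y)  # weight of (x0, y0)
    wb = (x1 - x) * (y - y0)  # weight of (x0, y1)
    wc = (x - x0) * (y1 - y)  # weight of (x1, y0)
    wd = (x - x0) * (y - y0)  # weight of (x1, y1)

    # clip indices so lookups stay inside the image
    x0 = np.clip(x0, 0, W - 1)
    x1 = np.clip(x1, 0, W - 1)
    y0 = np.clip(y0, 0, H - 1)
    y1 = np.clip(y1, 0, H - 1)

    # advanced indexing: batch index broadcasts against (B, H, W) coordinates
    b = np.arange(B)[:, None, None]
    Ia = input_img[b, y0, x0]
    Ib = input_img[b, y1, x0]
    Ic = input_img[b, y0, x1]
    Id = input_img[b, y1, x1]

    # weighted sum; add a trailing axis so weights broadcast over channels
    return (wa[..., None] * Ia + wb[..., None] * Ib
            + wc[..., None] * Ic + wd[..., None] * Id)


# quick check on a small random batch: sampling at exact pixel centers
rng = np.random.default_rng(0)
img = rng.random((2, 5, 7, 3))
xs, ys = np.meshgrid(np.arange(7, dtype=np.float64), np.arange(5, dtype=np.float64))
x = np.tile(xs, (2, 1, 1))
y = np.tile(ys, (2, 1, 1))
out = bilinear_sampler(img, x, y)
print(np.allclose(out, img))
```

Sampling exactly at the pixel centers returns the input unchanged, and shifting x by half a pixel yields the average of horizontally adjacent pixels, which is a convenient sanity check for any implementation.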
