[Computer Vision] Mask-RCNN: Anchor Generation
GitHub address:
1. Comparison with SSD Anchors
Mask_RCNN's anchors are essentially the same as SSD's:
- the number of anchor centers equals the number of feature-map pixels;
- boxes are generated around those centers;
- the final box coordinates are normalized to [0, 1], i.e. expressed relative to the input image size.
The R-CNN family normally works on a single shared feature map, but once Mask_RCNN adopted the FPN structure it uses multiple feature levels just like SSD, so the two anchor-generation algorithms are practically identical; only the generation strategy differs in a few details:
- in SSD each feature level has its own set of aspect-ratio parameters, whereas in Mask_RCNN the ratios (anchor_ratios) are exactly the same for every level;
- SSD generates len(anchor_ratios) + 2 boxes per center on each level, while Mask_RCNN always generates a fixed 3 boxes;
- SSD places the centers at the feature pixels offset by 0.5 stride, while Mask_RCNN uses the feature pixel positions directly.
The basic generation rule, however, is exactly the same in both (see the sketch after the config listings below):
- h = reference size / anchor_ratio ** 0.5
- w = reference size * anchor_ratio ** 0.5
h and w both start from a given per-level reference size (which acts as the receptive-field scale), so in practice anchor generation depends only on each level's anchor_ratios and reference sizes. For SSD:
anchor_sizes = [(21., 45.),
                (45., 99.),
                (99., 153.),
                (153., 207.),
                (207., 261.),
                (261., 315.)]
anchor_ratios = [[2, .5],
                 [2, .5, 3, 1./3],
                 [2, .5, 3, 1./3],
                 [2, .5, 3, 1./3],
                 [2, .5],
                 [2, .5]]
For Mask_RCNN (the h and w reference sizes are equal):
self.config.BACKBONE_STRIDES = [4, 8, 16, 32, 64]    # downsampling factor of each feature level, used for the anchor centers
self.config.RPN_ANCHOR_RATIOS = [0.5, 1, 2]          # aspect ratios (w/h), shared by all levels
self.config.RPN_ANCHOR_SCALES = [32, 64, 128, 256, 512]  # reference anchor size (receptive field) per level
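To make the rule above concrete, here is a minimal NumPy sketch (not code from the repo, just the formula applied to the Mask_RCNN values) for the first level's reference size of 32 and the three shared ratios; all three boxes keep the same area, and the ratio is width/height:

```python
import numpy as np

scale = 32                          # RPN_ANCHOR_SCALES[0], reference size of the first level
ratios = np.array([0.5, 1, 2])      # RPN_ANCHOR_RATIOS, identical for every level

heights = scale / np.sqrt(ratios)   # approx. [45.25, 32.00, 22.63]
widths = scale * np.sqrt(ratios)    # approx. [22.63, 32.00, 45.25]

print(heights * widths)             # [1024. 1024. 1024.] -> area is always scale ** 2
print(widths / heights)             # [0.5 1.  2. ]       -> the ratio is width / height
```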
2. Anchor Generation
The entry point for anchor generation is the get_anchors function. It takes an image_shape parameter that only needs to contain [h, w], though [h, w, c] also works:
def get_anchors(self, image_shape):
    """Returns anchor pyramid for the given image size."""
    # [N, (height, width)]
    backbone_shapes = compute_backbone_shapes(self.config, image_shape)
    # Cache anchors and reuse if image shape is the same
    if not hasattr(self, "_anchor_cache"):
        self._anchor_cache = {}
    if not tuple(image_shape) in self._anchor_cache:
        # Generate Anchors: [anchor_count, (y1, x1, y2, x2)]
        a = utils.generate_pyramid_anchors(
            self.config.RPN_ANCHOR_SCALES,   # (32, 64, 128, 256, 512)
            self.config.RPN_ANCHOR_RATIOS,   # [0.5, 1, 2]
            backbone_shapes,                 # with shape [N, (height, width)]
            self.config.BACKBONE_STRIDES,    # [4, 8, 16, 32, 64]
            self.config.RPN_ANCHOR_STRIDE)   # 1
        # Keep a copy of the latest anchors in pixel coordinates because
        # it's used in inspect_model notebooks.
        # TODO: Remove this after the notebook are refactored to not use it
        self.anchors = a
        # Normalize coordinates
        self._anchor_cache[tuple(image_shape)] = utils.norm_boxes(a, image_shape[:2])
    return self._anchor_cache[tuple(image_shape)]
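As a usage sketch, the same pipeline can be reproduced directly with the two utils calls used above. The backbone_shapes values are hard-coded here for an assumed 1024×1024 input (they are exactly what compute_backbone_shapes below produces for that size), and the mrcnn package layout of the matterport repo is assumed:

```python
import numpy as np
from mrcnn import utils  # assumes the matterport Mask_RCNN package layout

image_shape = np.array([1024, 1024, 3])
# Feature-map shapes for strides [4, 8, 16, 32, 64]; see compute_backbone_shapes below
backbone_shapes = np.array([[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]])

a = utils.generate_pyramid_anchors(
    (32, 64, 128, 256, 512),   # RPN_ANCHOR_SCALES
    [0.5, 1, 2],               # RPN_ANCHOR_RATIOS
    backbone_shapes,
    [4, 8, 16, 32, 64],        # BACKBONE_STRIDES
    1)                         # RPN_ANCHOR_STRIDE

print(a.shape)                                     # (261888, 4), pixel coordinates
print(utils.norm_boxes(a, image_shape[:2]).shape)  # (261888, 4), normalized to [0, 1]
```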
It first calls compute_backbone_shapes to compute the shape of each feature level:
def compute_backbone_shapes(config, image_shape):
    """Computes the width and height of each stage of the backbone network.
    Returns:
        [N, (height, width)]. Where N is the number of stages
    """
    if callable(config.BACKBONE):
        return config.COMPUTE_BACKBONE_SHAPE(image_shape)
    # Currently supports ResNet only
    assert config.BACKBONE in ["resnet50", "resnet101"]
    return np.array(
        [[int(math.ceil(image_shape[0] / stride)),
          int(math.ceil(image_shape[1] / stride))]
         for stride in config.BACKBONE_STRIDES])  # [4, 8, 16, 32, 64]
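For example, assuming a 1024×1024 input (the typical square size in the repo's configs, not something fixed by this function), the per-level shapes work out as follows:

```python
import math
import numpy as np

image_shape = (1024, 1024, 3)
strides = [4, 8, 16, 32, 64]
print(np.array([[int(math.ceil(image_shape[0] / s)),
                 int(math.ceil(image_shape[1] / s))] for s in strides]))
# [[256 256]
#  [128 128]
#  [ 64  64]
#  [ 32  32]
#  [ 16  16]]
```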
It then calls utils.generate_pyramid_anchors to generate all of the anchors:
def generate_pyramid_anchors(scales, ratios, feature_shapes, feature_strides,
                             anchor_stride):
    """Generate anchors at different levels of a feature pyramid. Each scale
    is associated with a level of the pyramid, but each ratio is used in
    all levels of the pyramid.

    Returns:
    anchors: [N, (y1, x1, y2, x2)]. All generated anchors in one array. Sorted
        with the same order of the given scales. So, anchors of scale[0] come
        first, then anchors of scale[1], and so on.
    """
    # Anchors
    # [anchor_count, (y1, x1, y2, x2)]
    anchors = []
    for i in range(len(scales)):
        anchors.append(generate_anchors(scales[i],
                                        ratios,
                                        feature_shapes[i],
                                        feature_strides[i],
                                        anchor_stride))
    # [anchor_count, (y1, x1, y2, x2)]
    return np.concatenate(anchors, axis=0)
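Because each feature pixel contributes one anchor per ratio (RPN_ANCHOR_STRIDE is 1), the per-level anchor count is simply height × width × 3. A quick check for the 1024×1024 example:

```python
import numpy as np

# Feature-map shapes for a 1024x1024 input (see compute_backbone_shapes above)
backbone_shapes = np.array([[256, 256], [128, 128], [64, 64], [32, 32], [16, 16]])
per_level = backbone_shapes[:, 0] * backbone_shapes[:, 1] * 3  # 3 anchors per feature pixel
print(per_level)        # [196608  49152  12288   3072    768]
print(per_level.sum())  # 261888
```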
utils.generate_pyramid_anchors in turn calls utils.generate_anchors to generate the anchors of each level (this step relies heavily on np.meshgrid):

def generate_anchors(scales, ratios, shape, feature_stride, anchor_stride):
"""
scales: 1D array of anchor sizes in pixels. Example: [32, 64, 128]
ratios: 1D array of anchor ratios of width/height. Example: [0.5, 1, 2]
shape: [height, width] spatial shape of the feature map over which
to generate anchors.
feature_stride: Stride of the feature map relative to the image in pixels.
anchor_stride: Stride of anchors on the feature map. For example, if the
value is 2 then generate anchors for every other feature map pixel.
"""
# Get all combinations of scales and ratios
scales, ratios = np.meshgrid(np.array(scales), np.array(ratios))
scales = scales.flatten()
ratios = ratios.flatten()
# Enumerate heights and widths from scales and ratios
heights = scales / np.sqrt(ratios)
widths = scales * np.sqrt(ratios)
# Enumerate shifts in feature space
shifts_y = np.arange(0, shape[0], anchor_stride) * feature_stride
shifts_x = np.arange(0, shape[1], anchor_stride) * feature_stride
shifts_x, shifts_y = np.meshgrid(shifts_x, shifts_y)
# Enumerate combinations of shifts, widths, and heights
box_widths, box_centers_x = np.meshgrid(widths, shifts_x) # (n, 3) (n, 3)
box_heights, box_centers_y = np.meshgrid(heights, shifts_y) # (n, 3) (n, 3)
# Reshape to get a list of (y, x) and a list of (h, w)
# (n, 3, 2) -> (3n, 2)
box_centers = np.stack([box_centers_y, box_centers_x], axis=2).reshape([-1, 2])
box_sizes = np.stack([box_heights, box_widths], axis=2).reshape([-1, 2])
# Convert to corner coordinates (y1, x1, y2, x2)
boxes = np.concatenate([box_centers - 0.5 * box_sizes,
box_centers + 0.5 * box_sizes], axis=1)
# 框体信息是相对于原图的, [N, (y1, x1, y2, x2)]
return boxes
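As a small sanity check (assuming utils is mrcnn/utils.py from the matterport repo), calling the function for a single scale on a tiny 2×2 "feature map" with stride 4 yields 2·2·3 = 12 boxes; the first three are centered on image pixel (0, 0), so they partly fall outside the image, which is handled later in the pipeline:

```python
import numpy as np
from mrcnn import utils

boxes = utils.generate_anchors(scales=32, ratios=[0.5, 1, 2], shape=[2, 2],
                               feature_stride=4, anchor_stride=1)
print(boxes.shape)  # (12, 4): 2*2 centers * 3 ratios, (y1, x1, y2, x2) in image pixels
print(boxes[:3])    # the 3 boxes centered on image pixel (0, 0), approximately:
# [[-22.63 -11.31  22.63  11.31]
#  [-16.   -16.    16.    16.  ]
#  [-11.31 -22.63  11.31  22.63]]
```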
(Figure: simulated distribution of anchor centers for one feature level.)

Finally, back in get_anchors, utils.norm_boxes is called to map the anchor coordinates into the [0, 1] range:
def norm_boxes(boxes, shape):
    """Converts boxes from pixel coordinates to normalized coordinates.
    boxes: [N, (y1, x1, y2, x2)] in pixel coordinates
    shape: [..., (height, width)] in pixels

    Note: In pixel coordinates (y2, x2) is outside the box. But in normalized
    coordinates it's inside the box.

    Returns:
        [N, (y1, x1, y2, x2)] in normalized coordinates
    """
    h, w = shape
    scale = np.array([h - 1, w - 1, h - 1, w - 1])
    shift = np.array([0, 0, 1, 1])
    return np.divide((boxes - shift), scale).astype(np.float32)
The final return value is therefore the anchors in normalized (relative) coordinates, with shape [anchor_count, (y1, x1, y2, x2)].
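A quick numerical check of that convention (again assuming mrcnn/utils.py): a pixel-coordinate box covering the whole 1024×1024 image, [0, 0, 1024, 1024], maps exactly to [0, 0, 1, 1] because (y2, x2) is exclusive in pixel coordinates:

```python
import numpy as np
from mrcnn import utils

boxes = np.array([[0, 0, 1024, 1024],     # the whole 1024x1024 image, (y2, x2) exclusive
                  [128, 128, 256, 384]])  # an arbitrary box in pixel coordinates
print(utils.norm_boxes(boxes, shape=(1024, 1024)))
# approximately:
# [[0.     0.     1.     1.    ]
#  [0.1251 0.1251 0.2493 0.3744]]
```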