Improving Foreground Segmentation with Probabilistic Superpixel Markov Random Fields
A Character Segmentation Method for Irregularly Arranged Chinese Text
Keywords: digital image processing; curve-arranged text; perspective-distorted text; Chinese character segmentation
CLC number: TP391.1
DOI: 10.3724/SP.J.1089.2019.17608
Character Segmentation Method for Irregularly Arranged Chinese Text
… video images, and natural scene images — four categories of objects. Apart from scanned document images, in which most text is arranged along straight lines and free of perspective distortion, the other image types frequently contain text arranged along curves as well as text with perspective distortion; therefore, realizing curve-arranged text …
Received: 2018-10-26; revised: 2019-05-17. Foundation item: Open Fund of the Key Laboratory of Road Construction Technology and Equipment, Ministry of Education (300102259506);
Yang Xieliu, Niu Xihui, and Liang Wenfeng
(School of Mechanical Engineering, Shenyang Jianzhu University, Shenyang 110168)
Abstract: Existing character segmentation methods have low segmentation accuracy when dealing with irregularly arranged Chinese text. A character segmentation method based on connected components is proposed to solve this problem. First, the text foreground is extracted and the text connected components are labeled. Second, the centroid and radius of each connected component are calculated to construct its bounding circle. Third, false text connected components are removed according to the sizes of the bounding circles. Fourth, two grouping rules are defined in light of the structural features of Chinese characters, and character segmentation is then performed on the Chinese text. Experimental results show that, compared with existing methods, the proposed method achieves much higher segmentation accuracy on irregularly arranged Chinese text and also shows good applicability to regularly arranged Chinese text.
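The abstract only outlines the pipeline. Below is a minimal Python/OpenCV sketch of steps two and three (bounding circles from connected components, then size-based filtering of false components); it is an illustration rather than the authors' implementation, and the function names and radius thresholds are assumptions.

```python
import cv2
import numpy as np

def bounding_circles(binary):
    """Label text connected components and return one (cx, cy, r) circle per component.

    `binary` is an 8-bit 0/255 foreground mask of the text; the radius is the
    largest distance from the component centroid to any of its pixels.
    """
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)
    circles = []
    for i in range(1, n):                      # label 0 is the background
        ys, xs = np.nonzero(labels == i)
        cx, cy = centroids[i]
        r = np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2).max()
        circles.append((cx, cy, r))
    return circles

def remove_false_components(circles, min_r=3, max_r=120):
    # Assumed size thresholds: drop components whose bounding circle is implausibly
    # small (noise) or large (background clutter) for part of a single character.
    return [c for c in circles if min_r <= c[2] <= max_r]
```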
cs231n_2017_lecture12
Lecture 12: Visualizing and Understanding

Administrative: Milestones are due tonight on Canvas, 11:59pm. Midterm grades will be released on Gradescope this week. A3 is due next Friday, 5/26. The HyperQuest deadline is extended to Sunday 5/21, 11:59pm. The poster session is June 6.

Last time — lots of computer vision tasks: classification + localization (single object, e.g. CAT), object detection and instance segmentation (multiple objects, e.g. DOG, DOG, CAT), and semantic segmentation (no objects, just pixels, e.g. GRASS, CAT, TREE, SKY). (Example images are CC0 public domain.)

What's going on inside ConvNets? The input image (3 x 224 x 224) is mapped to class scores (1000 numbers). What are the intermediate features looking for? (Krizhevsky et al., "ImageNet Classification with Deep Convolutional Neural Networks", NIPS 2012; figures reproduced with permission.)

First layer — visualize the filters/kernels (raw weights): AlexNet 64 x 3 x 11 x 11; ResNet-18 64 x 3 x 7 x 7; ResNet-101 64 x 3 x 7 x 7; DenseNet-121 64 x 3 x 7 x 7. (Krizhevsky, "One weird trick for parallelizing convolutional neural networks", arXiv 2014; He et al., "Deep Residual Learning for Image Recognition", CVPR 2016; Huang et al., "Densely Connected Convolutional Networks", CVPR 2017.) We can visualize filters at higher layers too, but it is not that interesting (examples taken from the ConvNetJS CIFAR-10 demo): layer 1 weights 16 x 3 x 7 x 7, layer 2 weights 20 x 16 x 7 x 7, layer 3 weights 20 x 20 x 7 x 7.

Last layer (FC7): a 4096-dimensional feature vector for an image, from the layer immediately before the classifier. Run the network on many images and collect the feature vectors. Last layer, nearest neighbors: L2 nearest neighbors of a test image in this feature space (recall: nearest neighbors in pixel space). (Krizhevsky et al., NIPS 2012; figures reproduced with permission.) Dimensionality reduction: visualize the "space" of FC7 feature vectors by reducing them from 4096 to 2 dimensions; a simple algorithm is Principal Component Analysis (PCA), a more complex one is t-SNE (Van der Maaten and Hinton, "Visualizing Data using t-SNE", JMLR 2008). See high-resolution versions at /people/karpathy/cnnembed/.

Visualizing activations: the conv5 feature map is 128 x 13 x 13; visualize it as 128 grayscale 13 x 13 images. (Yosinski et al., "Understanding Neural Networks Through Deep Visualization", ICML DL Workshop 2014; figure copyright Jason Yosinski, 2014, reproduced with permission.)

Maximally activating patches: pick a layer and a channel, e.g. conv5 is 128 x 13 x 13, pick channel 17/128; run many images through the network and record the values of the chosen channel; visualize the image patches that correspond to maximal activations. (Springenberg et al., "Striving for Simplicity: The All Convolutional Net", ICLR Workshop 2015; figure copyright Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller, 2015, reproduced with permission.)

Occlusion experiments: mask part of the image before feeding it to the CNN and draw a heatmap of the class probability at each mask location. (Zeiler and Fergus, "Visualizing and Understanding Convolutional Networks", ECCV 2014. Boat, elephant and go-karts images are CC0 public domain.)

Saliency maps — how to tell which pixels matter for classification: compute the gradient of the (unnormalized) class score with respect to the image pixels, take the absolute value, and take the max over the RGB channels. (Simonyan, Vedaldi, and Zisserman, "Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps", ICLR Workshop 2014; figures reproduced with permission.) Saliency maps also give segmentation without supervision, by running GrabCut on the saliency map (Rother et al., "Grabcut: Interactive foreground extraction using iterated graph cuts", ACM TOG 2004).

Intermediate features via (guided) backprop: pick a single intermediate neuron, e.g. one value in the 128 x 13 x 13 conv5 feature map, and compute the gradient of that neuron's value with respect to the image pixels. Images come out nicer if you only backprop positive gradients through each ReLU ("guided backprop"). (Zeiler and Fergus, ECCV 2014; Springenberg et al., ICLR Workshop 2015; figures reproduced with permission.)

Gradient ascent: (guided) backprop finds the part of an image that a neuron responds to; gradient ascent instead generates a synthetic image that maximally activates a neuron: I* = arg max_I f(I) + R(I), where f(I) is the neuron value and R(I) is a natural-image regularizer. Procedure: (1) initialize the image to zeros; then repeat: (2) forward the image to compute the current scores (for a class c, use the score before the softmax), (3) backprop to get the gradient of the neuron value with respect to the image pixels, (4) make a small update to the image. A simple regularizer penalizes the L2 norm of the generated image (Simonyan, Vedaldi, and Zisserman, ICLR Workshop 2014). A better regularizer penalizes the L2 norm of the image and also, periodically during optimization, (1) Gaussian-blurs the image, (2) clips pixels with small values to 0, and (3) clips pixels with small gradients to 0 (Yosinski et al., ICML DL Workshop 2014; figure copyright Jason Yosinski, Jeff Clune, Anh Nguyen, Thomas Fuchs, and Hod Lipson, 2014, reproduced with permission). The same approach can be used to visualize intermediate features. Adding "multi-faceted" visualization gives even nicer results, together with more careful regularization and a center bias (Nguyen et al., "Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks", ICML Visualization for Deep Learning Workshop 2016; figures copyright Anh Nguyen, Jason Yosinski, and Jeff Clune, 2016, reproduced with permission). One can also optimize in the FC6 latent space instead of pixel space (Nguyen et al., "Synthesizing the preferred inputs for neurons in neural networks via deep generator networks", NIPS 2016).

Fooling images / adversarial examples: (1) start from an arbitrary image, (2) pick an arbitrary class, (3) modify the image to maximize that class, (4) repeat until the network is fooled. What is going on? Ian Goodfellow will explain. (Boat and elephant images are CC0 public domain.)

DeepDream: rather than synthesizing an image to maximize a specific neuron, try to amplify the neuron activations at some layer in the network. Choose an image and a layer in a CNN; then repeat: (1) forward — compute activations at the chosen layer; (2) set the gradient of the chosen layer equal to its activation; (3) backward — compute the gradient on the image; (4) update the image. This is equivalent to I* = arg max_I Σ_i f_i(I)². (Mordvintsev, Olah, and Tyka, "Inceptionism: Going Deeper into Neural Networks", Google Research Blog; images licensed under CC-BY 4.0.) The code is very simple but uses a couple of tricks (the code is licensed under Apache 2.0): jitter the image, L1-normalize the gradients, and clip the pixel values; it also uses multiscale processing for a fractal effect (not shown). (Sky image licensed under CC-BY SA 3.0; other example images under CC-BY 3.0 and CC-BY 4.0.)

Feature inversion: given a CNN feature vector for an image, find a new image that matches the given feature vector and "looks natural" (image prior regularization); a total variation regularizer encourages spatial smoothness. Reconstructions can be made from different layers of VGG-16. (Mahendran and Vedaldi, "Understanding Deep Image Representations by Inverting Them", CVPR 2015; figure from Johnson, Alahi, and Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution", ECCV 2016, copyright Springer 2016, reproduced for educational purposes.)
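The gradient-ascent procedure above is concrete enough to sketch in code. The following is a minimal PyTorch sketch of the class-visualization loop with the simple L2 regularizer — not the lecture's reference code; the choice of SqueezeNet, the target class index, the step size, and the regularizer weight are illustrative assumptions, and the weights API assumes a recent torchvision.

```python
import torch
import torchvision

# Any pretrained ImageNet classifier works; SqueezeNet keeps the example small.
model = torchvision.models.squeezenet1_1(weights="DEFAULT").eval()
for p in model.parameters():
    p.requires_grad_(False)

target_class = 130     # assumed target (an arbitrary ImageNet class index)
l2_weight = 1e-3       # assumed strength of the simple L2 image regularizer
step_size = 0.5        # assumed update step

img = torch.zeros(1, 3, 224, 224, requires_grad=True)   # 1. start from a zero image

for it in range(200):
    scores = model(img)                                  # 2. forward: class scores (pre-softmax)
    objective = scores[0, target_class] - l2_weight * img.pow(2).sum()
    objective.backward()                                 # 3. gradient of the objective w.r.t. pixels
    with torch.no_grad():
        img += step_size * img.grad / (img.grad.norm() + 1e-8)   # 4. small normalized update
        img.grad.zero_()
    # A stronger regularizer (Yosinski et al.) would also periodically Gaussian-blur
    # img and clip pixels with small values or small gradients to zero.
```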
Specialized English Vocabulary for Image Processing
FFT 滤波器FFT filtersVGA 调色板和许多其他参数VGA palette and many others 按名称排序sort by name包括角度和刻度including angle and scale保持目标keep targets保存save保存和装载save and load饱和度saturation饱和加法和减法add and subtract with saturate背景淡化background flatten背景发现find background边缘和条纹测量Edge and Stripe/Measurement边缘和条纹的提取find edge and stripe编辑Edit编辑edit编辑或删除相关区域edit or delete relative region编码Code编码条Coda Bar变换forward or reverse fast Fourier transformation变量和自定义的行为variables and custom actions变量检测examine variables变形warping变形系数warping coefficients标题tile标注和影响区域label and zone of influence标准normal标准偏差standard deviation表面弯曲convex并入图像merge to image采集栏digitizer bar采集类型grab type菜单形式menu item参数Preferences参数轴和角度reference axis and angle测量measurement测量方法提取extract measurements from测量结果显示和统计display measurement results and statistics测量转换transfer to measurement插入Insert插入条件检查Insert condition checks查找最大值find extreme maximum长度length超过50 个不同特征的计算calculate over 50 different features area 撤销次数number of undo levels乘multiply尺寸size处理Processing处理/采集图像到一个新的窗口processed/grabbed image into new window 窗口window窗口监视watch window窗位window leveling创建create垂直边沿vertical edge从表格新建new from grid从工具条按钮from toolbar button从用户窗口融合merge from user form粗糙roughness错误纠正error correction错误匹配fit error打开open打开近期的文件或脚本open recent file or script打印print打印设置print setup打印预览print preview大小和日期size and date带通band pass带有调色板的8- bit带有动态预览的直方图和x, y 线曲线椭圆轮廓histogram and x, y line curve ellipse profiles with dynamic preview带阻band reject代码类型code type单步single step单一simple单帧采集snap shot导入VB等等etc.低通low pass第一帧first点point调色板预览palette viewer调试方式debug mode调用外部的DLL调整大小resize调整轮廓滤波器的平滑度和轮廓的最小域值adjust smoothness of contour filter and minimum threshold for contours定点除fixed point divide定位精度positional accuracy定义一个包含有不相关的不一致的或无特征区域的模板define model including mask for irrelevant inconsistent or featureless areas定制制定-配置菜单Customize - configure menus动态预览with dynamic preview读出或产生一个条形或矩阵码read or generate bar and matrix codes读取和查验特征字符串erify character strings断点break points对比度contrast对比度拉伸contrast stretch对称symmetry对模板应用“不关心的”像素标注apply don't care pixel mask to model 多边形polygon二进制binary二进制分离separate binary二值和灰度binary and grayscale翻转reverse返回return放大或缩小7 个级别zoom in or out 7 levels分类结果sort results分水岭Watershed分析Analysis分组视图view components浮点float腐蚀erode复合视图view composite复合输入combined with input复制duplicate复制duplicateselect all傅立叶变换Fourier transform改变热点值change hotspot values感兴趣区域ROI高级几何学Advanced geometry高通high pass格式栏formatbar更改默认的搜索参数modify default search parameters 工具Utilities工具栏toolbar工具属性tool properties工具条toolbar工作区workspace bar共享轮廓shared contours构件build构造表格construct grid关闭close和/或and/or和逆FFT画图工具drawing tools缓存buffer换算convert灰度grayscale恢复目标restore targets回放playback绘图连结connect map获得/装载标注make/load mask获取选定粒子draw selected blobs或从一个相关区域创建一个ROI or create an ROI from a relative region基线score基于校准映射的畸变校正distortion correction based on calibration mapping 极性polarity极坐标转换polar coordinate transformation几何学Geometry记录record加粗thick加法add间隔spacing间距distance兼容compatible简洁compactness剪切cut减法subtract减小缩进outdent交互式的定义字体参数包括搜索限制ine font parameters including search constraints 脚本栏script bar角度angle角度和缩放范围angle and scale range接收和确定域值acceptance and certainty thresholds结果栏result bar解开目标unlock targets精确度和时间间隔accuracy and timeout interval矩形rectangle矩形rectangular绝对差分absolute difference绝对值absolute value均匀uniform均值average拷贝copy拷贝序列copy sequence可接收的域值acceptance threshold克隆clone控制control控制controls快捷健shortcut key宽度breadth宽度width拉普拉斯Laplacians拉伸elongation蓝blue类型type粒子blob粒子标注label blobs粒子分离segment blobs粒子内的孔数目number of holes in a blob 亮度brightness亮度luminance另存为save 
as滤波器filters绿green轮廓profile overlay轮廓极性contour polarity逻辑运算logical operations面积area模板编辑edit model模板覆盖model coverage模板和目标覆盖model and target coverage 模板索引model index模板探测器Model Finder模板位置和角度model position and angle 模板中心model center模糊mask模块import VB module模块modules模式匹配Pattern matching默认案例default cases目标Targets目标分离separate objects目标评价target score欧拉数Euler number盆basins膨胀dilate匹配率match scores匹配数目number of matches平方和sum of the squares平滑smooth平均average平均averaged平均值mean平移translation前景色foreground color清除缓冲区为一个恒量clear buffer to a constant清除特定部分delete special区域增长region-growing ROI取反negate全部删除delete all缺省填充和相连粒子分离fill holes and separate touching blobs任意指定位置的中心矩和二阶矩central and ordinary moments of any order location: X, Y 锐化sharpen三维视图view 3D色度hue删除delete删除帧delete frame设置settings设置相机类型enable digitizer camera type设置要点set main示例demos事件发现数量number of occurrences事件数目number of occurrences视图View收藏collectionDICOM手动manually手绘曲线freehand输出选项output options输出选择结果export selected results输入通道input channel属性页properties page数据矩阵DataMatrix数字化设置Digitizer settings双缓存double buffer双域值two-level水平边沿horizontal edge搜索find搜索和其他应用Windows Finder and other applications 搜索角度search angle搜索结果search results搜索区域search area搜索区域search region搜索速度search speed速度speed算法arithmetic缩放scaling缩放和偏移scale and offset锁定目标lock destination锁定实时图像处理效果预览lock live preview of processing effects on images 锁定预览Lock preview锁定源lock source特定角度at specific angle特定匹配操作hit or miss梯度rank替换replace添加噪声add noise条带直径ferret diameter停止stop停止采集halt grab同步synchronize同步通道sync channel统计Statistics图像Image图像大小image size图像拷贝copy image图像属性image properties图形graph退出exit椭圆ellipse椭圆ellipses外形shape伪彩pseudo-color位置position文本查看view as text文件File文件MIL MFO font file文件load and save as MIL MMF files文件load and save models as MIL MMO files OCR文件中的函数make calls to functions in external DLL files文件转换器file converterActiveMIL Builder ActiveMIL Builder 无符号抽取部分Extract band -细化thin下一帧next显示表现字体的灰度级ayscale representations of fonts显示代码show code线line线lines相对起点relative origin像素总数sum of all pixels向前或向后移动Move to front or back向上或向下up or down校准Calibration校准calibrate新的/感兴趣区域粘贴paste into New/ROI新建new信息/ 图形层DICOM information/overlay形态morphology行为actions修改modify修改路径modify paths修改搜索参数modify default search parameters 序列采集sequence旋转rotation旋转模板rotate model选择select选择selector循环loops移动move移动shift应用过滤器和分类器apply filters and classifiers 影响区域zone of influence映射mapping用户定义user defined用基于变化上的控制实时预览分水岭转化结果阻止过分切割live preview of resulting watershed transformations with control over variation to prevent over segmentation用某个值填充fill with value优化和编辑调色板palette optimization/editor有条件的conditional域值threshold预处理模板优化搜索速度循环全部扫描preprocess model to optimize search speed circular over-scan预览previous元件数目和开始(自动或手动)number of cells and threshold auto or manual元件最小/最大尺寸cell size min/max源source允许的匹配错误率和加权fit error and weight运行run在目标中匹配数目number of modelmatches in target暂停pause增大缩进indent整数除integer divide正FFT正常连续continuous normal支持象征学supported symbologies: BC 412直方图均衡histogram equalization执行execute执行外部程序和自动完成VBA only execute external programs and perform Automation VBA only指定specify指数exponential Rayleigh中值median重复repeat重建reconstruct重建和修改字体restore and modify fonts重新操作redo重心center of gravity周长perimeter注释annotations转换Convert转换convert装载load装载和保存模板为MIL MMO装载和另存为MIL MFO装载和另存为MIL MMF状态栏status bar资源管理器拖放图像drag-and-drop images from Windows ExplorerWindows自动或手动automatic or manual自动或手动模板创建automatic or manual model creation字符产大小string size字符串string字体font最大maximum最大化maximum最大数maxima最后一帧last frame最小minimum最小化minimum最小间隔标准minimum separation 
criteria最小数minima坐标盒的范围bounding box coordinates图像数据操作Image data manipulation内存分配与释放allocation release图像复制copying设定和转换setting and conversion图像/视频的输入输出Image and video I/O支持文件或摄像头的输入file and camera based input图像/视频文件的输出image/video file output矩阵/向量数据操作及线性代数运算Matrix and vector manipulation and linear algebra routines 矩阵乘积products矩阵方程求解solvers特征值eigenvalues奇异值分解SVD支持多种动态数据结构Various dynamic data structures 链表lists队列queues数据集sets树trees图graphs基本图像处理Basic image processing去噪filtering边缘检测edge detection角点检测corner detection采样与插值sampling and interpolation色彩变换color conversion形态学处理morphological operations直方图histograms图像金字塔结构image pyramids结构分析Structural analysis连通域/分支connected components轮廓处理contour processing距离转换distance transform图像矩various moments模板匹配template matching霍夫变换Hough transform多项式逼近polygonal approximation曲线拟合line fitting椭圆拟合ellipse fitting狄劳尼三角化Delaunay triangulation摄像头定标Camera calibration寻找和跟踪定标模式finding and tracking calibration patterns 参数定标calibration,基本矩阵估计fundamental matrix estimation单应矩阵估计homography estimation立体视觉匹配stereo correspondence)运动分析Motion analysis光流optical flow动作分割motion segmentation目标跟踪tracking目标识别Object recognition特征方法eigen-methodsHMM模型HMM基本的GUI Basic GUI显示图像/视频display image/video键盘/鼠标操作keyboard and mouse handling滑动条scroll-bars图像标注Image labeling直线line曲线conic多边形polygon、文本标注text drawing梯度方向gradient directions系数coefficient空间频率spatial frequencies串级过滤cascade filtering卷积运算convolution operation有限差分近似the finite difference approximation 对数刻度logarithmic scale仿射参数affine parameters斑点Blob差距disparityAlgebraic operation 代数运算;一种图像处理运算,包括两幅图像对应像素的和、差、积、商。
surveillance
Human Detection in Surveillance Applications
Ashish Desai

Abstract
To attack the problem of detecting humans in a surveillance video, there are a few techniques that can be used. In this project, both a color-based algorithm and a motion-based algorithm were implemented, eventually leading to a combined approach. While each method had advantages and disadvantages, the combined approach led to an algorithm that worked exceptionally well with lateral movement relative to the camera, as well as with objects that were sufficiently far from the camera. While movement towards and away from the camera, especially very near to the camera, caused some problems, I believe extensions to the project can improve upon the algorithm to make it very robust.

Introduction
When trying to determine a good project to pursue regarding digital video processing, I decided to look around in my everyday life and look for a problem to solve. During this time, the parking garage I used often had petty vandalism problems, where criminals were breaking into the garage and vandalizing cars. To prevent this, our landlord had installed cameras, but this did not deter criminals, because they knew that the viewing of the tapes would happen later, when they were long gone. So, I began to wonder: what if we had a real-time count of all the people in the garage, along with their pictures?

Problem Statement
Really, this problem can be easily expanded to any surveillance application. In terms of our class, this breaks down to a foreground/background segmentation problem, which can then be coupled with segmenting the resulting foreground into separate objects. Formally, given a video sequence, I would like to determine the number of people present in every frame, as well as save a picture of them.

Methodology
To attack the segmentation problem, I decided to attempt two separate approaches (and eventually I decided to combine them). The first approach was to use a color-based segmentation algorithm that looks at each pixel in a frame and determines whether it is in the foreground or background based on its color. The second approach was to use a segmentation algorithm based on motion estimation to determine whether each block in a particular frame was in the foreground or background based on the next frame in the sequence. Each of these methods is described in detail below.

A 25-second video sequence was captured at 10 frames per second in my apartment, where each frame contained between zero and three people. The algorithms described below were used to estimate the number of people in each frame, and the results will be discussed in the subsequent section.

Color Based Segmentation
The color-based segmentation that I used was based on the Stauffer and Grimson paper given in class (see references). I do not want to restate the entire algorithm, but the basis for this method is that every pixel in a frame has a probability of being a particular color based on a mixture of multiple Gaussian probability distributions. If a particular pixel is considered a part of the background, it should remain constant (within a small variance due to noise sources) for long periods of time. However, as objects move in front of a background, a particular pixel will change color briefly and then return to the original color. Therefore, if we keep track of multiple Gaussian distributions for each pixel, and determine which of these Gaussians is in the foreground and which are in the background, we can make a good estimation for our current pixel, given the various probabilities. This is done by updating the mean and variance of each Gaussian based on the current values, determining the most probable Gaussian, and determining whether this Gaussian is in the foreground or background. For more details, please refer to the Stauffer and Grimson paper given in the reference section.

For the most part, the algorithm was used as described in the paper, using a learning constant of α = 0.7 and K = 5 Gaussians. Some notable differences between my algorithm and that of the paper are that I determined that a pixel value must be closer than 1.5 standard deviations from the mean to be considered a sample from a particular Gaussian, versus Stauffer and Grimson using 2.5 standard deviations. Also, I initialized the means and variances of the Gaussians to random values with relatively large variances, which leads to the initial frames of the sequence being considered complete background.

Without regurgitating the Stauffer and Grimson paper, each pixel is considered background if it is assigned to the Gaussian with the highest ratio of weight/variance (this effectively means it is the most recently present Gaussian with the highest probability). When simply implementing this algorithm, I found that in addition to the desired targets, many other pixels were still considered foreground. However, often they were isolated pixels, so I combined the above algorithm with a connectivity requirement that led to the elimination of any foreground pixel that was not immediately adjacent to four other classified foreground pixels.

Finally, I took advantage of the fact that I knew I was looking for fairly large objects. After the connectivity requirement, there were clusters of pixels that were very close to each other but not touching. Therefore, I connected groups that were within a 4x4 block of each other. To take advantage of the large target size, I finally eliminated any group that contained fewer than 100 pixels.

As a note, I purposely wanted to set up my color segmentation algorithm to prevent any false detection of foreground. This was done so that when I later combined this with the motion estimation (described later to have high false detection) I could weight the confidence of the color segmentation very high (this will be addressed more in the combination section). Therefore, I chose a fairly high value for the learning constant so objects will quickly be determined as background. Also, because I only had 10 fps, and I wanted to converge very quickly, the high learning rate was essential.

Motion Based Segmentation
The method for motion-based segmentation was very simplistic, both because more time was spent to improve the color-based segmentation, and because of the desire to reduce complexity for the long-term goal of a real-time algorithm. Basically, for each frame in the sequence, a simple block-matching algorithm was implemented between the current frame and the next frame. For the block matching, I used 16x16 blocks with full search over a +16/-15 search area with an SAD criterion. Then, a simple threshold criterion was chosen, in which any block that was determined to move more than 8 pixels (in any direction) was considered a foreground object.
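A minimal NumPy sketch of the block-matching step just described, using the 16x16 blocks, the +16/-15 full search, the SAD criterion, and the 8-pixel displacement threshold quoted above; this is an illustrative reconstruction rather than the author's code, and the exhaustive search is written for clarity, not speed.

```python
import numpy as np

def motion_foreground(curr, nxt, block=16, search=16, move_thresh=8):
    """Block-matching motion segmentation (illustrative sketch).

    For each `block` x `block` block of the grayscale frame `curr`, run a full
    SAD search over the +16/-15 neighbourhood in the next frame `nxt`, and mark
    the block as foreground if its best-matching displacement exceeds `move_thresh`.
    """
    h, w = curr.shape
    fg = np.zeros((h // block, w // block), dtype=bool)
    for by in range(h // block):
        for bx in range(w // block):
            y, x = by * block, bx * block
            ref = curr[y:y + block, x:x + block].astype(np.int32)
            best_sad, best_d = None, (0, 0)
            for dy in range(-search + 1, search + 1):        # +16/-15 search range
                for dx in range(-search + 1, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue
                    cand = nxt[yy:yy + block, xx:xx + block].astype(np.int32)
                    sad = np.abs(ref - cand).sum()
                    if best_sad is None or sad < best_sad:
                        best_sad, best_d = sad, (dy, dx)
            # Foreground if the block moved more than `move_thresh` pixels in any direction.
            fg[by, bx] = max(abs(best_d[0]), abs(best_d[1])) > move_thresh
    # The report then also drops any foreground region of fewer than four blocks.
    return fg
```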
While I did look at other means to do segmentation, such as K-means clustering, I was trying to reduce complexity for the reasons stated above. Additionally, I wanted the output of the motion-based segmentation to have the property that it would not eliminate any foreground blocks by classifying them as background. This was imperative for the combination approach described below. Again, to reduce the effects of noise, I also put size constraints on any foreground objects, such that any object that contained fewer than four blocks was no longer considered part of the foreground.

Combination of Color and Motion Based Segmentation
As both of the previously described methods had their benefits and disadvantages (described in the Results section), I decided to combine the two methods to provide the best detection throughout the video sequence. As described above, the parameters for the color-based segmentation were chosen such that there was a very high confidence that its output was indeed foreground pixels, but perhaps not all of them were captured. Additionally, the motion-based segmentation was developed such that all the foreground pixels were captured, but perhaps many background pixels as well. Therefore, a weighting was done with each of the outputs, where more weight was given to highly connected foreground pixels, and those in the color output were weighted heavier than those in the motion-based output. Finally, a threshold was heuristically determined to provide a nice balance between the two methods.

The block diagram of the overall system is shown below in Figure 1.

Results
Throughout the methodology section, I alluded to some of the results of the various methods, as they led to modifications of some of the algorithms. Overall, the color-based segmentation performed very well (I spent the most time adjusting the parameters of this method), while the motion-based segmentation had adequate performance.

More specifically, the color-based segmentation mainly had problems when there were large occlusions for a number of frames, and then they disappeared. This is probably because the learning constant was chosen to be fairly high, so these large occlusions caused the previous background to be lost from memory. Also, the color-based segmentation had problems with shadows being considered a foreground object. Overall, however, most of the output of the color-based method was in fact foreground (but not necessarily all of the foreground).

On the other hand, the motion-based segmentation was designed to identify all foreground objects, with the trade-off of falsely identifying background as foreground. This method successfully did this, where the falsely detected background occurred mostly from reflective surfaces (possibly detected moving objects out of camera view) and the walls near the camera (possibly from automatic gain control of the camera when foreground objects moved towards or away from the camera). Additionally, the motion estimation had some problems with aspect ratio changes, because it was based on block matching rather than affine parameters. Finally, there were a few frames that underwent global motion from a person bumping the camera. This was not handled well by the motion-based segmentation, because of the use of a simple threshold versus clustering of motion vectors.

Finally, the combination of these methods actually worked quite well. For lateral movement only, the detection worked exceptionally well, whereas large occlusions or aspect ratio changes still posed problems. Again, the combination performed better than the individual algorithms. Specifically, the problems with shadows and global motion were completely eliminated.

Figure 2. Comparison of Methods
Figure 2 shows the output of the various methods, where a black pixel value means that it was considered background. Here we see that the color-based segmentation was pretty accurate, the motion-based method identified the object as well as some of the wall, and the combined method did an excellent job. The picture with the box around the foreground object is the final output given by the algorithm to provide the goal of counting the number of people and providing a picture. Note that this example frame involved only lateral motion.

Figure 3. Number of People per Frame
Figure 3 shows the actual number of people in each frame, as well as the estimated number given by the combined algorithm. From frames 0 through 100, there was either no motion or only lateral motion. As was mentioned earlier, the algorithm works exceptionally well (mean error of 0.13 people per frame) with this type of motion. Frames 100 through 175, as well as frames 195 through 251, include very large occlusions, as well as large changes in the aspect ratio of objects. Both because of the lack of affine parameters and the automatic gain control of the camera, the algorithm contains many errors during this period of time (mean error of 0.99 people per frame).

Figure 4. Lateral Motion Frame
Figure 4 shows a frame where only lateral motion occurs, and the algorithm works very well.

Figure 5. Frame with Occlusion/Aspect Ratio Changes
Figure 5 shows a frame with changes in aspect ratio, occlusion, and possible automatic gain control changes (an object suddenly appears immediately in front of the camera). Notice the large number of false detections, as well as the missing detection of the large foreground object that just appeared in front of the camera (on the left).

Figure 6. Aspect Ratio Change Far from Camera
Figure 6 shows that the change in aspect ratio was detected properly as long as the object remained further away from the camera. Again, this may be due to the fact that the automatic gain control of the camera is not initiated, and the color-based segmentation is weighted more heavily than the motion-based segmentation.

Figure 7. Connected Foreground Objects
Figure 7 shows an instance where the algorithm tries to identify two connected people (on the left), but cannot accurately separate them. This is because only connectivity requirements and thresholds were used. A possible improvement upon this method would be to use the motion vectors to segment two foreground objects that may be connected, based on other algorithms.

Conclusion
Overall, I believe that this project was very successful in achieving the goal of identifying the number of people in a frame as well as providing a picture of them. While there is always room for improvement, the algorithm works exceptionally well for lateral movement relative to the camera, and for objects that are not close enough to the camera to cause large occlusions of the background.

To improve upon this algorithm, many other methods could be considered. For example, the use of affine parameters for motion-based segmentation would improve the problems with large aspect ratio changes. The use of a log-based search for block matching could provide an opportunity to make the algorithm truly real-time (I expect that this could have been achieved for the color-based segmentation). If K-means clustering were used for motion-based segmentation, this would better handle global motion, as well as perhaps segment multiple people that are connected in the output of the color algorithm. Finally, the implementation of a temporal constraint that would utilize the knowledge of detected humans from previous and/or future frames could definitely improve upon this algorithm.

References
1. Chris Stauffer and W.E.L. Grimson. "Adaptive background mixture models for real-time tracking", CVPR99, Fort Collins, CO, (June 1999).
An Improvement of the Codeword Structure in the Codebook Model
对码本模型中码字结构的改进李文辉;李慧春;王莹;姜园媛;孙明玉【摘要】针对码本结构,提出一种简化算法.该算法通过将码字元组中判断该码字是否冗余的元素——最大未使用时间改为由元组的其他变量直接计算而不存储在码字中,去除了该变量所占用的空间,将6元组替换为5元组.实验结果表明,该改进不会对运动目标检测增加额外计算,准确性和实时性不受影响,并可减少码本模型占用的内存.%The codeword space was reduced according to calculating the longest interval so that codeword is never recurred by other variable in tuple, and the interval is not stored in codeword, thus 6-tuple based codeword is replaced by 5-tuple. The experimental result shows that the new codebook model is as fast and accurate as the original model. Moreover, the memory space demanded is reduced.【期刊名称】《吉林大学学报(理学版)》【年(卷),期】2012(050)003【总页数】6页(P517-522)【关键词】运动目标检测;码本模型;码字结构;5元组码字【作者】李文辉;李慧春;王莹;姜园媛;孙明玉【作者单位】吉林大学计算机科学与技术学院,长春130012;吉林大学计算机科学与技术学院,长春130012;吉林大学计算机科学与技术学院,长春130012;吉林大学计算机科学与技术学院,长春130012;吉林大学计算机科学与技术学院,长春130012【正文语种】中文【中图分类】TP391.4视频图像的运动目标检测是智能视频监控系统中最基本、最重要的技术. 提取运动目标较普遍的方法是背景相减法. 该方法的原理是将当前帧与背景模型做比较, 如果同位置的像素特征、像素区域特征或其他特征存在一定程度的相似性, 则当前帧这些位置的像素点或区域是背景, 其他区域构成前景运动目标区域[1].码本算法是Chalidabhongse和Kim等[2-3]提出的建立背景模型的方法. 码本的思想是:根据每个像素点连续采样值的颜色距离和亮度范围将背景像素值量化后用码本表示, 然后利用背景相减法的思想把新输入像素值与该点对应的码本做比较判断, 从而提取出前景运动目标.由于码本方法具有对复杂环境适应性强, 实时性好的优点, 因此在智能视频监控中作为运动目标检测方法得到广泛应用. 进一步, Kim等[4]又在码本算法中加入了两个重要改进----层次建模和自适应码本的更新, 增强了码本模型适应光线缓慢变化、场景物体运动等动态变化环境的能力. 在改善检测性能方面, 引入Markov随机场的码本模型在动态背景中能更有效地提取前景[5]. 把码本方法和HSV阴影去除方法相结合的“锥体-柱体混合”码本模型, 能消除阴影和强光对前景提取的影响[6]. 文献[7]提出的块均值码本模型(BMCB)和文献[8]提出的块和像素级连的码本模型都考虑了像素与其邻近像素的关系, 在复杂环境中可获得更准确的运动目标. 在提高码本算法的实时性方面, 文献[9]根据经验值设置每个码字长度的上限, 可减小码本算法对内存的需求; 文献[10]提出基于“盒子”的码本模型, 比Kim等[3]的码本算法计算量更少, 实时性更好.目前, 多数对码本算法的改进都关注于改善码本模型的检测效果和提高算法实时性两方面, 对于码字结构的改进却很少关注. 本文在不改变Kim等所提出约束条件的前提下, 对码字结构进行改进, 去除了码字中表示最大未使用时间的元素. 对码字结构的简化可减少码本模型的内存开销, 且不影响运动目标检测的准确性与实时性.1 码本背景模型描述1.1 构建像素码本假设训练阶段单个像素的采样值序列为X={x1,x2,…,xN}, X中的每个元素都是RGB向量, 训练帧数为N. 设C={c1,c2,…,cL}为该像素的码本, 码本中含有L个码字. 每个像素码本中的码字个数由采样值的变化情况决定. Kim等[3]提出的码字ci(i=1,2,…,L)包括两部分: RGB向量和6元组其中:和分别表示码字中的最小和最大亮度值; fi表示码字出现的频率;λi表示该码字没有出现的最大时间间隔;pi和qi分别表示码字第一次出现和最后一次出现的时间.训练阶段每个采样值xt(1≤t≤N)都和已有的码字进行比较. 找到(如果存在)最匹配的码字cm, 并对该码字进行更新;如果找不到匹配码字, 则为其创建一个新的码字存入码本中. 码本提取过程如下.算法1 构建像素码本.1) C ← Ø, L ← 0;在集合C={ci,1≤i≤L}中根据以下条件找到与xt匹配的码字cm:为采样阈值;如果C=Ø或无匹配, L ← L+1, 产生一个新的码字cL:vL←(R,G,B), auxL←〈I,I,1,t-1,t,t〉;(1)否则更新匹配的码字cm:end for;3) 消除冗余的码字. 对于ci(i=1,2,…,L):temp λi←max{λi,N-qi+pi-1};(4)初始码本为:M←{ckck∈C∧temp λk<Tλ}, k为码字的索引 //阈值Tλ常取训练帧数的一半, 即Tλ=N/2.1.2 颜色和亮度计算颜色距离和亮度范围的公式如下:其中α(α<1)和β(β>1)是限定亮度变化范围的因子, 通常取0.4≤α≤0.7, 1.1≤β≤1.5.1.3 用码本检测运动目标码本背景模型建立后, 可直接使用背景相减法获得运动目标. 利用码本方法检测x是否属于运动目标的算法过程BGS(x)如下.算法2 运动目标提取.2) 在M中根据以下条件寻找与x匹配的码字:colordist(x,vi)≤ε2,算法2中, ε2是检测阈值, 通常ε2>ε1.1.4 码本模型的更新初始训练后, 场景可能会发生变化. 如在街道上, 交通工具会进入或离开停车场. 此外, 光照变化也会导致背景的变化. 为了码本模型的更新, Kim等[3]引入了缓存码本, 缓存码本中的码字和背景码本中的码字结构相同. 码本的动态更新过程如下.算法3 码本模型更新.1) 训练结束后, 获得背景码本M, 建立缓存码本M′;2) 对于新像素, 在M中寻找匹配码字, 如果找到, 更新该码字;3) 如果没有找到, 在M′中寻找匹配码字并更新. 如果M′中没有匹配, 则建立新码字h, 并插入到M′中;4) 根据TM′精简M′, 即M′←M′-{hk′hk′∈M′, λk′>TM′};(7)5) 将在M′中停留足够时间的码字移到M中, 即M←M+{hk′hk′∈M′, fk′>Tadd};(8)6) 从M中删除超过一定时间未被匹配的码字, 即M←M-{ckck∈M, λk>TM}.(9)2 对码字结构的改进2.1 理论分析元素λi的作用是在训练结束和码本更新时作为删除冗余码字的依据. 训练过程中, λi的更新公式如下:λi=max{λi,t-qi}.(10)令λ′=t-qi,(11)则λ′表示码字再出现时未使用的时间, 由式(10)可见, λi是训练过程中最大的λ′. 精简码本时, 如果码字最后的λi≥Tλ, 则为冗余码字, 需要删除. 事实上, 并不需要找到λ′的最大值. 如果码字在t时刻, 已有λ′≥Tλ, 即可认为该码字为冗余的.同理, 在码本模型的更新中, 也不需要根据码字的最大未使用时间删除冗余码字. 如果背景码本M中码字的未使用时间超过TM, 或缓存码本M′中码字的未使用时间超过TM′, 则认为该码字可被删除.2.2 算法实现在去除表示码字最大未使用时间所占用的空间后, 还可以进一步减少训练过程所用时间:背景码本中的码字一定是在前Tλ帧中第一次出现的, 在后Tλ帧中才出现的码字一定不会是背景码本中的码字. 这是因为新码字建立时, 按式(1), λ=t-1, t为当前的时间, 即码字最大未使用时间λ的初值为码字第一次出现的时间减1, 在以后的训练过程中, λ的值不会小于该初值. 
如果λ≥Tλ, 则训练结束后, 该码字也会被当作冗余码字去除.因此, 设码字结构中auxi为五元组:算法步骤如下.算法4 改进后的算法过程.1) for t=1 to Tλdo寻找和xt匹配的码字, 如果存在更新该码字;如果不存在建立新的码字;end for;2) for t=Tλ+1 to N do寻找与xt匹配的码字cm, 如果t-qm≥Tλ, 删除该码字;否则更新该码字;不为新出现的像素建立码字;end for;3) 训练结束后, 精简码本M←{ckck∈C∧(N-qk+pk-1)<Tλ},(12)k为码字的索引;4) for t>N to end do检测运动目标, 更新匹配的码字;更新码本:M′←M′{hk′hk′∈M′, t-qk′>TM′},M←M+{hk′hk′∈M′, fk′>Tadd},M←M-{ckck∈M, t-qk>TM}.算法4中k和k′为码字的索引. 为了提高码本算法的效率, 步骤4)中更新码本时可以隔一定帧数进行一次码本的更新, 如10帧, 即不必每帧都更新码本.3 实验结果与分析为了验证应用本文方法所建的模型占用内存空间少、并能有效地检测运动目标、实时性较Kim等[3]提出的方法好, 本文在微软公司及IBM公司提供的测试视频库上进行了测试, 所用机器配置为:双核CPU, 频率2.8 GHz, 1 G内存, 环境为VC++. 实验分为三部分:检测精度、处理时间及存储空间的对比. 实验中使用的相关数据如下:α=0.6, β=1.3, ε1=20, ε2=23.图1 运动目标检测实验效果Fig.1 Experimental results of motion detection 3.1 检测精度的对比图1为从两个视频中捕获的帧图像检测实验结果, 分别为人物视频和车辆视频.由图1可见, 本文方法和Kim等[3]提出的码本算法检测结果基本一致. 为了定量比较本文算法和码本算法的性能差异, 分别计算了图1中两帧图像的错误前景点率(FP rate)、正确前景点率(TP rate)和精度(Precision)[11-13], 各项指标计算方法如下:FP rate=, TP rate =, Precision=,(13)其中: fp表示错误前景点数; tp表示正确前景点数; fn表示错误背景点数; tn表示正确背景点数; (fp+tn)表示真实前景图像中的背景点总数; (tp+fn)表示真实前景图像中的前景点总数. 计算结果列于表1.表1 性能参数对比Table 1 Performance parameters comparison视频FP rate 码本算法本文算法TP rate码本算法本文算法Precision码本算法本文算法人物视频0.079 20.072 70.990 80.990 60.846 00.856 7车辆视频0.006 20.003 70.849 10.849 10.566 80.689 3由表1可见, 本文方法和Kim等[3]码本算法的检测结果存在一定的差异, 这是因为在码本算法中, 新像素与码本中各个码字进行匹配时, 只需找到第一个满足条件的码字即可, 并不需要遍历整个码本链表后找到最佳匹配的码字, 而各个码字之间存在交集是可能的. 排在前面的码字被匹配的机会大, 精简码本时, 留在码本背景模型中的机会也大;排在后面的码字被匹配的机会小, 所以更容易被当成冗余码字从码本中删除. 此外, 对匹配上的码字更新过程也会使码字表示的范围发生改变. 因为本文方法不为训练后半阶段出现的新像素建立码字, 并及时删除冗余码字, 所以“准冗余码字”在训练阶段不会参与匹配, 给码本中其他码字更多匹配和更新的机会.3.2 处理时间的对比针对样本视频分别计算应用本文方法和码本方法平均每帧的处理时间, 结果列于表2.表2 处理时间的对比(ms)Table 2 Processing time comparison(ms)视频训练阶段码本算法本文算法检测阶段码本算法本文算法人物视频22.752 921.521 525.883 225.189 9车辆视频18.578 317.695 921.585 319.622 0由表2可见, 本文方法的处理时间较少.3.3 存储空间的对比因为本文对码字结构改进的目的是减少码本模型所占用的内存空间, 所以分别测试了本文算法和码本算法应用在所选视频上时, 模型所占用内存的情况, 结果列于表3. 表3 内存的对比(Kb)Table 3 Memory comparison(Kb)视频码本算法本文算法人物视频4 6424 180车辆视频3 1242 812由表3可见, 改进后码本模型所占用的内存空间约减少了1/9. 实验中按浮点型占用4个字节, 整型占用2个字节计算, 导致内存使用量改变的原因是:码本算法每个码字包括5个浮点型数据和4个整型数据(f,λ,p,q), 平均每个像素处的码本包括4个码字, 所以模型所占用的空间是112个字节[4]. 本文算法的码字结构相比于Kim等[3]提出的码本算法节省了一个整型数据的空间, 每个码字所占用的空间是104个字节.综上所述, 本文改进了码本结构, 提出了一种减小码本模型所需要内存开销的方法. 该方法具有广泛的实用性, 可作为有关码本模型各种算法的补充, 在不影响其背景建模结果的前提下, 减少了内存需求.参考文献【相关文献】[1] ZHANG Jun, DAI Ke-xue, LI Guo-hui. HSV Color-Space and Codebook Model Based Moving Objects Detection [J]. Systems Engineering and Electronics, 2008, 30(3): 423-427. (张军, 代科学, 李国辉. 基于HSV颜色空间和码本模型的运动目标检测 [J]. 系统工程与电子技术, 2008, 30(3): 423-427.)[2] Chalidabhongse T H, Kim K, Harwood D, et al. A Perturbation Method for Evaluating Background Subtraction Algorithms [C]//Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance. Nice, France: [s.n.], 2003, 10: 11-12.[3] Kim K, Chalidabhongse T H, Harwood D, et al. Background Modeling and Subtraction by Codebook Construction [C]//2004 International Conference on Image Processing. New York: IEEE Press, 2004: 3061-3064.[4] Kim K, Chalidabhongse T H, Harwood D, et al. Real-Time Foreground-Background Segmentation Using Codebook Model [J]. Real-Time Imaging, 2005, 11(3): 172-185. [5] WU Ming-jun, PENG Xian-rong. Spatio-Temporal Context for Codebook-Based Dynamic Background Subtraction [J]. AEU-International Journal of Electronics and Communications, 2010, 64(8): 739-747.[6] Doshi A, Trivedi M. “Hybrid Cone-Cylinder” Codebook Model for Foreground Detection with Shadow and Highlight Suppression [C]//Proc IEEE International Conference on Video and Signal Based Surveillance. 
Washington DC: IEEE Computer Society, 2006: 19.[7] LI Qi, SHAO Chun-fu, YUE Hao, et al. Real-Time Foreground-Background Segmentation Based on Improved Codebook Model [C]//2010 3rd International Congress on Image and Signal Processing. Yantai: IEEE Xplore, 2010: 269-273.[8] GUO Jing-ming, HSO Chih-sheng. Cascaded Background Subtraction Using Block-Based and Pixel-Based Codebooks [C]//2010 International Conference on Pattern Recognition. Washington DC: IEEE Computer Society, 2010: 1373-1376.[9] ZHANG Zhao-hui, CHEN Rui-qing, LU Han-qing, et al. Moving Foreground Detection Based on Modified Codebook [C]//2009 2nd International Congress on Image and Signal Processing. Washington DC: IEEE Computer Society, 2009: 1-5.[10] TU Qiu, XU Yi-ping, ZHOU Man-li. Box-Based Codebook Model for Real-Time Objects Detection [C]//7th World Congress on Intelligent Control and Automation. Washington DC: IEEE Computer Society, 2008: 7621-7625.[11] Maddalena L, Petrosino A. A Self-organizing Approach to Background Subtraction for Visual Surveillance Applications [J]. IEEE Transaction on Image Processing, 2008, 17(7): 1168-1177.[12] LIU Yang-yang, SHEN Xuan-jing, WANG Yi-qi, et al. Design and Implementation of Embedded Intelligent Monitor System Based on ARM [J]. Journal of Jilin University: Information Science Edition, 2011, 29(2): 158-163. (刘阳阳, 申铉京, 王一棋, 等. 基于ARM的智能监控系统的设计与实现 [J]. 吉林大学学报: 信息科学版, 2011, 29(2): 158-163.)[13] DING Ying, LI Wen-hui, FAN Jing-tao, et al. Fuzzy Integral Feature Based Algorithm for Moving Infrared Object Detection [J]. Journal of Jilin University: Engineering and Technology Edition, 2010, 40(5): 1330-1335. (丁莹, 李文辉, 范静涛, 等. 基于模糊积分特征的红外图像运动目标检测算法 [J]. 吉林大学学报: 工学版, 2010, 40(5): 1330-1335.)。
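For readers of the English abstract above, the following is a minimal Python sketch of the codeword matching and update implied by the improved 5-tuple structure (color distortion plus brightness bounds, with the "longest unused interval" computed as t - q rather than stored). It is an illustration based on the paper's formulas, not the authors' code; the alpha, beta and epsilon values are the ones reported in the experiments, and the variable names are assumptions.

```python
import numpy as np

ALPHA, BETA = 0.6, 1.3   # brightness bounds used in the paper's experiments
EPS1 = 20                # training-stage color-distortion threshold epsilon_1

def colordist(x, v):
    """Color distortion between pixel x and codeword color v (RGB vectors)."""
    x, v = np.asarray(x, float), np.asarray(v, float)
    p2 = (x @ v) ** 2 / max(v @ v, 1e-12)       # squared projection of x onto v
    return np.sqrt(max(x @ x - p2, 0.0))

def brightness_ok(I, I_min, I_max):
    """Brightness test: alpha*I_max <= I <= min(beta*I_max, I_min/alpha)."""
    return ALPHA * I_max <= I <= min(BETA * I_max, I_min / ALPHA)

def match_and_update(codebook, x, t):
    """Match pixel x at frame t against a list of 5-tuple codewords and update.

    Each codeword is [v, I_min, I_max, f, p, q]; the improved structure stores
    no 'longest unused interval' lambda -- staleness is checked directly as t - q.
    """
    I = float(np.linalg.norm(x))
    for cw in codebook:
        v, I_min, I_max, f, p, q = cw
        if colordist(x, v) <= EPS1 and brightness_ok(I, I_min, I_max):
            cw[0] = (f * np.asarray(v, float) + np.asarray(x, float)) / (f + 1)  # running mean color
            cw[1], cw[2] = min(I_min, I), max(I_max, I)
            cw[3], cw[5] = f + 1, t
            return cw
    cw = [np.asarray(x, float), I, I, 1, t, t]   # new codeword: f = 1, p = q = t
    codebook.append(cw)
    return cw
```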
License Plate Recognition: Foreign Literature Translation
Chinese-English Translation

A Configurable Method for Multi-Style License Plate Recognition

Automatic license plate recognition (LPR) has been a practical technique in the past decades. Numerous applications, such as automatic toll collection, criminal pursuit and traffic law enforcement, have benefited from it. Although some novel techniques, for example RFID (radio frequency identification), WSN (wireless sensor network), etc., have been proposed for car ID identification, LPR on image data is still an indispensable technique in current intelligent transportation systems for its convenience and low cost. LPR is generally divided into three steps: license plate detection, character segmentation and character recognition. The detection step roughly classifies LP and non-LP regions, the segmentation step separates the symbols/characters from each other in one LP so that only an accurate outline of each image block of characters is left for the recognition, and the recognition step finally converts the grey-level image blocks into characters/symbols by predefined recognition models. Although the LPR technique has a long research history, it is still driven forward by various arising demands, the most frequent of which is the variation of LP styles, for example:
(1) appearance variation caused by the change of image capturing conditions;
(2) style variation from one nation to another;
(3) style variation when the government releases a new LP format.
We summed them up into four factors, namely rotation angle, line number, character type and format, after comprehensive analyses of multi-style LP characteristics on real data. Generally speaking, any change of the above four factors can result in a change of LP style or appearance and then affect the detection, segmentation or recognition algorithms. If an LP has a large rotation angle, the segmentation and recognition algorithms for horizontal LPs may not work. If there is more than one character line in an LP, an additional line-separation algorithm is needed before the segmentation process. With the variation of character types when we apply the method from one nation to another, the ability to re-define the recognition models is needed. What is more, the change of LP styles requires the method to adjust by itself so that the segmented and recognized character candidates can match best with an LP format.

Several methods have been proposed for multi-national LPs or multi-format LPs in the past years, while few of them comprehensively address the style adaptation problem in terms of the abovementioned factors. Some of them only claim the ability of processing multinational LPs by redefining the detection and segmentation rules or recognition models.

In this paper, we propose a configurable LPR method which is adaptable from one style to another, particularly from one nation to another, by defining the four factors as parameters. Users can constrain the scope of a parameter and at the same time the method will adjust itself so that the recognition can be faster and more accurate. Similar to existing LPR techniques, we also provide details of the detection, segmentation and recognition algorithms. The difference is that we emphasize the configurable framework for LPR and the extensibility of the proposed method for multi-style LPs instead of the performance of each algorithm.

In the past decades, many methods have been proposed for LPR that contain detection, segmentation and recognition algorithms.
In the following paragraphs, these algorithms and LPR methods based on them are briefly reviewed.

LP detection algorithms can be mainly classified into three classes according to the features used, namely edge-based algorithms, color-based algorithms and texture-based algorithms. The most commonly used method for LP detection is certainly the combination of edge detection and mathematical morphology. In these methods, the gradient (edges) is first extracted from the image and then a spatial analysis by morphology is applied to connect the edges into LP regions. Another way is counting edges on the image rows to find regions of dense edges, or describing the dense edges in LP regions by a Hough transformation. Edge analysis is the most straightforward method, with low computation complexity and good extensibility. Compared with edge-based algorithms, color-based algorithms depend more on the application conditions. Since LPs in a nation often have several predefined colors, researchers have defined color models to segment regions of interest as the LP regions. This kind of method can be affected a lot by lighting conditions. To win both high recall and low false positive rates, texture classification has been used for LP detection. In Ref., Kim et al. used an SVM to train texture classifiers to detect image blocks that contain LP pixels. In Ref., the authors used Gabor filters to extract texture features at multiple scales and orientations to describe the texture properties of LP regions. In Ref., Zhang used X and Y derivative features, grey-value variance and an Adaboost classifier to classify LP and non-LP regions in an image. In Refs., wavelet feature analysis is applied to identify LP regions. Despite the good performance of these methods, the computation complexity will limit their usability. In addition, texture-based algorithms may be affected by multi-lingual factors.

Multi-line LP segmentation algorithms can also be classified into three classes, namely algorithms based on projection, binarization and global optimization. In the projection algorithms, a gradient or color projection in the vertical orientation is calculated first. The "valleys" on the projection result are regarded as the spaces between characters and used to segment characters from each other. Segmented regions are further processed by vertical projection to obtain precise bounding boxes of the LP characters. Since simple segmentation methods are easily affected by the rotation of the LP, segmenting skewed LPs becomes a key issue to be solved. In the binarization algorithms, global or local methods are often used to obtain foreground from background, and then a region connection operation is used to obtain character regions. In the most recent work, local threshold determination and sliding-window techniques are developed to improve the segmentation performance. In the global optimization algorithms, the goal is not to obtain a good segmentation result for independent characters but to obtain a compromise between character spatial arrangement and single-character recognition results. A hidden Markov chain has been used to formulate the dynamic segmentation of characters in an LP. The advantage of the algorithm is that the global optimization will improve the robustness to noise; the disadvantage is that a precise format definition is necessary before the segmentation process.

Character and symbol recognition algorithms in LPR can be categorized into learning-based ones and template-matching ones.
For the former, the artificial neural network (ANN) is the most commonly used method, since it has proved able to obtain very good recognition results given a large training set. An important factor in training an ANN recognition model for LPs is to build a reasonable network structure with good features. SVM-based methods are also adopted in LPR to obtain good recognition performance with even few training samples. Recently, cascade classifier methods have also been used for LP recognition. Template matching is another widely used algorithm. Generally, researchers need to build template images by hand for the LP characters and symbols. They can assign larger weights to the important points, for example the corner points, in the template to emphasize the different characteristics of the characters. Invariance of feature points is also considered in the template matching method to improve the robustness. The disadvantage is that it is difficult for users who have no professional knowledge of pattern recognition to define a new template, which will restrict the application of the algorithm.

Based on the abovementioned algorithms, lots of LPR methods have been developed. However, these methods are mainly developed for a specific nation or special LP formats. In Ref., the authors focus on recognizing Greek LPs by proposing new segmentation and recognition algorithms. The characters on the LPs are alphanumerics with several fixed formats. In Ref., Zhang et al. developed a learning-based method for LP detection and character recognition. Their method is mainly for LPs of Korean styles. In Ref., optical character recognition (OCR) techniques are integrated into LPR to develop a general LPR method, while the performance of OCR may drop when facing LPs of poor image quality, since it is difficult to discriminate real characters from candidates without format supervision. This method can only select the candidates with the best recognition results as LP characters, without a recovery process. Wang et al. developed a method to recognize LPs with various viewing angles. The skew factor is considered in their method. In Ref., the authors proposed an automatic LPR method which can treat the cases of changes of illumination, vehicle speed, routes and backgrounds, which was realized by developing new detection and segmentation algorithms with robustness to the
Lots of existing LPR methods can work very well in a special application condition while the performance will drop sharply when they are extended from one condition to another, or from several styles to others.多类型车牌识别配置的方法自动车牌识别(LPR)在过去的几十年中的实用技术。
Chinese-English Glossary of Natural Language Processing and Computational Linguistics Terms, Part 2 (2022)
自然语言处理及计算语言学相关术语中英对译表二自然语言处理及计算语言学相关术语中英对译表二 delimiter 定界符号 [定界符]denotation 外延denotic logic 符号逻辑dependency 依存关系dependency gram r 依存关系语法dependency relation 依存关系depth-first search 深度优先搜寻derivation 派生derivational bound morpheme 派生性附着语素descriptive gram r 描述型语法 [描写语法]descriptive linguistics 描述语言学 [描写语言学] desiderative 意愿的determiner 限定词deterministic algorithm 决定型算法 [确定性算法] deterministic finite state auto ton 决定型有限状态机deterministic parser 决定型语法剖析器 [确定性句法剖析程序] developmental psychology 开展心理学diachronic linguistics 历时语言学diacritic 附加符号dialectology 方言学dictionary database 辞典数据库 [词点数据库]dictionary entry 辞典条目digital pro ssing 数字处理 [数值处理] diglossia 双言digraph 二合字母diminutive 指小词diphone 双连音directed acyclic graph 有向非循环图disambiguation 消除歧义 [歧义消除] discourse 篇章discourse ysis 篇章分析 [言谈分析] discourse planning 篇章规划discourse representation theory 篇章表征理论 [言谈表示理论] discourse strategy 言谈策略discourse structure 言谈结构discrete 离散的disjunction 选言dissimilation 异化distributed 分布式的distributed cooperative reasoning 分布协调型推理distributed text parsing 分布式文本剖析disyllabic 双音节的ditransitive verb 双宾动词 [双宾语动词;双及物动词]divergen 扩散[分化]d-m (determiner-measure) construction 定量结构d-n (determiner-noun) construction 定名结构document retrieval system 文件检索系统 [文献检索系统] do in dependency 领域依存性 [领域依存关系]double insertion 交互中插double-base 双基downgrading 降级dummy 虚位duration 音长{ 学}/时段{语法学/语意学}dynamic programming 动态规划earley algorithm earley 算法echo 回声句egressive 呼气音ejective 紧喉音electronic dictionary 电子词典elementary string 根本字符串 [根本单词串] ellipsis 省略em algorithm em算法embedding 崁入emic 功能关系的empirici 经验论empty category principle 虚范畴原那么 [空范畴原理] empty word 虚词enclitics 后接成份end user 终端用户 [最终用户]endo ntric 同心的endophora 语境照应entailment 蕴涵entity 实体entropy 熵entry 条目episodic memory 情节性记忆epistemological work 认识论网络ergative verb 作格动词ergativity 作格性esperando 世界语etic 无功能关系etymology 词源学eventevent driven control 驱动型控制example-based chine translation 以例句为本的机器翻译excla tion 感慨exclusive disjunction 排它性逻辑“或”experien r case 经验者格expert system 专家系统extension 外延external argument 域外论元extraposition 移外变形 [外置转换]facility value 易度值feature 特征feature bundle 特征束feature co-ourren restriction 特征同现限制 [特性同现限制] feature instantiation 特征表达feature structure 特征结构 [特性结构]feature unification 特征连并 [特性合一]feedback 回馈felicity condition 妥适条件file structure 档案结构finite auto ton 有限状态机 [有限自动机]finite state 有限状态finite state morphology 有限状态构词法 [有限状态词法] finite-state auto ta 有限状态自动机finite-state language 有限状态语言finite-state chine 有限状态机finite-state transdu r 有限状态置换器flap 闪音flat 降音foreground infor tion 前景讯息 [前景信息]for l language theory 形式语言理论for l linguistics 形式语言学for l se ntics 形式语意学forward inferen 前向推理 [向前推理]forward-backward algorithm 前前后后算法frame 框架frame based knowledge representation 框架型知识表示frame theory 框架理论free morpheme 自由语素fregean principle fregean 原那么fricative 擦音f-structure 功能结构full text searching 全文检索function word 功能词functional gram r 功能语法functional programming 函数型程序设计 [函数型程序设计] functional senten perspective 功能句子观functional structure 功能结构functional unification 功能连并 [功能合一]functor 功能符fundamental frequency 基频garden path senten 花园路径句gb (gover ent and binding) 管辖约束geminate 重叠音gender 性generalized phrase structure gram r 概化词组结构语法 [广义短语结构语法]generative gram r 衍生语法generative linguistics 衍生语言学 [生成语言学]generic 泛指geic epistemology 发生认识论geive rker 属格标记genitive 属格gerund 动名词gover ent and binding theory 管辖约束理论gpsg (generalized phrase structure gram r) 概化词组结构语法[广义短语结构语法]gradability 可分级性gram r checker 文法检查器gram tical affix 语法词缀gram tical category 语法范畴gram tical function 语能gram tical inferen 文法推论gram tical relation 语法关系grapheme 字素haplology 类音删略head 中心语head driven phrase 
structure 中心语驱动词组结构 [中心词驱动词组结构]head feature convention 中心语特征继承原理 [中心词特性继承原理]head-driven phrase structure gram r 中心语驱动词组结构律heteronym 同形heuristic parsing 经验式句法剖析heuristics 经验知识hidden rkov model 隐式马可夫模型hierarchical structure 阶层结构 [层次结构]holophrase 单词句homograph 同形异义词homonym 同音异义词homophone 同音词homophony 同音异义homorganic 同部位音的horn clause horn 子句hpsg (head-driven phrase structure gram r) 中心语驱动词组结构语法hu n- chine inte 人机界面hypernym 上位词hypertext 超文件 [超文本]hyponym 下位词hypotactic 主从结构的ic (immediate constituent) 直接成份icg (infor tion-based case gram r) 讯息为本的格位语法idiom 成语 [熟语]idiosyncrasy 特异性illocutionary 施为性immediate constituent 直接成份imperative 祈使句implicative predicate 蕴含谓词implicature 含意indexical 标引的indirect object 间接宾语indirect speech act 间接言谈行动 [间接言语行为] indo-european language 印欧语言inductional inferen 归纳推理inferen chine 推理机器infinitive 不定词 [to 不定式]infix 中缀inflection/inflexion 屈折变化inflectional affix 屈折词缀infor tion extraction 信息撷取infor tion pro ssing 信息处理 [信息处理]infor tion retrieval 信息检索infor tion scien 信息科学 [信息科学; 情报科学] infor tion theory 信息论 [信息论]inherent feature 固有特征inherit 继承inheritan 继承inheritan hierarchy 继承阶层 [继承层次]inheritan of attribute 属性继承innateness position 语法天生假说insertion 中插inside-outside algorithm 里里外外算法instantiation 表达instrumental (case) 工具格integrated parser 集成句法剖析程序integrated theory of discourse ysis 篇章分析综合理论 [言谈分析综合理论]in igen intensive production 知识密集型生产intensifier 加强成分intensional logic 内含逻辑intensional se ntics 内涵语意学intensional type 内含类型interjection/excla tion 感慨词inter-level 中间成分interlingua 中介语言interlingual 中介语(的)interlocutor 对话者internalise 内化international phoic association (ipa) 国际学会inter 网际网络interpretive se ntics 诠释性语意学intonation 语调intonation unit (iu) 语调单位ipa (international phoic association) 国际学会ir (infor tion retrieval) 信息检索is-a relation is-a 关系isomorphi 同形现象iu (intonation unit) 语调单位junction 连接keyword in context 上下文中关键词[上下文内关键词] kinesics 体势学knowledge acquisition 知识习得knowledge base 知识库knowledge based chine translation 知识为本之机器翻译knowledge extraction 知识撷取 [知识题取]knowledge representation 知识表示kwic (keyword in context) 关键词前后文 [上下文内关键词] label 卷标labial 唇音labio-dental 唇齿音labio-velar 软颚唇音lad (language acquisition devi ) 语言习得装置lag 发声延迟language acquisition 语言习得language acquisition devi 语言习得装置language engineering 语言工程language generation 语言生成language intuition 语感language model 语言模型language technology 语言科技left-corner parsing 左角落剖析 [左角句法剖析] lem 词元lenis 弱辅音letter-to-phone 字转音lexeme 词汇单位lexical ambiguity 词汇歧义lexical category 词类lexical con ptual structure 词汇概念结构lexical entry 词项lexical entry selection standard 选词标准lexical integrity 词语完整性lexical se ntics 词汇语意学lexical-functional gram r 词汇功能语法lexicography 词典学lexicology 词汇学lexicon 词汇库 [词典;词库]lexis 词汇层lf (logical form) 逻辑形式lfg (lexical-functional gram r) 词汇功能语法liaison 连音linear bounded auto ton 线性有限自主机linear pre den 线性次序lingua franca 共通语linguistic decoding 语言译码linguistic unit 语言单位linked list 串行loan 外来语local 局部的locali 方位主义localizer 方位词locus model 轨迹模型locution 惯用语logic 逻辑logic array work 逻辑数组网络logic programming 逻辑程序设计 [逻辑程序设计] logical form 逻辑形式logical operator 逻辑算子 [逻辑算符]logic-based gram r 逻辑为本语法 [基于逻辑的语法] long term memory 记忆longest tch principle 最长匹配原那么 [最长一致法] lr (left-right) parsing lr 剖析chine dictionary 机器词典chine language 机器语言chine learning 机器学习chine translation 机器翻译chine-readable dictionary (mrd) 机读辞典crolinguistics 宏观语言学rkov chart 马可夫图the tical linguistics 数理语言学ximum entropy 最大熵m-d (modifier-head) construction 偏正结构mean length of utteran (mlu) 语句平均长度measure of infor tion 讯习测度 [信息测度] memory based 根据记忆的mental lexicon 心理词汇库mental model 心理模型mental pro ss 心理过程 [智力过程;智力处理] 
metalanguage 超语言metaphor 隐喻metaphorical extension 隐喻扩展metarule 律上律 [元规那么]metathesis 易位microlinguistics 微观语言学middle structure 中间式结构mini l pair 最小对mini list program 微言主义mlu (mean length of utteran ) 语句平均长度modal 情态词modal auxiliary 情态助动词modal logic 情态逻辑modifier 修饰语modular logic gram r 模块化逻辑语法modular parsing system 模块化句法剖析系统modularity 模块性(理论)module 模块monophthong 单元音monotonic 单调monotonicity 单调性montague gram r 蒙泰究语法 [蒙塔格语法] mood 语气morpheme 词素morphological affix 构词词缀morphological deposition 语素分解morphological pattern 词型morphological pro ssing 词素处理morphological rule 构词律 [词法规那么] morphological segmentation 语素切分morphology 构词学morphophonemics 词音学 [形态音位学;语素音位学] morphophonological rule 形态音位规那么morphosyntax 词句法motor theory 肌动理论movement 移位mrd ( chine-readable dictionary) 机读辞典模板,内容仅供参考。
A Video Image Sample Library for High-Speed Railway Perimeter Intrusion
A Video Image Sample Library for High-Speed Railway Perimeter Intrusion. Li Chuan (1), Xie Zhengyu (1,2), Li Yongling (1), Qin Yong (2,3), Yu Ge (2), Sun Yumeng (1) (1. School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044; 2. Zhongguancun Rail Transit Video and Safety Industry Technology Alliance, Beijing 100089; 3. State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044). Abstract: With the large-scale construction and operation of high-speed railways, video surveillance systems are being deployed along railway lines on an ever larger scale, and the video data they generate is growing explosively.
Image-based object detection algorithms can promptly discover foreign objects intruding into the high-speed railway perimeter in surveillance footage, which is of great significance for ensuring safe operation.
By building a sample library of high-speed railway perimeter intrusion video images and using unified high-speed railway scene video data, object detection algorithms, algorithm runtime environments, and algorithm evaluation criteria, the functional effectiveness and field applicability of object detection algorithms can be tested under different scenes and weather conditions, providing the relevant departments with standardized testing of intelligent analysis algorithms and promoting the rapid development of railway video image analysis.
Keywords: high-speed railway; perimeter; foreign-object intrusion; sample library; object detection algorithm. CLC number: U298. Document code: A. Article ID: 1001-683X(2021)03-0136-08. DOI: 10.19549/j.issn.1001-683x.2021.03.136
0 Introduction
In recent years China's high-speed railways have developed rapidly; the total high-speed railway mileage is expected to reach 38,000 km by 2025 [1].
With the large-scale construction and operation of high-speed railways, video surveillance systems are increasingly widely used along the lines; at present there are hundreds of thousands of surveillance cameras on high-speed railway lines alone.
TB/T 3478-2017, "Requirements Specification for Railway Video Surveillance: Railway Police Users", explicitly specifies the items of video content analysis, including identifying and analyzing target size, intrusion detection, and other image-based object detection functions; such algorithms can promptly discover foreign objects intruding into the high-speed railway perimeter in surveillance footage and are of great significance for ensuring safe high-speed railway operation.
Building on the existing video surveillance system of the Shanghai-Hangzhou passenger dedicated line, China Railway Shanghai Group Co., Ltd. developed an intelligent recognition and early-warning system for high-speed railway video surveillance, realizing intelligent recognition and early warning of events such as personnel on the line, foreign objects intruding into the clearance gauge, and changes in equipment position and shape, providing an effective technical means for railway perimeter safety [2].
At present, video surveillance systems place ever higher requirements on image clarity, and the application of high-definition cameras on railways is also receiving much attention [3].
An Interpretation of Segment Anything
Segment AnythingIntroductionIn the field of computer vision, image segmentation refers to the process of dividing an image into multiple segments or regions. These segments are typically defined by boundaries that separate different objects or areas within the image. The task of segmenting anything involves applying segmentation techniques to various types of images, regardless of the content or complexity.Importance of Image SegmentationImage segmentation plays a crucial role in many computer vision applications, including object recognition, scene understanding, and image editing. By segmenting an image into meaningful regions, we can extract valuable information about the objects present in the scene. This information can be used for further analysis or to enhance the interpretation of the image.Challenges in Segmenting AnythingSegmenting anything presents several challenges due to the diverse nature and complexity of images. Some common challenges include: 1. Varying Object Shapes: Objects in images can have different shapes and sizes, making it difficult to define a universal segmentation approach.2. Complex Backgrounds: Images often contain complex backgrounds that can interfere with accurate segmentation.3. Object Occlusion: Objects may be partially occluded by other objects or by themselves, making it challenging to separate them from their surroundings.4. Ambiguity: Certain objects or regions in an image may have ambiguous boundaries, making it difficult to determine their exact segmentation.Techniques for Segmenting AnythingVarious techniques have been developed to address these challenges and perform accurate segmentation on diverse types of images. Here are some commonly used techniques:1. ThresholdingThresholding is a simple yet effective technique that separates pixels based on their intensity values. It involves setting a threshold value and classifying pixels as foreground or background based on whethertheir intensity is above or below the threshold.2. Edge-Based MethodsEdge-based methods focus on detecting abrupt changes in pixel intensity, which often correspond to object boundaries. These methods use edge detection algorithms, such as the Canny edge detector, to identify edges and segment objects based on the detected edges.3. Region-Based MethodsRegion-based methods group pixels into regions based on their similarity in color, texture, or other features. These methods often involve iterative processes, such as region growing or region splitting and merging, to gradually segment the image into distinct regions.4. Deep Learning ApproachesDeep learning approaches have gained popularity in recent years for image segmentation tasks. Convolutional Neural Networks (CNNs) are commonly used to learn features and perform pixel-wise classification to segment objects in an image. Popular architectures for image segmentation include U-Net and Mask R-CNN.Evaluation Metrics for Image SegmentationTo assess the performance of segmentation algorithms, various evaluation metrics are used. Some common metrics include:1.Intersection over Union (IoU): IoU measures the overlap betweenthe predicted segmentation mask and the ground truth mask. 
It iscalculated as the intersection area divided by the union area ofthe two masks.2.Pixel Accuracy: Pixel accuracy measures the percentage ofcorrectly classified pixels in the segmentation mask compared tothe ground truth mask.3.Mean Intersection over Union (mIoU): mIoU calculates the averageIoU across multiple segmented objects or regions in an image.Applications of Segmenting AnythingSegmentation techniques find applications in a wide range of fields, including:1.Medical Imaging: Image segmentation is used for tumor detection,organ delineation, and diagnosis in medical imaging applications. 2.Autonomous Driving: Segmenting objects such as pedestrians,vehicles, and traffic signs is crucial for autonomous vehicles’perception systems.3.Video Surveillance: Image segmentation aids in tracking movingobjects and identifying suspicious activities in surveillancevideos.4.Image Editing: Segmenting images allows for targeted adjustmentsand manipulations, such as background removal or objectreplacement.ConclusionSegmenting anything is a challenging yet essential task in computer vision. By applying various segmentation techniques and evaluation metrics, we can extract meaningful information from images and enable advanced applications in multiple domains. As technology continues to advance, we can expect further improvements in segmentation algorithms, leading to more accurate and efficient segmentation of any type of image.。
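For illustration, here is a minimal NumPy sketch of the evaluation metrics described above; the per-class averaging used for mean IoU is one common convention and should be treated as an assumption rather than a single standard definition.

```python
# Hedged sketch: IoU, pixel accuracy, and a simple mean IoU over classes,
# computed from a predicted label map and a ground-truth label map.
import numpy as np

def iou(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Intersection over Union of two binary masks."""
    inter = np.logical_and(pred_mask, gt_mask).sum()
    union = np.logical_or(pred_mask, gt_mask).sum()
    return inter / union if union > 0 else 1.0

def pixel_accuracy(pred: np.ndarray, gt: np.ndarray) -> float:
    """Fraction of pixels whose predicted label matches the ground truth."""
    return float((pred == gt).mean())

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Average IoU over all classes that appear in the ground truth."""
    ious = [iou(pred == c, gt == c) for c in range(num_classes) if (gt == c).any()]
    return float(np.mean(ious))

# Example on tiny label maps (0 = background, 1 = object)
pred = np.array([[0, 1], [1, 1]])
gt   = np.array([[0, 1], [0, 1]])
print(iou(pred == 1, gt == 1), pixel_accuracy(pred, gt), mean_iou(pred, gt, 2))
```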
Processing and Recognition of Deformed License Plates
system. It can be used in many kinds of vehicle management occasions. Compared
with traditional vehicle management methods, LPS greatly improves the efficiency of
Tianjin University
Master's Degree Thesis
Major: Pattern Recognition and Intelligent Systems
Author: Liu Ying
Supervisor: Prof. Wang Ping
January 2006
Chinese Abstract
The automatic license plate recognition (LPR) system is an important component of intelligent transportation systems and can be used in all kinds of vehicle management settings. Compared with traditional vehicle management methods, it greatly improves management efficiency and quality, saves manpower and material resources, and makes vehicle management scientific and standardized, playing a role in safeguarding traffic and public order; it therefore has broad application prospects. An automatic license plate recognition system generally includes three modules: license plate location, character segmentation, and character recognition.
Work includes:
(1) The license plate image is strongly affected by illumination. The license plate blurs under both sunny and sunless conditions, such as in cloudy weather, fog, and at night. Aiming at
License plate location → license plate character segmentation → license plate character recognition
Fig. 1.1 Flowchart of the license plate recognition system
Chapter 1 Introduction
1.3 Research Status in China and Abroad
License plate recognition with the aid of computers appeared in the 1980s. In 1993, LPR technology successfully moved from laboratory research to market application. In recent years, with the rapid expansion of market demand, LPR technology has matured, and a considerable number of vendors supply individual system components such as image acquisition, hardware, and character recognition. At present, roughly 15 companies offer fully commercial LPR systems. LPR technology is being widely applied in fields such as AVI (Automatic Vehicle Identification), AVL (Automatic Vehicle Location), ETTM (Electronic Tolling and Traffic Management), and VES (Video Violation Enforcement), and is developing in an increasingly practical direction. Opportunities and challenges coexist, and the technology still faces many difficulties: the diversification of license plates, the miniaturization of plate characters, and the emergence of new plate materials and decorative plate fonts place new demands on it.
3D Point Cloud Segmentation Based on an Improved Local Surface Convexity Algorithm
Fig. 1 Local surface convexity

$$c_{i,j}=\max\left\{\ \mathrm{sigm}\big(-n_i^{T}n_j,\ -\cos(v_{nSim}),\ v_{nSimF}\big),\ \ \mathrm{sigm}\Big(\max\big(\tfrac{n_i^{T}d_{i,j}}{\|d_{i,j}\|},\ \tfrac{n_j^{T}d_{j,i}}{\|d_{j,i}\|}\big),\ \cos(90^{\circ}-v_{conv}),\ v_{convF}\Big)\right\} \qquad (1)$$
(2) Boundary determination

$$\mathrm{sigm}(x,\theta,m)=0.5-\frac{0.5\,(x-\theta)\,m}{\sqrt{1+(x-\theta)^{2}m^{2}}} \qquad (2)$$

where θ is the effective threshold and m is a range parameter that controls the slope of the tangent at the threshold.

Since neighboring points in a 3D point cloud do not necessarily belong to the same object, the points in a locally connected point set must be checked for object boundaries, in order to decide whether the obtained locally connected point set belongs to a single object. Discontinuities in depth values indicate the presence of an object boundary and whether neighboring pixels belong to the same part. For any
$$l_{i,j}=\min\left\{\ \mathrm{sigm}\Big(\tfrac{|r_i-r_j|}{\min\{r_i,r_j\}},\ v_{rDiff},\ v_{r2Diff}\Big),\ \ \mathrm{sigm}\Big(\tfrac{|(r_i-r_j)-(r_h-r_j)|}{r_h-r_j},\ v_{rDiff},\ v_{rNF}(r_i)\Big),\ \ \mathrm{sigm}\Big(\tfrac{|(r_i-r_j)-(r_j-r_k)|}{r_j-r_k},\ v_{rNDiff},\ v_{rNF}(r_i)\Big)\right\}$$
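For illustration, here is a minimal NumPy sketch of the decision function sigm of Eq. (2) and the pairwise convexity score of Eq. (1) for two neighboring points; the threshold and slope values (v_nSim, v_conv, and the F parameters) are placeholder assumptions, not the paper's settings.

```python
# Hedged sketch of the sigmoid decision function (Eq. 2) and the local surface
# convexity score c_ij (Eq. 1) for two neighboring points with unit normals
# n_i, n_j and positions p_i, p_j. Threshold and slope values are placeholders.
import numpy as np

def sigm(x, theta, m):
    """Smooth transition around theta: tends to 0 when (x-theta)*m >> 0
    and to 1 when (x-theta)*m << 0."""
    return 0.5 - 0.5 * (x - theta) * m / np.sqrt(1.0 + (x - theta) ** 2 * m ** 2)

def convexity_score(n_i, n_j, p_i, p_j,
                    v_n_sim=np.deg2rad(10), v_n_sim_f=8.0,
                    v_conv=np.deg2rad(8), v_conv_f=8.0):
    """Pairwise convexity score c_ij following the reconstructed Eq. (1)."""
    d_ij = p_j - p_i
    d_ji = -d_ij
    term_normals = sigm(-n_i @ n_j, -np.cos(v_n_sim), v_n_sim_f)
    proj = max(n_i @ d_ij / np.linalg.norm(d_ij),
               n_j @ d_ji / np.linalg.norm(d_ji))
    term_convex = sigm(proj, np.cos(np.pi / 2 - v_conv), v_conv_f)
    return max(term_normals, term_convex)

# Two nearly coplanar points with identical normals
n = np.array([0.0, 0.0, 1.0])
print(convexity_score(n, n, np.array([0.0, 0.0, 0.0]), np.array([0.1, 0.0, 0.0])))
```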
(1 State Key Laboratory of Laser Interaction with Matter, Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China;
2 University of Chinese Academy of Sciences, Beijing 100049, China)
Abstract: Point cloud segmentation is the basis of point cloud classification, recognition, 3D reconstruction, and other processing, and the segmentation result strongly affects subsequent applications. This paper proposes using connected point sets to improve the neighboring-point relations in the local surface convexity algorithm, in order to solve the over-segmentation and under-segmentation problems that current point cloud segmentation algorithms of laser 3D imaging systems exhibit when processing scattered point clouds of complex environments. A principal vertex and its surrounding points form a connected set, which serves as the local sub-point set for segmentation decisions and forms an effective segmentation region. The method solves the problem that common point cloud segmentation methods cannot effectively segment irregularly shaped objects, and it improves segmentation accuracy. Experimental results show that, compared with the min-cut algorithm and the region growing algorithm, the improved local surface convexity algorithm based on connected point sets segments real road environment information better and can, to a certain extent, avoid over-segmentation and under-segmentation, demonstrating that the method is suitable for segmenting scattered point cloud data in complex environments. Keywords: laser 3D imaging; point cloud segmentation; connected point set; local surface convexity. CLC number: TN958.98. Document code: A. doi: 10.3788/CO.20171003.0348
Summary of Hajer Fradi's Image Processing Publications
2012
1. Hajer Fradi, Jean-Luc Dugelay. Robust Foreground Segmentation Using Improved Gaussian Mixture Model and Optical Flow. ICIEV, 248-253, 2012. (foreground segmentation method)
Abstract: GMM background subtraction has been widely employed to separate the moving objects from the static part of the scene. However, the background model estimation step is still problematic; the main difficulty is to decide which distributions of the mixture belong to the background. In this paper, the authors propose a new approach based on incorporating a uniform motion model (optical flow) into GMM background subtraction.
The paper introduces and compares several foreground segmentation methods.
It also discusses part of the literature on combining optical flow with GMM.
The improved GMM algorithm (reference 15) and the optical flow method based on a quadratic polynomial model (reference 19) are described.
The two methods are combined at the pixel level for foreground segmentation.
Results: experiments were carried out on the i2r dataset.
The method was compared with those of references 15 and 19 in terms of detection rate (recall) and precision, demonstrating its effectiveness.
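For illustration, here is a minimal OpenCV sketch of a pixel-level combination of GMM background subtraction and dense optical flow; MOG2 and Farnebäck flow stand in for the paper's improved GMM and polynomial-expansion flow, and the fusion rule, thresholds, and file name are assumptions.

```python
# Hedged sketch: pixel-level fusion of GMM background subtraction and optical
# flow magnitude, in the spirit of the combination described above. The fusion
# rule and thresholds below are illustrative, not the published method.
import cv2
import numpy as np

cap = cv2.VideoCapture("sequence.mp4")          # hypothetical input video
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # GMM foreground mask (0 / 127 / 255 from MOG2)
    fg_mask = mog2.apply(frame)

    # Dense optical flow; moving pixels have large flow magnitude
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)

    # Pixel-level combination: keep pixels that both cues call "moving"
    combined = ((fg_mask > 0) & (mag > 1.0)).astype(np.uint8) * 255

    prev_gray = gray
```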
2. Hajer Fradi, Jean-Luc Dugelay. People counting system in crowded scenes based on feature regression. EUSIPCO, 136-140, 2012. (people counting in crowded scenes)
Abstract: The authors propose a counting system based on measurements of interest points, where a perspective normalization and a crowd measure-informed density estimation are introduced into a single feature. Then, the correspondence between this feature and the number of persons is learned by Gaussian Process regression.
People counting methods usually fall into two categories: detection based and feature based. The paper explains the problems with, and the research status of, both categories.
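For illustration, here is a minimal sketch of the regression stage with scikit-learn's Gaussian Process regressor; the per-frame feature extraction is not shown, and the file names and kernel choice are assumptions.

```python
# Hedged sketch: learning a mapping from a crowd feature to a person count
# with Gaussian Process regression, as in feature-regression counting systems.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

X_train = np.load("crowd_features.npy")   # (n_frames, n_features), hypothetical file
y_train = np.load("crowd_counts.npy")     # (n_frames,), hypothetical file

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gpr.fit(X_train, y_train)

# Predict a count (with uncertainty) for a new frame's feature vector
x_new = X_train[:1]
count, std = gpr.predict(x_new, return_std=True)
print(f"estimated count: {count[0]:.1f} +/- {std[0]:.1f}")
```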
A Background-Adaptive GrabCut Image Segmentation Algorithm
背景自适应的GrabCut图像分割算法杨绍兵;李磊民;黄玉清【摘要】图割用于图像分割需用户交互,基于激光雷达传感器,提出了阈值法得到目标的外截矩形,再映射到图像完成交互.针对GrabCut算法耗时、对局部噪声敏感和在复杂背景提取边缘不理想等缺点,提出了背景自适应的GrabCut算法,即在确定背景像素中选取可能目标像素邻近的一部分像素作为背景像素,使背景变得简单,尤其适用于前景像素在整幅图中所占比例较小和在目标像素周围的背景相对简单的情况.实验结果表明,所提算法与GrabCut算法相比,减少了图的节点数,降低了错误率,有效的提高了运行效率,提取的目标边缘信息更加完整、平滑.【期刊名称】《计算机系统应用》【年(卷),期】2017(026)002【总页数】5页(P174-178)【关键词】图像分割;GrabCut算法;高斯混合模型;激光雷达;背景自适应【作者】杨绍兵;李磊民;黄玉清【作者单位】西南科技大学信息工程学院,绵阳621010;西南科技大学研究生院,绵阳621010;西南科技大学信息工程学院,绵阳621010【正文语种】中文图像分割是图像处理和计算机视觉领域的基础, 图像分割的算法数量众多, 其中, 图割作为一种结合图像边缘信息和纹理信息鲁棒的能量最小化方法, 得到越来越多的重视, 广泛的应用于图像分割、机器视觉等领域.2001年Yuri Y Boykov和Maric -PierrJolly[1]首次证实了离散能量函数的全局最优化能有效地用于N-D 图像的精确目标提取, 并提出了一种交互式的基于能量最小化的二值图割算法, 利用最大流/最小割算法得到全局最优解[2]. 许多学者对构建颜色空间、纹理形状以及信息模型和能量函数进行了改进, Han等[3]使用多维非线性结构颜色特征取代GMM(Gaussian Mixture Model).2004年, Rother等[4]提出GrabCut算法, 是目前目标提取最好的方法之一. Poullot等[5]提出将GrabCut用于视频无监督的前景分割, 取得了很好的效果. 针对GrabCut耗时的缺陷, 文献[6]提出用分水岭分割把像素聚成超像素, 提高分割效率; 文献[7]提出了降低原图像的分辨率来加快收敛速度; 周良芬等[8]人采用分水岭降低了错误率且提高了运行效率. Hua等[9]采用感兴趣区域(ROI)提高算法的准确率. 目前GrabCut在工程应用很少, 主要因为图割算法GMM模型的迭代求解过程复杂, 运算量大, 而且图割是一种交互式分割算法, 需要借助其他传感器, 为算法提供交互信息. 针对以上两个问题本文提出了一种用激光雷达来实现用户交互的背景自适应的GrabCut分割算法.Rother等[4]提出了Grab Cut算法在Graph cut 基础上做了一下几个方面改进: 首先, 利用RGB三通道GMM取代灰度直方图来描述背景像素和前景像素的分布; 其次, 利用迭代求取GMM中的各参数取代一次估计完成能量函数最小化; 最后, 通过非完全标记方法, 用户只需框选可能前景区域就可完成交互.1.1 相关Graph Cut分割算法设G=(V,E)为一个无向图, 其中V是一个有限非空的节点集合, E为一个无序节点对集合的边集, 给定的待分割的图像I, 要分割出目标和背景, 通过用户交互或者其他传感器的信息确定前景和背景的种子后, 可以对应构建两个特殊的终端节点: 源节点S和汇节点T, P为像素映射成图的节点集合, 则V=(S,T)∪P. 分割后, 源节点S 和目标节点相连, 汇节点T则和背景节点相连如图1(c). 要转换成对边加权的图G,将图像I每个像素映射成G中的一个节点, 像素之间的关系用图G中边的权重表示. 边分为两种, 终端节点S和T分别与像素节点连接、像素节点与像素节点连接, 分别对应的t-links和n-links.给每个像素pi一个二值标号li∈{0,1}, 其中0代表背景背景像素, 1代表目标像素, 则标号向量L={l1,l2…,lN}为二值分割结果. 边的权重(代价)既要考虑两端点所对应像素的位置, 也要考虑像素间的灰度差. 为了获得最优的二值分割结果定义一个λ加权的区域项R(L)和边界项B(L)的组合:其中:数据项中, Rp(0)为像素p为目标的代价, Rp(1)为像素p为背景的代价; 在边界项中, 如果像素p和q同属于目标或者背景, 则对应边的代价F(p,q)比较大; 如果不属于同类, 则F(p,q)较小. 综上所述, 边集E中各个边的权重(代价函数)如表1所示. Boykov[1,2]的交互式分割过程如图1所示, (a)为一个的二维图像, 将其映射为图G 得到(b)图, 其中B像素表示背景种子, O表示前景种子, 由区域项表达式(2)得到t-links, 边界项表达式(3)得到n-links. 采用最大流/最小割得到最优解.1.2 GrabCut算法原理GrabCut采用RGB颜色空间模型, 在文献[10]中用K个高斯分量(一般K=5)的全协方差GMM来分别对目标和背景建模. 存在一个向量K=(k1,…kn…kN), kn表示第n个像素的高斯分量. GrabCut采用迭代过程使目标和背景GMM的参数更优, 能量函数最小; 此外, GrabCut的交互更为简单, 只需要框出可能目标像素, 其他的视为背景像素即只需要提供框的两个斜对角坐标就能完成交互能量函数定义为式(5), 其中U(L,θ,z)为区域项, 表示一个像素被归类为目标或者背景的惩罚; V(L,z)为边界项两个像素不连续的惩罚. D(ln,kn,θ,zn)为第n个像素对应的混合高斯建模后归为前景或者背景的惩罚.GrabCut算法步骤:初始化:(1) 用户直接框选可能目标得到初始的trimap T. 框外全为背景像素TB,框内为TU, 且有.(2) 当n∈TB则ln=0, 当n∈TU则ln=1.(3) 根据前景和背景的标号, 就可以估计各自的GMM参数.迭代最小化:(1) 为每个像素分配GMM的高斯分量:(2) 从给定的图像数据z中, 学习优化GMM参数:(3) 采用能量最小化进行分割估计:(4) 重复上面步骤, 直到收敛.GrabCut需要用户交互, 在实际工程运用中, 我们需借助其他传感器的信息. 本文采用32线激光雷达完成GrabCut所需的交互信息. 激光雷达和CCD(Charge-coupled Device)的标定不是本文重点内容, 因此假定激光雷达和CCD标定已经完成, 激光雷达和CCD图像的像素建立了映射关系.如图2为激光对凹和凸障碍物检测的原理图, H为激光雷达相对地面的高度, W为凹障碍物的宽度, p1, p2, p3, p4在一条线上激光雷达扫描与地面相交的四个点, θ1, θ2, θ3为激光的发射角, 当激光雷达参数一定时θ=θ1=θ2=θ3; D为激光雷达到p1的距离, θ一定时, 随着D越大, 激光雷达两线之间的水平距离越远, 也就是说分辨率越低, 则自主机器人(如挖掘机)作业时精度不够, 因此我们把激光雷达的信息和CCD图像信息进行融合, 借助图像信息提高精度.如图3所示, I为一幅RGB图, 图中小矩形代表一个像素, 深色部分为检测到的障碍物, 外部的矩形框为所需的交互信息, 由图可知完成算法交互只需求出矩形的斜对角两个坐标(x1,y1)和(x2,y2). 激光雷达主要的作用是检测障碍物并返回可能目标框的两点坐标.在获取雷达数据后, 本文采用项志宇等[10]提出的算法, 首先进行数据滤波, 数据滤波包含两个步骤. 首先, 距离值大于一定阈值的数据点认为是不可靠的点, 直接丢弃; 再采用窗口大小为3的中值滤波除去噪声点. 数据滤波后, 把相互之间距离差在一定的阈值范围内的数据点聚成快团. 当得到障碍物的大体轮廓后, 采用外截矩形, 再映射到图像得到图3中的两个坐标(x1,y1)和(x2,y2), 完成交互.针对GrabCut算法耗时、在复杂背景提取边缘不理想等缺点, 提出了背景自适应的GrabCut算法. 图割解决图像分割问题时, 需要将图像转化为网络图, 图像较大G的节点较多, 计算量变大, 因此我们可以根据可能目标像素的个数来自适应背景像素, 这样不仅减少了图G的节点数, 而且也使背景变得更加简单, 背景的GMM 更有效, 分割效果更好.图4为背景自适应的GrabCut算法原理, 其中I为RGB图像, 深色部分为障碍物. 
在为改进之前, U为可能目标像素, 为背景像素, 交互时只需得到(x1,y1)和(x2,y2)两个坐标, 再分别进行GMM, 可求得各像素属于目标或者背景的概率. 改进后, 就可以得到U为可能目标像素, 图中两个矩形框之间的像素集合B为背景像素. 在得到(x1,y1)和(x2,y2)两个坐标, 背景我们在此基础上横轴扩展m个像素, 纵轴扩展n个像素得到B, 设I大小为m0x n0的一副图像, 则:可得到约束条件:改进后的背景根据前景的变化而变化, 从而改变了GrabCut的初始化.初始化:(1) 通过激光雷达得到(x1,y1)和(x2,y2)两个坐标, 以两个坐标画一个矩形得到U, 则矩形内为可能目标像素, 得到初始的trimap T, 框内为TU, (x1-m,y1-n)和(x2+m,y2+n)两个坐标所得的矩形框内和的差集得到TB.(2) 当n∈TB则ln=0, 当n∈TU则ln=1.(3) 根据前景和背景的标号, 就可以估计各自的GMM参数.本文选择了不同背景下的3幅图, 将改进的算法和文献[4]提出的GrabCut算法进行对比分析. 实验PC配置为2.4GHz的Intel双核CPU4G内存, 在windows平台下, 采用Visual Studio 2012配置opencv2.4.9, m和n的值的选取尽量让确定背景像素(两个矩形框之间的像素)单一, 这样GMM参数更优, 分割效果更好. 在自动交互的情况下, 通过实验并考虑实时性和分割效果, 得到式(8)中的m和n的值. 当然, 这里仅仅适用一般情况训练所得到的结果, 对于在特殊环境, 还需进一步实验得到参数m和n的值.图5中, 图像背景较为复杂且光照较暗, 目标邻近背景相对单一且前景和背景像素区分度不大. (b)为文献[4]提出的算法分割效果较差, 边缘不完整, 人的下半身和头部信息丢失; (c)为本文提出的算法, 分割的目标更加准确, 边缘完整, 目标信息没有丢失.图6为背景相对单一分割的效果图, (b)和(c)分割的目标信息没有丢失信息都完整, 本文算法边缘信息更光滑.图7为草地上的分割结果, (b)和(c)总体来讲分割效果都比较好, 但由于头部周围背景相对复杂一些, 所以本文提出的算法分割的边缘更为细致.本文采用错误率error和分割时间两个定量指标对图像分割结果进行客观评价. 假设理想分割后目标像素数量为N0, 此处用人工手动分割取代理论上的理想分割, N1为采用其他算法分割后目标像素个数, 则可得到:由表2对比可知, error显著的减少, 由于在图5中, 前景像素周围的背景像素单一, 所以分割错误率减少的最多; 由分割时间对比可知, 改进的算法提高了运行效率, 图5采用本文算法运行时间减少的最多, 而图7减少的时间最少, 可以看出本文提出的算法更加适用于前景像素在整幅的像素所占比例较少的情况.综上所述, 从实时性和错误率来分析, 本文的算法提取的目标更加高效、省时, 尤其在背景较复杂和前景像素在整幅图的像素所占比例较少时, 本文的算法分割的边缘更加光滑、细致, 更加节省时间.针对在工程实现中, GrabCut算法分割时需用户交互确定部分背景和可能前景像素, 本文提出了对激光雷达信息采用阈值法得到两个坐标信息, 再映射到图像完成交互. 针对GrabCut的局限性, 本文提出了背景自适应的GrabCut自动分割算法, 背景根据可能前景像素的变化而变化, 减少了背景像素, 从而减少了图的节点数, 分割时间显著的减少. 此外, 通过减少背景像素同时也可以剪除复杂的背景, GMM建模效果更有效, 错误率降低到3.5%以内, 分割的目标细节更丰富, 提取的目标更完整, 同时获得更细致、平滑的边缘, 通过算法分割效果分析、错误率和时间开销比较, 有效的说明了本文算法的优越性.1 Boykov YY, Jolly MP. Interactive graph cuts for optimal boundary & region segmentation of objects in N-D images. Proc. 8th IEEE International Conference on Computer Vision, 2001(ICCV 2001). IEEE. 2001.105–112.2 Boykov Y, Kolmogorov V. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. Tissue Engineering, 2005, 11(12): 1631–1639.3 Sezgin M, Sankur B. Survey over image thresholding techniques and quantitative performance evaluation. Journal of Electronic Imaging, 2004.4 Rother C, Kolmogorov V, Blake A. “GrabCut”: Interactive foreground extraction using iterated graph cuts. ACM Trans. on Graphics, 2004, 23(3): 307–312.5 Poullot S, Satoh S. VabCut: A video extension of GrabCut for unsupervised video foreground object segmentation. InternationalConference on Computer Vision Theory and Applications. IEEE. 2014. 362–371.6 徐秋平,郭敏,王亚荣.基于分水岭变换和图割的彩色图像快速分割.计算机工程,2009,35(19):210–212.7 丁红,张晓峰.基于快速收敛Grabcut的目标提取算法.计算机工程与设计,2012,33(4):1477–1481.8 周良芬,何建农.基于GrabCut改进的图像分割算法.计算机应用,2013,33(1):49–52.9 Hua S, Shi P. GrabCut color image segmentation based on region of interest. International Congress on Image and Signal Processing. IEEE. 2014.10 项志宇.针对越野自主导航的障碍物检测系统.东南大学学报:自然科学版,2005,(A02):71–74.。
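For illustration, here is a minimal OpenCV sketch of rectangle-initialized GrabCut with the definite background restricted to a band of m and n pixels around the candidate rectangle, approximating the background-adaptive idea described above; cv2.grabCut stands in for the authors' own implementation, and the coordinates and margins are placeholder values.

```python
# Hedged sketch: rectangle-initialized GrabCut with a background band around
# the candidate object. (x1, y1, x2, y2) would come from the lidar-based
# detection; here they are hypothetical, and m, n are the expansion margins.
import cv2
import numpy as np

img = cv2.imread("scene.png")                       # hypothetical input image
x1, y1, x2, y2 = 120, 80, 260, 220                  # candidate object rectangle
m, n = 40, 40                                       # horizontal / vertical margins

# Crop to the expanded window so pixels outside the band are ignored entirely
X1, Y1 = max(x1 - m, 0), max(y1 - n, 0)
X2, Y2 = min(x2 + m, img.shape[1]), min(y2 + n, img.shape[0])
roi = img[Y1:Y2, X1:X2]

# Rectangle of probable foreground, expressed in ROI coordinates
rect = (x1 - X1, y1 - Y1, x2 - x1, y2 - y1)

mask = np.zeros(roi.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)
fgd_model = np.zeros((1, 65), np.float64)
cv2.grabCut(roi, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Pixels labeled (probable) foreground form the final segmentation
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0).astype(np.uint8)
```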
Specialized Vocabulary for Image Processing
FT 滤波器FFT filtersVGA 调色板和许多其他参数VGA palette and many others 按名称排序sort by name包括角度和刻度including angle and scale保持目标keep targets保存save保存和装载save and load饱和度saturation饱和加法和减法add and subtract with saturate背景淡化background flatten背景发现find background边缘和条纹测量Edge and Stripe/Measurement边缘和条纹的提取find edge and stripe编辑Edit编辑edit编辑或删除相关区域edit or delete relative region编码Code编码条Coda Bar变换forward or reverse fast Fourier transformation变量和自定义的行为variables and custom actions变量检测examine variables变形warping变形系数warping coefficients标题tile标注和影响区域label and zone of influence标准normal标准偏差standard deviation表面弯曲convex并入图像merge to image采集栏digitizer bar采集类型grab type菜单形式menu item参数Preferences参数轴和角度reference axis and angle测量measurement测量方法提取extract measurements from测量结果显示和统计display measurement results and statistics测量转换transfer to measurement插入Insert插入insert插入条件检查Insert condition checks查找最大值find extreme maximum长度length超过50 个不同特征的计算calculate over 50 differentfeatures area撤销undo撤销次数number of undo levels乘multiply尺寸size抽取或融合分量红red/处理Processing处理/采集图像到一个新的窗口processed/grabbed image into new window 窗口window窗口监视watch window窗位window leveling创建create垂直边沿vertical edge从Windows从表格新建new from grid从工具条按钮from toolbar button从用户窗口融合merge from user form粗糙roughness错误纠正error correction错误匹配fit error打开open打开近期的文件或脚本open recent file or script打印print打印设置print setup打印预览print preview大小和日期size and date带通band pass带有调色板的8- bit带有动态预览的直方图和x, y 线曲线椭圆轮廓histogram and x, y line curveellipse profiles with dynamic preview带阻band reject代码类型code type单步single step单一simple单帧采集snap shot导入VB等等etc.低通low pass第一帧first点point调色板预览palette viewer调试方式debug mode调用外部的DLL调整大小resize调整轮廓滤波器的平滑度和轮廓的最小域值adjust smoothness of contour filter and minimum threshold forcontours定点除fixed point divide定位精度positional accuracy定义一个包含有不相关的不一致的或无特征区域的模板define model including mask for irrelevant inconsistent orfeatureless areas定制制定-配置菜单Customize - configure menus动态预览with dynamic preview读出或产生一个条形或矩阵码read or generate bar and matrix codes读取和查验特征字符串erify character strings断点break points对比度contrast对比度拉伸contrast stretch对称symmetry对模板应用“不关心的”像素标注apply don't care pixel mask to model多边形polygon二进制binary二进制分离separate binary二值和灰度binary and grayscale翻转reverse返回return放大或缩小7 个级别zoom in or out 7 levels分类结果sort results分水岭Watershed分析Analysis分组视图view components浮点float腐蚀erode复合视图view composite复合输入combined with input复制duplicate复制duplicateselect all傅立叶变换Fourier transform改变热点值change hotspot values感兴趣区域ROI高级几何学Advanced geometry高通high pass格式栏formatbar更改默认的搜索参数modify default search parameters 工具Utilities工具栏toolbar工具属性tool properties工具条toolbar工作区workspace bar共享轮廓shared contours构件build构造表格construct grid和/或and/or和逆FFT画图工具drawing tools缓存buffer换算convert灰度grayscale恢复目标restore targets回放playback绘图连结connect map获得/装载标注make/load mask获取选定粒子draw selected blobs或从一个相关区域创建一个ROI or create an ROI from a relative region基线score基于校准映射的畸变校正distortion correction based on calibration mapping 极性polarity极坐标转换polar coordinatetransformation几何学Geometry记录record加粗thick加法add间隔spacing兼容compatible简洁compactness剪切cut减法subtract减小缩进outdent交互式的定义字体参数包括搜索限制ine font parameters including search constraints脚本栏script bar角度angle角度和缩放范围angle and scale range接收和确定域值acceptance and certainty thresholds结果栏result bar解开目标unlock targets精确度和时间间隔accuracy and timeout interval矩形rectangle矩形rectangular绝对差分absolute difference绝对值absolute value均匀uniform均值average拷贝copy拷贝序列copy sequence可接收的域值acceptance threshold克隆clone控制control控制controls快捷健shortcut key宽度breadth宽度width拉普拉斯Laplacians拉伸elongation蓝blue类型type粒子Blob粒子blob粒子标注label blobs粒子分离segment blobs粒子内的孔数目number of holes in a blob 
亮度brightness亮度luminance另存为save as滤波器filters绿green轮廓profile overlay轮廓极性contour polarity逻辑运算logical operations面积area模板编辑edit model模板覆盖model coverage模板和目标覆盖model and target coverage 模板索引model index模板探测器Model Finder模板位置和角度model position and angle 模板中心model center模糊mask模块import VB module模块modules模式匹配Pattern matching默认案例default cases目标Targets目标分离separate objects目标评价target score欧拉数Euler number盆basins膨胀dilate匹配率match scores匹配数目number of matches平方和sum of the squares平滑smooth平均average平均averaged平均值mean平移translation前景色foreground color清除缓冲区为一个恒量clear buffer to a constant清除特定部分delete special区域增长region-growing ROI取反negate全部删除delete all缺省填充和相连粒子分离fill holes and separate touching blobs任意指定位置的中心矩和二阶矩central and ordinary moments of any order location: X, Y锐化sharpen三维视图view 3D色度hue删除delete删除帧delete frame设置settings设置相机类型enable digitizer camera type设置要点set main示例demos事件发现数量number of occurrences事件数目number of occurrences视图View收藏collectionDICOM手动manually手绘曲线freehand输出选项output options输出选择结果export selected results输入通道input channel属性页properties page数据矩阵DataMatrix数字化设置Digitizer settings双缓存double buffer双域值two-level水平边沿horizontal edge搜索find搜索和其他应用Windows Finder and other applications 搜索角度search angle搜索结果search results搜索区域search area搜索区域search region搜索速度search speed速度speed算法arithmetic缩放scaling缩放和偏移scale and offset锁定目标lock destination锁定实时图像处理效果预览lock live preview of processing effects on images 锁定预览Lock preview锁定源lock source特定角度at specific angle特定匹配操作hit or miss梯度rank替换replace添加噪声add noise条带直径ferret diameter停止stop停止采集halt grab同步synchronize同步通道sync channel统计Statistics图像Image图像大小image size图像拷贝copy image图像属性image properties图形graph退出exit椭圆ellipse椭圆ellipses外形shape伪彩pseudo-color位置position文本查看view as text文件File文件MIL MFO font file文件load and save as MIL MMF files文件load and save models as MIL MMO files OCR文件中的函数make calls to functions in external DLL files文件转换器file converterActiveMIL Builder ActiveMIL Builder 无符号抽取部分Extract band -细化thin下一帧next显示表现字体的灰度级ayscale representations of fonts显示代码show code线line线lines相对起点relative origin像素总数sum of all pixels向前或向后移动Move to front or back向上或向下up or down校准Calibration校准calibrate新的/感兴趣区域粘贴paste into New/ROI新建new信息/ 图形层DICOM information/overlay形态morphology行为actions修改modify修改路径modify paths修改搜索参数modify default search parameters 序列采集sequence旋转rotation旋转模板rotate model选择select选择selector循环loops移动move移动shift应用过滤器和分类器apply filters and classifiers影响区域zone of influence映射mapping用户定义user defined用基于变化上的控制实时预览分水岭转化结果阻止过分切割live preview of resulting watershed transformations with controlover variation to prevent over segmentation用某个值填充fill with value优化和编辑调色板palette optimization/editor有条件的conditional域值threshold域值thresholding预处理模板优化搜索速度循环全部扫描preprocess model to optimize search speed circular over-scan预览previous元件数目和开始(自动或手动)number of cells and threshold auto or manual元件最小/最大尺寸cell size min/max源source允许的匹配错误率和加权fit error and weight运行run在目标中匹配数目number of modelmatches in target暂停pause增大缩进indent整数除integer divide正FFT正常连续continuous normal支持象征学supported symbologies: BC 412直方图均衡histogram equalization执行execute执行外部程序和自动完成VBA only execute external programs and perform Automation VBA only指定specify指数exponential Rayleigh中值median重复repeat重建reconstruct重建和修改字体restore and modify fonts重新操作redo重心center of gravity周长perimeter注释annotations转换Convert转换convert装载load装载和保存模板为MIL MMO装载和另存为MIL MFO装载和另存为MIL MMF状态栏status bar资源管理器拖放图像drag-and-drop images from Windows ExplorerWindows自动或手动automatic or manual自动或手动模板创建automatic or manual model creation字符产大小string size字符串string字体font最大maximum最大化maximum最大数maxima最后一帧last frame最小minimum最小化minimum最小间隔标准minimum 
separation criteria 最小数 minima 坐标盒的范围 bounding box coordinates Algebraic operation 代数运算; an image-processing operation comprising the pixel-wise sum, difference, product, and quotient of two images.
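As a small aside, here is a minimal NumPy sketch of the saturated pixel-wise algebra referred to by the entries above ("add and subtract with saturate"); the sample arrays are illustrative only, and OpenCV's cv2.add/cv2.subtract behave the same way on 8-bit images.

```python
# Hedged sketch: pixel-wise algebraic operations with saturation on 8-bit images.
import numpy as np

a = np.array([[250, 10]], dtype=np.uint8)
b = np.array([[20, 30]], dtype=np.uint8)

add_sat = np.clip(a.astype(np.int16) + b.astype(np.int16), 0, 255).astype(np.uint8)
sub_sat = np.clip(a.astype(np.int16) - b.astype(np.int16), 0, 255).astype(np.uint8)

print(add_sat)  # [[255  40]]  (250 + 20 saturates at 255)
print(sub_sat)  # [[230   0]]  (10 - 30 saturates at 0)
```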
An Interpretation of the Distance Transform of Sampled Functions
distance transform of sampled function解读Distance Transform of Sampled Function: An InterpretationIntroductionThe distance transform of a sampled function is a fundamental concept in digital image processing and computer vision. It serves as a powerful tool for various applications such as object recognition, image segmentation, and shape analysis. In this article, we will delve into the intricacies of the distance transform of a sampled function, its key properties, and its significance in computer science.Definition and Basic PrinciplesThe distance transform is an operation that assigns a distance value to each pixel in an image, based on its proximity to a specific target object or region. It quantifies the distance between each pixel and the nearest boundary of the object, providing valuable geometric information about the image.To compute the distance transform, first, a binary image is created, where the target object or region is represented by foreground pixels (usually white) and the background is represented by background pixels (usually black). This binary image serves as the input for the distance transform algorithm.Distance Transform AlgorithmsSeveral distance transform algorithms have been developed over the years. One of the most widely used algorithms is the chamfer distancetransform, also known as the 3-4-5 algorithm. This algorithm assigns a distance value to each foreground pixel by considering the neighboring pixels and their corresponding distances. Other popular algorithms include the Euclidean distance transform, the Manhattan distance transform, and the Voronoi distance transform.Properties of the Distance TransformThe distance transform possesses a set of important properties that make it a versatile tool for image analysis. These properties include:1. Distance Metric Preservation: The distance values assigned to the pixels accurately represent their geometric proximity to the boundary of the target object.2. Locality: The distance transform efficiently encodes local shape information. It provides a detailed description of the object's boundary and captures fine-grained details.3. Invariance to Object Shape: The distance transform is independent of the object's shape, making it robust to variations in object size, rotation, and orientation.Applications of the Distance TransformThe distance transform finds numerous applications across various domains. Some notable applications include:1. Image Segmentation: The distance transform can be used in conjunction with segmentation algorithms to accurately delineate objects inan image. It helps in distinguishing objects from the background and separating overlapping objects.2. Skeletonization: By considering the foreground pixels with a distance value of 1, the distance transform can be used to extract the object's skeleton. The skeleton represents the object's medial axis, aiding in shape analysis and recognition.3. Path Planning: The distance transform can assist in path planning algorithms by providing a distance map that guides the navigation of robots or autonomous vehicles. It helps in finding the shortest path between two points while avoiding obstacles.ConclusionThe distance transform of a sampled function plays a vital role in digital image processing and computer vision. Its ability to capture geometric information, preserve distance metrics, and provide valuable insights into the spatial structure of objects makes it indispensable in various applications. 
The proper understanding and utilization of the distance transform contribute to the advancement of image analysis techniques, enabling more accurate and efficient solutions in computer science.
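For illustration, here is a minimal sketch of a two-pass 3-4 chamfer distance transform of a binary image, in the spirit of the chamfer algorithm mentioned above; the convention that distances are measured to the nearest foreground (non-zero) pixel is an assumption, and an exact Euclidean transform could be obtained instead with scipy.ndimage.distance_transform_edt.

```python
# Hedged sketch: 3-4 chamfer distance transform. dist[y, x] approximates the
# distance from pixel (y, x) to the nearest foreground (non-zero) pixel.
import numpy as np

def chamfer_34(binary: np.ndarray) -> np.ndarray:
    INF = 10**6
    h, w = binary.shape
    dist = np.where(binary > 0, 0, INF).astype(np.int64)

    # Forward pass: top-left to bottom-right
    for y in range(h):
        for x in range(w):
            if y > 0:
                dist[y, x] = min(dist[y, x], dist[y-1, x] + 3)
                if x > 0:     dist[y, x] = min(dist[y, x], dist[y-1, x-1] + 4)
                if x < w - 1: dist[y, x] = min(dist[y, x], dist[y-1, x+1] + 4)
            if x > 0:
                dist[y, x] = min(dist[y, x], dist[y, x-1] + 3)

    # Backward pass: bottom-right to top-left
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if y < h - 1:
                dist[y, x] = min(dist[y, x], dist[y+1, x] + 3)
                if x > 0:     dist[y, x] = min(dist[y, x], dist[y+1, x-1] + 4)
                if x < w - 1: dist[y, x] = min(dist[y, x], dist[y+1, x+1] + 4)
            if x < w - 1:
                dist[y, x] = min(dist[y, x], dist[y, x+1] + 3)
    return dist

seed = np.zeros((5, 5), dtype=np.uint8)
seed[2, 2] = 1
print(chamfer_34(seed))   # distances grow in steps of 3 (axial) and 4 (diagonal)
```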
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Using this behavior-based similarity measure, we extend the notion of 2-dimensional image correlation into the 3-dimensional space-time volume, thus allowing to correlate dynamic behaviors and actions. Small space-time video segments (small video clips) are “correlated” against entire video sequences in all three dimensions (x,y, and t). Peak correlation values correspond to video locations with similar dynamic behaviors. Our approach can detect very complex behaviors in video sequences (e.g., ballet movements, pool dives, running water), even when multiple complex activities occur simultaneously within the field-of-view of the camera. We further show its robustness to small changes in scale
Specialized English Vocabulary for Image Processing
图像处理专业英语词汇Introduction:As a professional in the field of image processing, it is important to have a strong command of the relevant technical terminology in English. This will enable effective communication with colleagues, clients, and stakeholders in the industry. In this document, we will provide a comprehensive list of commonly used English vocabulary related to image processing, along with their definitions and usage examples.1. Image Processing:Image processing refers to the manipulation and analysis of digital images using computer algorithms. It involves various techniques such as image enhancement, restoration, segmentation, and recognition.2. Pixel:A pixel, short for picture element, is the smallest unit of a digital image. It represents a single point in an image and contains information about its color and intensity.Example: The resolution of a digital camera is determined by the number of pixels it can capture in an image.3. Resolution:Resolution refers to the level of detail that can be captured or displayed in an image. It is typically measured in pixels per inch (PPI) or dots per inch (DPI).Example: Higher resolution images provide sharper and more detailed visuals.4. Image Enhancement:Image enhancement involves improving the quality of an image by adjusting its brightness, contrast, sharpness, and color balance.Example: The image processing software offers a range of tools for enhancing photographs.5. Image Restoration:Image restoration techniques are used to remove noise, blur, or other distortions from an image and restore it to its original quality.Example: The image restoration algorithm successfully eliminated the noise in the scanned document.6. Image Segmentation:Image segmentation is the process of dividing an image into multiple regions or objects based on their characteristics, such as color, texture, or intensity.Example: The image segmentation algorithm accurately separated the foreground and background objects.7. Image Recognition:Image recognition involves identifying and classifying objects or patterns in an image using machine learning and computer vision techniques.Example: The image recognition system can accurately recognize and classify different species of flowers.8. Histogram:A histogram is a graphical representation of the distribution of pixel intensities in an image. It shows the frequency of occurrence of different intensity levels.Example: The histogram analysis revealed a high concentration of dark pixels in the image.9. Edge Detection:Edge detection is a technique used to identify and highlight the boundaries between different objects or regions in an image.Example: The edge detection algorithm accurately detected the edges of the objects in the image.10. Image Compression:Image compression is the process of reducing the file size of an image without significant loss of quality. It is achieved by removing redundant or irrelevant information from the image.Example: The image compression algorithm reduced the file size by 50% without noticeable loss of image quality.11. Morphological Operations:Morphological operations are a set of image processing techniques used to analyze and manipulate the shape and structure of objects in an image.Example: The morphological operations successfully removed small noise particles from the image.12. 
Feature Extraction:Feature extraction involves identifying and extracting relevant features or characteristics from an image for further analysis or classification.Example: The feature extraction algorithm extracted texture features from the image for cancer detection.Conclusion:This comprehensive list of English vocabulary related to image processing provides a solid foundation for effective communication in the field. By familiarizing yourself with these terms and their usage, you will be better equipped to collaborate, discuss, andpresent ideas in the context of image processing. Remember to continuously update your knowledge as the field evolves and new techniques emerge.。
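As a quick illustration of several terms defined above (histogram, thresholding used as a simple segmentation, edge detection, and a morphological operation), here is a minimal OpenCV sketch; the file name is a placeholder.

```python
# Hedged sketch: a few of the operations defined above applied to one image.
import cv2
import numpy as np

gray = cv2.imread("example.png", cv2.IMREAD_GRAYSCALE)     # hypothetical image

hist = cv2.calcHist([gray], [0], None, [256], [0, 256])    # intensity histogram
_, seg = cv2.threshold(gray, 0, 255,
                       cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # segmentation by thresholding
edges = cv2.Canny(gray, 100, 200)                           # edge detection
kernel = np.ones((3, 3), np.uint8)
opened = cv2.morphologyEx(seg, cv2.MORPH_OPEN, kernel)      # morphological noise removal
```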
(论文格式模板)学术会议论文标题((居中,三号黑体)
(论文格式模板)学术会议论文标题(居中,三号黑体)王某某,张某某,李某某(居中,四号楷体)(XX 大学 XX 学院,上海200000)(居中,小五号楷体)摘要:中文摘要200字左右,应包括目的、方法、结果和结论等要素。
(小五号宋体)关键词:关键词需3~5个;分号间隔(小五号宋体)English Title (居中四号加粗, Times New Roman)WANG Mou-mou ,ZHANG Mou-mou, LI Si-si Mou-mou (居中五号)(School of Communication and Information Engineering, Shanghai University, Shanghai 200072, China) (居中小五号)Abstract: abstract abstract abstract abstract abstract abstract abstract abstract abstract abstract abstract abstract, abstract abstract abstract abstract abstract abstract. Abstract abstract abstract abstract abstract abstract abstract abstract abstract abstract abstract abstract abstract, abstract abstract abstract abstract abstract abstract. Abstract abstract abstract abstract abstract abstract abstract abstract abstract abstract abstract abstract abstract, abstract abstract abstract abstract abstract abstract. (五号)Key words : key word one; key word two; key word three1 引言(四号黑体)正文正文正文正文正文正文正文正文正文正文,正文正文正文正文正文正文正文正文正文正文。
An Intelligent Monitoring and Counting System for the Number of People in an Enclosed Area
一种封闭区域人数智能监控统计系统何继燕;郜鲁涛;赵红波【摘要】传统安防视频监控系统数据存储空间非常巨大, 人工查找异常事件或行为等有效的视频信息困难, 产生了应用视频智能监控技术来实现更高效的自动监控并配以报警功能, 对视频序列中的运动目标进行检测, 实现检测区域的人数统计, 使事后取证的被动防守变为主动防御报警的需求.在综合运用数字图像处理理论、目标跟踪算法等的基础上, 构建了一个人物分割与人群跟踪相结合的人物计数系统.以基于背景差分算法与帧间差分算法相结合的方式, 建立视频目标分割算法, 包括对目标进行检测并粗略分割, 分单人和多人计数.实现对视频图像的保存读取, 前景单人与群体的提取和分割.该方法确能为智能监控的实现提供一种可行的路径.%Due to large data storage space and difficult finding of valid video information manually such as abnormal events or activities for traditional security video monitoring system, the requirements applying intelligent monitoring technology to achieve more efficient automatic monitoring with alarm function, detection of moving targets in video sequence to realize the number statistics of people in the detection area, which makes the passive defense after the forensics into active defense alarm, are produced.Based on the digital image processing theory and target tracking algorithm, we construct a figure counting system in combination with character segmentation and crowd tracking.A video object segmentation algorithm is established on the basis of combining background difference algorithm and inter-frame difference algorithm, including detection and rough segmentation of object, single person counting and multi-personcounting.The saving and reading of video images, and the extraction and segmentation of foreground single and group, are implemented.This method can provide a feasible path for the implementation of intelligent monitoring.【期刊名称】《计算机技术与发展》【年(卷),期】2019(029)002【总页数】4页(P212-215)【关键词】目标跟踪;人物统计;人物分割;背景差分【作者】何继燕;郜鲁涛;赵红波【作者单位】云南农业大学, 云南昆明 650201;云南农业大学, 云南昆明 650201;云南农业大学, 云南昆明 650201【正文语种】中文【中图分类】TP3020 引言随着对社会各领域信息获取需求的日趋强烈,数字视频监控技术的应用逐渐变广。
Improving Foreground Segmentations with Probabilistic Superpixel Markov Random Fields Alexander Schick∗Martin B¨a uml†Rainer Stiefelhagen∗†∗Fraunhofer IOSB alexander.schick@iosb.fraunhofer.de†Karlsruhe Institute of Technology {martin.baeuml,rainer.stiefelhagen}@AbstractWe propose a novel post-processing framework to im-prove foreground segmentations with the use of Probabilis-tic Superpixel Markov Random Fields.First,we convert a given pixel-based segmentation into a probabilistic su-perpixel representation.Based on these probabilistic su-perpixels,a Markov randomfield exploits structural infor-mation and similarities to improve the segmentation.We evaluate our approach on all categories of the Change De-tection2012dataset.Our approach improves all perfor-mance measures simultaneously for eight different basis foreground segmentation algorithms.1.Introduction and related workSegmentation of an image into foreground and back-ground is arguably one of the most important pre-processing steps in many computer vision applications.The goal of change detection,or foreground segmentation,is the separation of the dynamic foreground from the presumably static background.A good segmentation of the relevant im-age regions can greatly improve the performance of applica-tions building on top of it.For example,people detection is much easier and computationally more efficient when static background is reliably removed.Because of the importance of change detection,there is a large body of literature and a great number of varia-tions.However,most algorithms can be classified by how they model the background and how they compute the dis-tance of a frame to the background model.The background model is usually described on pixel level,e.g.by a simple mean value,by storing a set of samples[4],or by one or multiple Gaussians[9,16,17].Other approaches use self-organizing neural maps[11]or non-parametric density es-timation methods[8].More comprehensive overviews can be found in[5,6,14].It is common to post-process a segmentation to improve results and comparative evaluations can be found in[6,13].Typical post-processing methods include noise removal, morphologic operators,medianfiltering,but also higher-level methods such as saliency or opticalflow analysis. Post-processing generally improves the results and can alle-viate the differences between algorithms[6].However,the methods and their parameters must,in general,be chosen carefully depending on the sequence[13].Independent of the segmentation algorithm or post-processing method,most approaches are based on pixels. 
But pixels are a result of the sensors we use, not meaningful units by themselves. In addition, they are very susceptible to noise. Superpixels, on the other hand, are a higher-level image representation that partitions an image into meaningful regions. Their key property is that they align well with object boundaries. They are more robust to noise than pixels and serve well to represent objects in the image. Therefore, they can be used as atomic primitives in image processing applications. We use an improved variation of the superpixel segmentation algorithm SLIC [2] that was proposed in [15]. [15] generally maintains a lattice-like structure [12] due to the initial rasterization. This means that the superpixels conform to a grid with known and fixed neighborhoods, like pixels in an image. This property is only weakly guaranteed, but we can nevertheless exploit it in the Markov random field (MRF). While there are segmentation approaches based on superpixels [3, 10], we are unaware of superpixels being used in a post-processing framework.

In the remainder of this paper, we will introduce a novel post-processing framework based on Probabilistic Superpixel Markov Random Fields (PSP-MRF) to improve a given foreground segmentation. We show evaluations of eight benchmark algorithms for all categories of the Change Detection 2012 dataset before concluding with a discussion.

2. Probabilistic Superpixel Markov Random Fields

We will first introduce the concept of probabilistic superpixels before incorporating them in a Markov random field.

Figure 1. Visualization of the processing pipeline: (a) input, (b) segmentation, (c) superpixels, (d) probabilistic superpixels, (e) final segmentation. Based on the input, the superpixels are computed. Combining them with the segmentation leads to probabilistic superpixels, with white regions indicating a high foreground probability. The final segmentation shows a clear improvement.

2.1. Probabilistic superpixels

We will now introduce the term probabilistic superpixel and explain how to compute them. For a more visual explanation, we refer to Figure 1. A probabilistic superpixel gives the probability that its pixels belong to a certain class. In this paper, we consider two classes: foreground and background. Therefore, a probabilistic superpixel gives the probability of its pixels belonging to the foreground.

Let F be the foreground segmentation of image I and S its superpixel segmentation. Let S ∈ S be a superpixel with pixels p ∈ S and |S| its size. Let F(p) be 1 if pixel p belongs to the foreground and 0 otherwise. Then, the probability of superpixel S belonging to the foreground is given by

$$p(S)=\frac{\sum_{p \in S} F(p)}{|S|}. \qquad (1)$$

Probabilistic superpixels have several advantages. Their shape only depends on the image, not on the foreground segmentation. Because they accumulate foreground pixels, they are able to restore the original shape of the objects even if the foreground segmentation contains errors. In addition, they allow to transform any binary segmentation into a probabilistic one. Further, they can also be applied to non-binary inputs and extended to multiple classes. Finally, probabilistic superpixels fit very well into probabilistic frameworks, like Markov random fields, as we will show now.
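For illustration, here is a minimal NumPy sketch of Eq. (1), assuming a binary foreground mask F and an integer superpixel label map (e.g., produced by an SLIC implementation); this is one reading of the formula, not the authors' code.

```python
# Hedged sketch of Eq. (1): the foreground probability of each superpixel is
# the fraction of its pixels labeled foreground in the binary mask F.
import numpy as np

def probabilistic_superpixels(F: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """F: binary foreground mask (0/1), labels: integer superpixel ids.
    Returns p[s] = (sum of F over superpixel s) / |s| for each superpixel."""
    num_sp = labels.max() + 1
    fg_counts = np.bincount(labels.ravel(), weights=F.ravel(), minlength=num_sp)
    sizes = np.bincount(labels.ravel(), minlength=num_sp)
    return fg_counts / np.maximum(sizes, 1)
```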
2.2. Markov random fields

Superpixels usually provide an over-segmentation of the image, and in general foreground objects consist of more than one superpixel. Assuming that nearby superpixels which are similar in appearance also jointly belong to either foreground or background, we make use of neighborhood relationships between superpixels in order to improve the segmentation. Let us denote the neighborhood of superpixel S as N_S. Note that because of the lattice structure of the superpixels, and considering vertical, horizontal and diagonal connections, each superpixel has exactly eight neighbors. Let N = {(S,T) | S ∈ S, T ∈ N_S, id(S) < id(T)} be the ordered set of all neighboring superpixels in the image.

Following [7], we define an energy function

$$E(f_S)=\sum_{S \in \mathcal{S}} U_S(f_S)+\sum_{(S,T) \in \mathcal{N}} V_{ST}(f_S,f_T) \qquad (2)$$

over the domain of all foreground/background labelings f_S ∈ {0,1}^{|S|}. The segmentation problem can now be seen as an energy minimization problem, where we seek the f*_S that minimizes E.

The unary term U_S(f_S) captures the likelihood of superpixel S belonging to either foreground or background, and we define it as

$$U_S(f_S)=-\ln\big(\sigma(p(S),\theta)\big)\,f_S-\ln\big(1-\sigma(p(S),\theta)\big)\,(1-f_S), \qquad (3)$$

where p(S) is given by Eq. (1), and σ(p(S), θ) = min(1, 0.5·p(S)/θ) is a linear mapping function of the superpixel probability.

The relationship between two neighboring superpixels S and T is modeled by a pairwise term

$$V_{ST}(f_S,f_T)=|f_S-f_T|\,\big(\lambda_1+\lambda_2\,e^{-\beta\|\mu_S-\mu_T\|}\big). \qquad (4)$$

It consists of the Ising prior λ1|f_S − f_T| and an edge-sensitive term, which gives a higher penalty if the mean colors µ_S and µ_T (computed from the original input image) of neighboring superpixels S and T are not similar. Parameter β models the expected color differences and is estimated for each image as β = (2·E[‖µ_S − µ_T‖])^(−1).

We find the labeling for the superpixels by minimizing E, i.e. f* = arg min_f E, which can be done efficiently via graph cuts [7]. Since the number of superpixels is significantly lower than the number of original pixels (by a factor of ∼25 in our experiments), the graph cut computation does not present a computational bottleneck.

3. Experimental setup

Our post-processing framework works on existing foreground segmentations. To cover a wide range of algorithms, we used pre-computed results of benchmark algorithms made available in the Change Detection 2012 dataset [1].
While this led to an improvement of some performance measures(e.g.precision),it was at the cost of decreasing other performance measures(e.g.recall).We will show in Section4that our method simultaneously improves all per-formance measure of all benchmark algorithms.We applied our post-processing framework to all cate-gories and evaluated the results with the tools made avail-able by[1].For evaluation,we used the performance mea-sures used in the Change Detection2012challenge[1]:re-call,specificity,false positive rate(FPR),false negative rate (FNR),percentage of bad classification(PBC),F-measure, and precision.Wefirst segmented each input frame into su-perpixels.Then,based on the existing foreground segmen-tation,we computed the probabilistic superpixels which were then used as input for the MRF.The result of the MRF is thefinal post-processed segmentation that we used in the evaluation.For the superpixel segmentation,we chose a superpixel size of25pixels to capture smaller objects and compactness parameterα=0.9as recommended in[15]. For the MRF,we choseλ1=0.3,λ2=3,andθ=0.35. The parameters were kept constant for all benchmark algo-rithms and for all categories.4.EvaluationTable1shows the summary results over all categories for each benchmark algorithm and for the improved seg-mentation with the proposed framework.The proposed PSP-MRF achieves the best results for all algorithms and in all performance measures.It equally improves recall and precision which also leads to an improvement for the F-measure.Both the false positive and false negative rates are reduced and consequently also the PBC,while the speci-ficity is slightly improved.Note that the span of some of the performance measures is quite narrow,e.g.the span of the F-measure is only0.12over all benchmark algorithms.A closer analysis(Table2)of the improvements shows that the categories”dynamic background”and”camera jit-ter”benefit the most which is due to the neighboring re-lations modeled by the MRFs.The”shadow”category also benefits strongly from our improvements mainly due to an increased recall rate without a loss in precision.The ”baseline”category benefits the least from the PSP-MRFs, mainly because the results were already very good due to easy video sequences.Due to space limitations,the detailed results of all algorithms are available for download1.Figures1,2,and3show qualitative examples of the ef-fect of PSP-MRFs.Noisy regions can be reduced even if they form relatively large segments.Holes in the segmenta-tion are also closed.One of the biggest benefits,however,is the fact that the shape of the objects is maintained after pre-processing because the superpixels capture the meaningful object boundaries in the images.In our prototype implementation,this post-processing framework achieves up to ten frames per second.There are many aspects that can be parallelized,and we are confident that with careful optimization a true real-time implementa-tion is possible.5.ConclusionWe proposed a novel post-processing framework to im-prove foreground segmentations based on Probabilistic Su-perpixel Markov Random Fields.We evaluated our method on all categories of the Change Detection2012dataset for eight benchmark algorithms and showed continuously bet-ter results for all performance measures.In future work,we want to further investigate the effects of the underlying superpixel segmentation and incorporate temporal relationships between frames into the Markov ran-domfield.6.AcknowledgmentsThis work was partially supported by the FhG 
Inter-nal Programs under Grant No.692026and by the German Federal Ministry of Education and Research(BMBF)un-der Contract No.01ISO9052E.The views expressed herein are the authors responsibility and do not necessarily reflect those of BMBF.1/projects/pspmrfAlgorithm Recall Specificity FPR FNR PBC F-Measure Precision Eucl[6]0.70480.96920.03080.01694.34650.61110.6223Eucl[6]+PSP-MRF0.72520.96990.03010.01624.18210.63500.6509GMM Kaew[9]0.50720.99470.00530.02913.10510.59040.8228GMM Kaew[9]+PSP-MRF0.52210.99480.00520.02843.02520.60430.8359GMM Stau[16]0.71080.98600.01400.02023.10460.66230.7012GMM Stau[16]+PSP-MRF0.73020.98690.01310.01962.96230.67790.7202GMM Ziko[17]0.69640.98450.01550.01933.15040.65960.7079GMM Ziko[17]+PSP-MRF0.72130.98550.01450.01812.95200.68210.7284KDE[8]0.74420.97570.02430.01383.46020.67190.6843KDE[8]+PSP-MRF0.76550.97600.02400.01283.34190.68990.7039Maha[6]0.76070.95990.04010.01104.66310.62590.6040Maha[6]+PSP-MRF0.77840.96020.03980.01004.53650.65240.6395SOBS[11]0.78820.98180.01820.00942.56420.71590.7179SOBS[11]+PSP-MRF0.80370.98300.01700.00892.39370.73720.7512ViBe[4]0.68210.98300.01700.01763.11780.66830.7357ViBe[4]+PSP-MRF0.71130.98370.01630.01612.90180.70060.7733 Table1.Evaluation results for eight benchmark algorithms averaged over all categories of the Change Detection2012dataset.Each double row shows the results of the original segmentation algorithm and the improved segmentation generated by the proposed Probabilistic Superpixel Markov Random Field(PSP-MRF).Category Recall Specificity FPR FNR PBC F-Measure Precision Baseline0.91930.99800.00200.00260.43320.92510.9313 Baseline(PSP-MRF)0.93190.99780.00220.00210.41270.92890.9261 Dyn.Back.0.87980.98430.01570.00091.63670.64390.5856 Dyn.Back.(PSP-MRF)0.89550.98590.01410.00061.45140.69600.6576 Camera Jitter0.80070.97870.02130.00752.74790.708600.6399 Camera Jitter(PSP-MRF)0.82110.98250.01750.00642.27810.75020.7009 Interm.Object Motion0.70570.95070.04930.01836.13240.56280.5531 Interm.Object Motion(PSP-MRF)0.70100.95300.04700.02006.05940.56450.5727 Shadow0.83550.98360.01640.00832.33180.77170.7219 Shadow(PSP-MRF)0.87360.98290.01710.00672.24140.79070.7281 Thermal0.58880.99560.00440.01882.09830.68340.8754 Thermal(PSP-MRF)0.59910.99620.00380.01731.91890.69320.9218 Table2.Detailed evaluation of the SOBS algorithm for all categories of the Change Detection2012dataset.Results for other algorithms will be made available.Each double row shows the results of SOBS and the improved segmentation generated by the proposed Probabilistic Superpixel Markov Random Field(PSP-MRF).References[1]/.In Workshop on ChangeDetection2012.[2]R.Achanta,A.Shaji,K.Smith,A.Lucchi,P.Fua,andS.S¨u sstrunk.SLIC Superpixels.Technical report,´Ecole Polytechnique F´e d´e rale de Lausanne,2010.[3] A.Ayvaci and S.Soatto.Motion Segmentation with Occlu-sions on the Superpixel Graph.In Workshop on Dynamical Vision,pages727–734,2009.[4]O.Barnich and M.Van Droogenbroeck.ViBe:A univer-sal background subtraction algorithm for video sequences.Transactions on Image Processing,20(6):1709–1724,June 2011.[5]Y.Benezeth,P.Jodoin,B.Emile,urent,and C.Rosen-berger.Review and evaluation of commonly-implemented background subtraction algorithms.In ICPR,2008.[6]Y.Benezeth,P.-M.Jodoin, B.Emile,urent,andparative study of background subtrac-tion algorithms.Journal of Electronic Imaging,19(3):1–12, 2010.[7]Y.Boykov and G.Funka-Lea.Graph Cuts and Efficient N-DImage Segmentation.IJCV,70(2):109–131,2006.[8] A.Elgammal,D.Harwood,and L.Davis.Non-parametric(a)Input (b)Segmentation (c)Superpixels 
(d)Probabilisticsuperpixels (e)Final segmentationFigure 3.Additional examples of the processing pipeline.Each row shows one example from each category.The categories (and videos)are from top to bottom:baseline (highway),camera jitter (boulevard),dynamic background (fountain01),intermittent object motion (sofa),shadow (cubicle),and thermal (library).The columns are identical to Figure 1:input image,input segmentation based on SOBS,superpixel segmentation,probabilistic superpixels,and final segmentation.Model for Background Subtraction.In ECCV ,2000.[9]P.KaewTraKulPong and R.Bowden.An Improved Adap-tive Background Mixture Model for Real-time Tracking with Shadow Detection.In European Workshop on Advanced Video Based Surveillance Systems ,pages 1–5,2001.[10]L.Lu and G.Hager.Dynamic Foreground /Background Extraction from Images and Videos using Random Patches.In NIPS ,pages 929–936,2006.[11]L.Maddalena and A.Petrosino.A Self-Organizing Approach to Background Subtraction for Visual Surveil-lance Applications.Transactions on Image Processing ,17(7):1168–1177,July 2008.[12]A.P.Moore,S.J.D.Prince,J.Warrell,U.Mohammed,and G.Jones.Superpixel Lattices.In CVPR ,2008.[13] D.H.Parks and S.S.Fels.Evaluation of BackgroundSubtraction Algorithms with Post-processing.In Advanced Video and Signal Based Surveillance ,2008.[14]M.Piccardi.Background subtraction techniques:a review.In Systems,Man and Cybernetics ,pages 3099–3104,2004.[15] A.Schick,M.Fischer,and R.Stiefelhagen.Measuring andEvaluating the Compactness of Superpixels.Manuscript sub-mitted for publication.2012.[16] C.Stauffer and W.Grimson.Adaptive background mixturemodels for real-time tracking.In CVPR ,1999.[17]Z.Zivkovic and A.S.Group.Improved Adaptive GaussianMixture Model for Background Subtraction.In ICPR ,2004.。