Tracking Mean Shift Clustered Point Clouds for 3D Surveillance
A Mean Shift Based Fabric Image Segmentation Algorithm
Journal of Textile Research, Vol. 28, No. 10, Oct. 2007. Article ID: 0253-9721(2007)10-0108-05.
ZHUGE Zhenrong¹, XU Min¹, LIU Yangfei² (1. College of Electrical Engineering, Zhejiang University, Hangzhou, Zhejiang 310027, China; 2. Computer System Engineering Co., Ltd. of Zhejiang University, Hangzhou, Zhejiang 310027, China)
Abstract: The Mean Shift procedure is introduced first, and an extended form of the Mean Shift algorithm is then applied to the problem of fabric image segmentation. The proposed fabric image segmentation algorithm consists of two steps, Mean Shift image filtering and Mean Shift image segmentation, and the principle of each is explained. The segmentation result is controlled by three key parameters: the spatial bandwidth, the color (range) bandwidth, and the minimum region size. Experimental results show the influence of these three parameters and how to select them; finally, a comparison with the original CAD processing result demonstrates that the algorithm is feasible, effective, and robust for fabric image processing.
Keywords: mean shift; fabric image; image filtering; image segmentation. CLC: TS391; Document code: A.
Received 2007-01-23; revised 2007-04-24. First author: ZHUGE Zhenrong (1948-), male, associate professor.
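The two-step pipeline described above, mean shift filtering controlled by a spatial and a color bandwidth followed by region processing, maps closely onto the mean shift filtering built into OpenCV. The sketch below illustrates the filtering step only; the file name and parameter values are illustrative assumptions, and the minimum-region constraint would be enforced in a separate region-merging pass.

```python
import cv2

# Load a fabric image (the path is a placeholder).
img = cv2.imread("fabric.png")

# Mean shift filtering: sp is the spatial bandwidth in pixels and
# sr is the color (range) bandwidth -- the first two of the three
# parameters discussed in the paper.
filtered = cv2.pyrMeanShiftFiltering(img, sp=10, sr=20)

cv2.imwrite("fabric_filtered.png", filtered)
```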
The Mean Shift Algorithm

1. Overview
The Mean Shift algorithm is a nonparametric clustering algorithm that can perform automatic cluster analysis on data.
The principle of the algorithm is based on density estimation and gradient ascent on the density: by iterating, it finds the regions of locally maximal density and thereby divides the data into different clusters. The Mean Shift algorithm is used in computer vision, image processing, pattern recognition, and other fields.
2. The Mean Shift Principle
The core idea of the algorithm is to find the regions of maximum density by computing the distances between sample points and their neighbors, and to assign sample points whose distance falls below a threshold to the same cluster. The concrete steps are as follows.

2.1. Initialization
First, the cluster centers must be initialized, for example by randomly selecting some points from the samples as initial centers.

2.2. Density estimation
For each cluster center, compute its distance to all sample points, using the Euclidean distance or another distance metric. According to these distances, assign each sample point to the cluster whose center is nearest.
2.3. Mean shift migration
For each sample point, compute its mean distance to the center formed by the other points of its cluster, and move the point to the nearest center. Repeat this process until no sample point moves any more.

2.4. Clustering result
After the iterative mean shift process, every sample point has been assigned to a cluster; the final clustering result is the cluster membership of each sample point.
3. A Worked Example
The following example explains in detail how the mean shift algorithm works. Suppose we have a set of 2-D data points:

[(1, 2), (1.5, 1.8), (5, 8), (8, 8), (1, 0.6), (9, 11)]

3.1. Initialization
We randomly choose two points as initial cluster centers, say (1, 2) and (9, 11).

3.2. Density estimation
Compute the distance of every sample point to each cluster center:
Distances to center (1, 2): [0, 0.539, 7.211, 9.220, 1.400, 12.042]
Distances to center (9, 11): [12.042, 11.870, 5.000, 3.162, 13.121, 0]
By these distances the data split into two initial clusters:
Initial cluster 1: [(1, 2), (1.5, 1.8), (1, 0.6)]
Initial cluster 2: [(5, 8), (8, 8), (9, 11)]

3.3. Mean shift migration
For the sample point (1, 2) in cluster 1, compute its mean distance to the other sample points and move it toward the nearest center; repeating this for all points yields the final clusters.
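A minimal NumPy sketch of the mean shift iteration on the six points above. The flat-kernel formulation and the bandwidth value of 4 are assumptions chosen so that the two groups of points collapse onto two separate modes.

```python
import numpy as np

X = np.array([[1, 2], [1.5, 1.8], [5, 8], [8, 8], [1, 0.6], [9, 11]])

def mean_shift(X, bandwidth=4.0, tol=1e-5, max_iter=100):
    """Shift a copy of every point to the mean of its neighbors until
    the points stop moving; coincident end points form one cluster."""
    modes = X.astype(float).copy()
    for _ in range(max_iter):
        shifted = np.empty_like(modes)
        for i, x in enumerate(modes):
            # Flat kernel: average all data points within the bandwidth.
            neighbors = X[np.linalg.norm(X - x, axis=1) <= bandwidth]
            shifted[i] = neighbors.mean(axis=0)
        if np.linalg.norm(shifted - modes) < tol:
            break
        modes = shifted
    return modes

print(np.round(mean_shift(X), 2))  # the six points collapse onto two modes
```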
Mean Shift Motion Detection and Tracking Based on Feature-Matching Prediction
Abstract: To address the inaccurate tracking of the traditional Mean Shift algorithm when the target suffers background interference or occlusion, a Mean Shift tracking method based on feature-matching motion-detection prediction is proposed. The Harris algorithm extracts feature points of the tracked target for motion localization and detection; a Kalman filter estimates the starting position of the target iteration in each frame; and the Mean Shift algorithm then searches iteratively from the predicted position to track the target. Experiments show that the proposed algorithm localizes and detects the target accurately under occlusion, effectively improves tracking under complex conditions, and has good robustness.
Keywords: Mean Shift tracking; motion detection; Harris algorithm; Kalman filtering; iterative search. CLC: TP391; Document code: A; Article ID: 1000-9787(2018)07-0135-03
Transducer and Microsystem Technologies, Vol. 37, No. 7, 2018, p. 135. DOI: 10.13873/J.1000-9787(2018)07-0135-03
Mean Shift Motion Detection and Tracking Based on Feature-Matching Prediction
...the improved Mean Shift tracking algorithm effectively reduces the number of iterations during the target search, detects and tracks moving targets accurately and quickly, and improves the stability of the tracking performance.

1 Shortcomings of traditional Mean Shift tracking
The traditional Mean Shift tracking algorithm builds the target model by computing the probability density of each feature value of the selected target in feature space. During iteration, the optimal matched position y₁ of the previous frame is used as the starting position y₀ of the current frame's window iteration; when the distance between two successive iterations satisfies ‖y₁ − y₀‖ < ε, the iterative search stops. Because traditional Mean Shift lacks a prediction of the target's optimal position, the search result becomes inaccurate when the background color closely resembles the target or when the target is largely occluded, increasing the number of iterations and ultimately causing tracking to fail.

2 Mean Shift based on feature-matching prediction
The Mean-Shift Algorithm Formula
The Mean-shift algorithm is a nonparametric clustering algorithm commonly used in image segmentation, target tracking, pattern recognition, and other fields. This article describes its principle, its formulas, and practical application scenarios in detail.

1. Principle
The core ideas of the Mean-shift algorithm are density estimation and centroid drift. Based on a Gaussian kernel, it repeatedly updates a centroid and finally divides the data points into clusters. Concretely, for every data point x_i we estimate the density by a weighted sum over its surrounding points, giving the density estimate f(x_i).
Given an initial centroid x_c, the new centroid x_c′ is computed by

$$x_c' = \frac{\sum_{x_i \in B(x_c,r)} w(x_i)\, x_i}{\sum_{x_i \in B(x_c,r)} w(x_i)}$$

where B(x_c, r) is the region of radius r centered at x_c and w(x_i) is the Gaussian weight

$$w(x_i) = e^{-\frac{\|x_i - x_c\|^2}{2\sigma^2}}$$

Here σ is the standard deviation of the Gaussian kernel, which controls the window size and how quickly the weights decay. After computing the new centroid we move to it, i.e. set x_c = x_c′, and repeat until the centroid no longer changes or a preset number of iterations is reached. In the end, data points that converge to nearby positions are assigned to the same cluster. The time complexity of the algorithm is O(nr²), where n is the number of data points and r the window radius; speed and accuracy can be balanced by adjusting r and σ.
2. Formulas
(1) Gaussian kernel:

$$w(x_i) = e^{-\frac{\|x_i - x_c\|^2}{2\sigma^2}}$$

where x_i and x_c are the position vectors of a data point and the centroid, and σ is the standard deviation of the Gaussian kernel. The weight decreases with distance: points farther from the centroid weigh less and points nearer weigh more, which makes the density estimate effective.

(2) New centroid:

$$x_c' = \frac{\sum_{x_i \in B(x_c,r)} w(x_i)\, x_i}{\sum_{x_i \in B(x_c,r)} w(x_i)}$$

where B(x_c, r) is the region centered at x_c with radius r and w(x_i) is the Gaussian weight.
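A short NumPy sketch of a single centroid update using these two formulas; the radius and σ values are arbitrary illustrative choices.

```python
import numpy as np

def shift_centroid(x_c, X, r=2.0, sigma=1.0):
    """One update of the centroid formula: the Gaussian-weighted mean
    of the points inside the ball B(x_c, r)."""
    in_ball = X[np.linalg.norm(X - x_c, axis=1) <= r]
    w = np.exp(-np.sum((in_ball - x_c) ** 2, axis=1) / (2 * sigma ** 2))
    return (w[:, None] * in_ball).sum(axis=0) / w.sum()

X = np.random.randn(200, 2)           # toy data
x_c = X[0].copy()
for _ in range(100):                  # iterate until the centroid settles
    x_new = shift_centroid(x_c, X)
    if np.linalg.norm(x_new - x_c) < 1e-6:
        break
    x_c = x_new
```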
A Mean-Shift Based Target Tracking Algorithm
Abstract: Since the Mean-Shift algorithm was proposed, it has been developed and applied widely. But the traditional algorithm loses the target when its speed is rapid and its size is changing. To solve the first problem, Kalman prediction has been fused with Mean Shift.
Abstract: The Mean Shift algorithm has seen wide application and development since it was proposed, but the traditional MS algorithm cannot accurately track targets in video whose size changes or which move very fast. To solve this problem, algorithms that fuse Kalman prediction with MS for predictive tracking already exist. But when the object's size in the video changes, the fixed kernel bandwidth introduces errors caused by the background, leading to tracking failure. This paper therefore proposes a bandwidth-adaptive algorithm built on the combination of Kalman prediction and MS.
A Target-Scale-Adaptive Mean Shift Tracking Algorithm
Acta Armamentarii, Vol. 32, No. 2, Feb. 2011
A Target-Scale-Adaptive Mean Shift Tracking Algorithm
KANG Yimei, XIE Wandong, HU Jiang, HUANG Qi
0 Introduction
Moving target tracking is an important research topic in computer vision, widely applied in traffic control, artificial intelligence, military guidance, and other areas. The Mean Shift algorithm was first proposed by Fukunaga in 1975 as a nonparametric density estimation algorithm.
...the affine transformation matrix of the target between two frames, which is then used to correct the target's position and size. Experiments show that the improved algorithm effectively raises the tracking stability of the Mean Shift algorithm under target scale changes and adapts to scale variation.
Key words: artificial intelligence; Mean Shift; target tracking; affine transformation (affine structure); scale adaptation. CLC: TP391; Document code: A
Mean Shift

§5-1 The Mean Shift Algorithm
Mean Shift is an unsupervised clustering method proposed by Fukunaga and Hostetler in 1975 [109]. The term denotes the mean shift vector, which makes every point "drift" toward a local maximum of the density function. At first the algorithm received little attention; only in 1995, when Cheng et al. studied it further [110], proposed its general form, and defined a family of kernel functions, did its range of application broaden and the algorithm gradually attract notice. Mean Shift is now widely applied in target tracking [111-114] and in image segmentation and smoothing [115-118]; because it is simple and can handle target deformation, it remains an important research focus in the target tracking field.
5-1-1 The Mean Shift Principle
Mean Shift is a nonparametric estimation method based on the density gradient: starting from an arbitrary point of the space, it searches along the ascent direction of the kernel density with an adaptive step size and finally converges to a local maximum of the kernel density estimate.

The basic Mean Shift algorithm can be described as follows. Let {x_i}, i = 1, …, n be a set of n sample points in the d-dimensional space R^d. The basic form of the mean shift vector at a point x is

$$M_h(x) = \frac{1}{k} \sum_{x_i \in S_h} (x_i - x) \tag{5.1}$$

where S_h is the set of all points y satisfying

$$S_h(x) = \{\, y : (y - x)^{T} (y - x) \le h^2 \,\} \tag{5.2}$$

i.e. a high-dimensional ball of radius h; k is the number of the n sample points that fall inside S_h; and (x_i − x) is the offset of sample x_i relative to x. By definition (5.1), the mean shift vector at x is the average offset, relative to x, of all samples inside the region S_h. Since most of those samples lie along the direction of the probability density gradient, the mean shift vector points in the same direction as the density gradient; Fig. 5.1 illustrates this.

[Fig. 5.1 Mean Shift sketch map]

From Eq. (5.1) and Fig. 5.1 one can also see that every sample inside S_h contributes equally to the mean shift vector at x, regardless of its distance from x.
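Eq. (5.1) with the region test of Eq. (5.2) translates directly into a few lines of NumPy; the following is a sketch for illustration only.

```python
import numpy as np

def mean_shift_vector(x, X, h):
    """Basic mean shift vector M_h(x) of Eq. (5.1): the average offset
    of the k samples that fall inside the ball S_h of Eq. (5.2)."""
    offsets = X - x
    in_ball = np.einsum('ij,ij->i', offsets, offsets) <= h ** 2
    return offsets[in_ball].mean(axis=0)   # averages over the k points
```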
Matlab Code for the Mean-shift Algorithm
1. Introduction to the Mean-shift algorithm
Mean-shift is a nonparametric clustering algorithm based on density estimation that automatically finds the optimal cluster centers from the density distribution of the data points. It was first presented by Dorin Comaniciu and Peter Meer in 1999 and has been widely applied to image segmentation, target tracking, and other areas. Its principle is to keep moving each data point toward the direction of locally maximal density until it reaches a local density maximum, i.e. converges to a cluster center.

2. Advantages of the Mean-shift algorithm
1. No preset number of clusters: the number of clusters is determined automatically from the density of the data points.
2. Insensitive to initial values: the algorithm does not depend on initialization and can find the optimal cluster centers automatically.
3. Applicable to high-dimensional data: Mean-shift can still cluster effectively in high-dimensional data.

3. Steps of the Mean-shift algorithm
1. Initialization: take every data point as an initial cluster center.
2. Density computation: for each data point, compute its density and move the point toward the direction of increasing density.
3. Center update: keep repeating step 2 until convergence at a local density maximum, which yields the final cluster centers.
4. A Matlab implementation of the Mean-shift algorithm
Below is a simple Matlab example (the `MeanShift` class used here is a user-supplied implementation, not a Matlab built-in):

```matlab
% Generate 500 random 2-D data points
X = randn(500, 2);

% Mean-shift clustering
bandwidth = 1;                            % bandwidth parameter
ms = MeanShift(X, bandwidth);             % construct the mean-shift object
[clustCent, memberships] = ms.cluster();  % run the clustering

% Visualize the clustering result
figure;
scatter(X(:,1), X(:,2), 10, memberships, 'filled');
hold on;
plot(clustCent(:,1), clustCent(:,2), 'kx', 'MarkerSize', 15, 'LineWidth', 3);
title('Mean-shift clustering result');
```

In this code we first create 500 random 2-D data points X, then set the bandwidth parameter and construct the Mean-shift object.
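For comparison, the same experiment is a few lines in Python with scikit-learn's MeanShift estimator:

```python
import numpy as np
from sklearn.cluster import MeanShift

X = np.random.randn(500, 2)        # same synthetic data as the Matlab demo
ms = MeanShift(bandwidth=1.0)
labels = ms.fit_predict(X)         # cluster membership for every point
centers = ms.cluster_centers_      # one row per discovered mode
print(len(centers), "clusters found")
```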
Principles and Characteristics of the Mean Shift Clustering Algorithm
The Mean Shift algorithm is a nonparametric statistical method used mainly for clustering and density estimation. Its basic principle is to find the final cluster centers iteratively: for each sample point the shifted mean is computed, the computed shifted mean becomes the new starting point, and the procedure is repeated until a termination condition is met; the final mean shift points are the cluster centers.

The Mean Shift algorithm has the following characteristics:
1. No preset number of clusters: it clusters automatically according to the distribution of the data, without fixing the number of clusters in advance.
2. Arbitrary cluster shapes: it places no particular requirement on the shape of the clusters and can handle clusters of any shape.
3. Insensitive to data size and distribution: it can cluster under different data scales and distributions.
4. Suitable for large data sets: using kernel functions to compute the similarity between samples, it can cluster large data sets quickly.
5. Good visualization: different clusters can be marked with colors, making the clustering result intuitive and easy to understand.

However, Mean Shift also has weaknesses, for example limited ability to handle high-dimensional data and susceptibility to noise and outliers. In practice, the clustering algorithm should therefore be chosen according to the characteristics of the data and the task.
English Terminology for Clustering Algorithms
1. Clustering
2. Distance Metric
3. Similarity Metric
4. Pearson Correlation Coefficient
5. Euclidean Distance
6. Manhattan Distance
7. Chebyshev Distance
8. Cosine Similarity
9. Hierarchical Clustering
10. Divisive Clustering
11. Agglomerative Clustering
12. K-Means Clustering
13. Gaussian Mixture Model Clustering
14. Density-Based Clustering
15. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
16. OPTICS (Ordering Points To Identify the Clustering Structure)
17. Mean Shift
18. Clustering Evaluation Metrics
19. Silhouette Coefficient
20. Calinski-Harabasz Index
21. Davies-Bouldin Index
22. Cluster Center
23. Cluster Radius
24. Noise Point
25. Within-Cluster Variation
26. Between-Cluster Variation
Target Tracking with Mean Shift
Bhattacharyya coefficient
With the target model

$$\hat q = \{\hat q_u\}_{u=1,\dots,m}, \qquad \sum_{u=1}^{m} \hat q_u = 1$$

and the candidate model at position y

$$\hat p(y) = \{\hat p_u(y)\}_{u=1,\dots,m}, \qquad \sum_{u=1}^{m} \hat p_u(y) = 1$$

the similarity function is the Bhattacharyya coefficient

$$f(y) = \cos\theta_y = \frac{\sqrt{\hat p(y)}^{\,T}\sqrt{\hat q}}{\big\|\sqrt{\hat p(y)}\big\|\,\big\|\sqrt{\hat q}\big\|} = \sum_{u=1}^{m}\sqrt{\hat p_u(y)\,\hat q_u}$$
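Computed on two normalized histograms, the coefficient is a one-liner; the example values below are made up for illustration.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two discrete densities that sum to 1;
    the result lies in [0, 1], with 1 meaning identical histograms."""
    return float(np.sum(np.sqrt(p * q)))

q = np.array([0.2, 0.5, 0.3])      # target model
p = np.array([0.25, 0.45, 0.30])   # candidate model at position y
print(bhattacharyya(p, q))         # close to 1 -> good match
```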
• k is the number of the n sample points that fall inside the region S_h.

[Fig.: Mean Shift illustration]

Intuitive description: given a region of interest and its centroid, the objective is to find the densest region (illustrated with a distribution of identical billiard balls). The mean shift vector points from the current centroid toward the densest part of the region; moving the region of interest along this vector and recomputing the vector is repeated until the densest region is found.

The densest part of the data corresponds to the maximum of the probability density. Taking the gradient of the probability density, the gradient direction is the direction of fastest density increase, and hence the direction in which the data are most concentrated.
Let g(x) = −k′(x), and assume the gradient of the profile k exists for all x ∈ [0, ∞) except on a finite set of points. Taking g as a profile, the corresponding kernel is G(x) = c_{g,d} g(‖x‖²). The gradient of the kernel density estimate obtained with kernel K is then

$$\hat\nabla f_{h,K}(x) = \frac{2c_{k,d}}{n h^{d+2}} \sum_{i=1}^{n} (x_i - x)\; g\!\left(\left\|\frac{x - x_i}{h}\right\|^{2}\right)$$
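Factoring this expression, as in the standard Comaniciu-Meer derivation, makes the mean shift vector explicit:

$$\hat\nabla f_{h,K}(x) = \frac{2c_{k,d}}{nh^{d+2}} \left[\sum_{i=1}^{n} g_i\right] \left[\frac{\sum_{i=1}^{n} x_i\, g_i}{\sum_{i=1}^{n} g_i} - x\right], \qquad g_i = g\!\left(\left\|\frac{x - x_i}{h}\right\|^{2}\right)$$

The second bracket is the mean shift vector m_{h,G}(x): it is proportional to the density gradient, so moving each point by its mean shift performs an adaptive gradient ascent on the density estimate.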
Target Tracking: Mean Shift

Background of Mean Shift
The concept of Mean Shift was first introduced by Fukunaga in 1975 in a paper on the estimation of the gradient of a probability density function. Its original meaning is exactly what the name says: the mean of the offsets, i.e. a vector. As the theory developed, the meaning changed: "the Mean Shift algorithm" now usually denotes an iterative procedure that computes the mean shift of the current point, moves the point to its mean shift, and then continues from this new starting point until a termination condition is satisfied.
Research on Moving Target Tracking Based on the Mean Shift Algorithm

Abstract: Nonparametric density estimation and target tracking for video images of moving objects are studied, leading to the theory of the Mean Shift algorithm. Mean Shift is applied in a tracking system: based on moving target detection, the target model and the candidate model are described by the probability density of feature values in feature space, and the target is localized. Experimental results show that Mean Shift based tracking follows the target accurately; the algorithm converges to a local extremum and achieves good tracking performance.
Introduction
The Mean Shift tracking algorithm describes the target by a kernel probability density and searches iteratively with Mean Shift based on the Bhattacharyya coefficient; the position of convergence is the target center. For targets whose size changes little, the tracking quality depends mainly on the target template used to compute the Bhattacharyya coefficient and on the mechanism for selecting the candidate target features. The selection principle is to describe, as far as possible, properties unique to the target itself, since this determines the convergence speed and convergence position of the Mean Shift iteration driven by the Bhattacharyya coefficient. Commonly used features are image histograms (gray-level or color); some researchers use orientation histograms or combine other target features, such as a center-weighted distance, the target gradient image, or a standard-deviation image, which strengthens tracking robustness to some extent. But histograms obtained this way contain, besides the gray levels of the target itself, a large amount of background. If the background changes, the best-match position between the candidate region and the template histogram deviates from the true target center; for fast-moving targets the system becomes unstable and the target is eventually lost. The literature proposes background-weighted computation of the template and candidate histograms to cancel the background influence, but because part of the background inside the target window may be absent from the enlarged window, this cannot remove the background completely, and the actual tracking quality improves little.

In this paper Mean Shift is used for target tracking as follows. First the region of the tracked target is initialized, using the result of moving target detection as the initial target. Then, for all pixels of the target region, the probability of each feature value in feature space is computed. A similarity function measures the similarity between the target model of the initial frame and the candidate model of the current frame; maximizing the similarity function yields the motion direction of the target. By repeated iteration, the search converges to the true position of the moving target in the current frame, achieving tracking.
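A compact NumPy sketch of one iteration of this search, using the standard histogram-ratio weights; the bin handling and the flat-kernel weighting are simplifying assumptions rather than the paper's exact implementation.

```python
import numpy as np

def track_iteration(frame_bins, y0, half, q, eps=1e-10):
    """One Mean Shift tracking update with a flat kernel.

    frame_bins: 2-D integer array with the histogram bin of each pixel.
    y0: current (row, col) window center; half: window half-size.
    q: target model histogram (sums to 1). Returns the new center.
    """
    r0, c0 = int(y0[0]), int(y0[1])
    rows = np.arange(max(r0 - half, 0), min(r0 + half + 1, frame_bins.shape[0]))
    cols = np.arange(max(c0 - half, 0), min(c0 + half + 1, frame_bins.shape[1]))
    win = frame_bins[np.ix_(rows, cols)]
    # Candidate histogram p(y0) over the current window
    p = np.bincount(win.ravel(), minlength=len(q)).astype(float)
    p /= p.sum()
    # Pixel weights w_i = sqrt(q_u / p_u) for the bin u of each pixel
    w = np.sqrt(q[win] / (p[win] + eps))
    rr, cc = np.meshgrid(rows, cols, indexing="ij")
    return np.array([(w * rr).sum() / w.sum(), (w * cc).sum() / w.sum()])
```

Iterating this update until the center moves by less than a tolerance implements the tracking loop: each new center is the weighted centroid of the current window.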
Research on a Cell Tracking Method Based on an Improved Mean Shift Algorithm
Tracking cells in image sequences, with its high accuracy and ease of software implementation and hardware integration, has gradually replaced manual methods as the main means of cell tracking. Existing image tracking methods include active contours, level sets, and Mean Shift. The active contour method is a model-fitting method that can track the target while segmenting it, but the algorithm is sensitive to parameter settings and requires the cell regions of adjacent frames to overlap partially.
2011, No. 8. CLC: TP391.4; Document code: A; Article ID: 1009-2552(2011)08-0094-04

Research on a Cell Tracking Method Based on an Improved Mean Shift Algorithm
SHI Yang, WANG Hao (College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China)
...as input parameters of the Kalman filter. Experimental results show that, for cell image tracking, the method achieves higher accuracy than the classic Mean Shift algorithm.
Keywords: cell tracking; Mean Shift; Kalman filter
Tracking method for cells based on improved Mean-Shift algorithm
Application of a Mean-Shift Based Moving Target Tracking Algorithm
Author: LI Min
Journal: Journal of Southwest Minzu University (Natural Science Edition)
Year (Volume), Issue: 2013, 39(3)
Abstract: The development of Mean Shift is reviewed first. Density-gradient mean shift in the joint spatial-range domain is applied to discontinuity-preserving filtering and segmentation of gray-level and color images. The properties of mean shift are reconsidered, proving its convergence within a region. The proposed filtering method associates each pixel of the image with the mode of the density distribution in the joint domain of its nearest local neighborhood. To obtain image regions only one further step is needed: the region is segmented into a piecewise-constant structure, and local regions are fused with nearby modes. The method has two control parameters, the spatial resolution and the range resolution. Since convergence is guaranteed, the technique requires no user intervention to stop the filtering at the desired image quality.
Pages: 5 (477-481)
Affiliation: School of Electronic Information Engineering, Chongqing Radio & TV University, Chongqing 400052
Language: Chinese
CLC: TP391.41
Density Clustering Algorithms in Machine Learning

Density clustering is one of the commonly used clustering methods in machine learning. Its main idea is to cluster according to the density of the data points in feature space.
Unlike traditional distance-based clustering methods, density clustering can automatically identify clusters of different shapes and sizes, which gives it an advantage on complex data sets. Density clustering was first proposed by Ester et al. in 1996; the most classic method is DBSCAN (Density-Based Spatial Clustering of Applications with Noise). DBSCAN clusters by defining the density of data points and divides them into core points, border points, and noise points.

The core of DBSCAN lies in two important parameters: the neighborhood radius (ε) and the neighbor count (MinPts). A data point is a core point if its ε-neighborhood contains at least MinPts neighbors. A data point inside a core point's ε-neighborhood that is not itself a core point is a border point. A point outside the ε-neighborhood of every core point is a noise point.

The clustering process of DBSCAN starts from an arbitrary unvisited data point, explores all points in its ε-neighborhood, and recursively visits their ε-neighborhoods until no unvisited points remain. Whenever a visited point turns out to be a core point, it is merged into one cluster with the visited core points. In the end every data point is assigned to some cluster (or marked as noise); a usage sketch follows.
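In practice DBSCAN is usually called from a library; a minimal scikit-learn sketch with illustrative parameter values:

```python
import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.randn(300, 2)                 # toy data
db = DBSCAN(eps=0.5, min_samples=5).fit(X)  # eps = ε, min_samples = MinPts
labels = db.labels_                         # cluster id per point; -1 = noise
core_idx = db.core_sample_indices_          # indices of the core points
```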
Besides DBSCAN there are other density clustering algorithms, such as OPTICS (Ordering Points To Identify the Clustering Structure) and Mean Shift. OPTICS removes a limitation of DBSCAN: it needs no preset neighborhood radius, but clusters via the reachability distance of the data points. Mean Shift clusters by climbing the gradient to find the density maxima of the sample points in feature space.

Compared with the traditional K-means algorithm, density clustering has the following advantages. First, it can automatically identify clusters of different shapes and sizes, whereas K-means implicitly assumes convex cluster shapes.
mean shift.ppt
WU Zhenrong, S201102080
Background
Mean Shift is a nonparametric probability-density gradient estimation algorithm proposed by Fukunaga et al. The method attracted wider attention only after Cheng's results were published; since then it has been applied in many related fields, such as pattern classification, image segmentation, and target tracking. In the tracking domain, the Mean Shift tracking algorithm is a tracking algorithm whose feature is the probability distribution of the pixel values in the target region.
[Slide: color histograms of the template region (centroid at the point where the template was taken) and of the candidate region (centroid at y); the horizontal axis is the color bin u = 1, …, m, the vertical axis the bin probability.]

Template (target) model:

$$\hat q = \{\hat q_u\}_{u=1,\dots,m}, \qquad \sum_{u=1}^{m} \hat q_u = 1$$

Candidate model at position y:

$$\hat p(y) = \{\hat p_u(y)\}_{u=1,\dots,m}$$

Similarity function:

$$\hat f(y) = \sum_{u=1}^{m} \sqrt{\hat p_u(y)\,\hat q_u}$$
Template position: y₀. Candidate region position: y. Linearizing the similarity function around \hat p(y₀) by a Taylor expansion gives

$$f(y) \approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat p_u(y_0)\,\hat q_u} + \frac{1}{2}\sum_{u=1}^{m}\hat p_u(y)\sqrt{\frac{\hat q_u}{\hat p_u(y_0)}}$$

where the first term is independent of y. With the candidate model

$$\hat p_u(y) = C_h \sum_{i=1}^{n_h} k\!\left(\left\|\frac{y - x_i}{h}\right\|^{2}\right)\delta[b(x_i) - u]$$

the y-dependent term becomes

$$\frac{C_h}{2}\sum_{i=1}^{n_h} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^{2}\right), \qquad w_i = \sum_{u=1}^{m}\sqrt{\frac{\hat q_u}{\hat p_u(y_0)}}\,\delta[b(x_i) - u]$$
Mean Shift object tracking: maximizing the similarity.
A Mean Shift Tracking Algorithm Based on Ensemble Multiple Instance Learning
LUO Huilan, SHAN Shunyong, KONG Fansheng
Journal of Computer-Aided Design & Computer Graphics, 2015(2)

Abstract: To track a specific target stably over long periods, the advantages of matching-based and decision-based tracking methods are combined, and ensemble learning is used to build multiple strong classifiers, giving a mean shift tracking algorithm based on ensemble multiple instance learning. First, instances are sampled randomly in the previous frame to build a collection of classifiers, and ensemble learning synthesizes the final classifier that determines the initial position of the target in the current frame. Then the distance between this initial position and the final target position of the previous frame is compared with a preset threshold to decide whether the mean shift tracking algorithm should revise the initial position, thereby determining the final target position. Experimental results show that the algorithm copes with deformation, rotation, occlusion, illumination changes, and other complex situations, can track over long periods, and is strongly robust.

An effective object tracking method is proposed by combining multiple instance learning and mean shift tracking. The motivation is to use the advantages of the generative model and the discriminative model, and to gain a more robust tracking effect through ensemble learning. First, instances are randomly selected to train different classifiers in the previous frame, and the final integrated classifier is trained by ensemble learning to improve the tracking accuracy. The initial position of the object is determined by ensemble multiple instance learning; mean shift tracking is then used to revise the initial position, depending on whether the distance between the initial position and the object position in the previous frame exceeds a threshold. The experimental results show that the proposed algorithm performs well in many complicated situations, e.g. pose change, rotation, occlusion, and changes of illumination, and can track successfully for a long time with strong robustness.

Pages: 12 (226-237)
Authors: LUO Huilan, SHAN Shunyong, KONG Fansheng
Affiliations: School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000; College of Computer Science and Technology, Zhejiang University, Hangzhou 310027
Language: Chinese
CLC: TP391.4
Automatic Moving Target Detection and Tracking Combining the Frame Difference Method and Mean Shift
$$\rho[\hat p(y), \hat q] = \sum_{u=1}^{m} \sqrt{\hat p_u(y)\,\hat q_u} \tag{6}$$

Expanding Eq. (6) in a Taylor series around \hat p_u(\hat y_0) gives

$$\rho[\hat p(y), \hat q] \approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat p_u(\hat y_0)\,\hat q_u} + \frac{1}{2}\sum_{u=1}^{m}\hat p_u(y)\sqrt{\frac{\hat q_u}{\hat p_u(\hat y_0)}} \tag{7}$$

Substituting Eq. (4) into Eq. (7) and rearranging gives

$$\rho[\hat p(y), \hat q] \approx \frac{1}{2}\sum_{u=1}^{m}\sqrt{\hat p_u(\hat y_0)\,\hat q_u} + \frac{C_h}{2}\sum_{i=1}^{n_h} w_i\, k\!\left(\left\|\frac{y - x_i}{h}\right\|^{2}\right)$$

with the weights w_i defined below.
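The recovered text breaks off at this point. In the standard Comaniciu-Meer derivation (reconstructed here, not present in the source), the expansion leads to the per-pixel weights and the familiar position update:

$$w_i = \sum_{u=1}^{m}\sqrt{\frac{\hat q_u}{\hat p_u(\hat y_0)}}\;\delta[b(x_i)-u], \qquad \hat y_1 = \frac{\displaystyle\sum_{i=1}^{n_h} x_i\, w_i\, g\!\left(\left\|\frac{\hat y_0 - x_i}{h}\right\|^{2}\right)}{\displaystyle\sum_{i=1}^{n_h} w_i\, g\!\left(\left\|\frac{\hat y_0 - x_i}{h}\right\|^{2}\right)}$$

with g = −k′, so maximizing the second term of Eq. (7) is exactly a mean shift step on the weight image.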
TANG Zhongze, ZHANG Chunyan, SHEN Chuanjia, MENG Xiao
(School of Mathematics and Computational Science, Anhui University, Hefei 230039)

Abstract: The traditional Mean Shift algorithm is simple and fast, but it suffers from being only semi-automatic: the search window must be set manually in the first frame to select the target, and the kernel bandwidth stays fixed, so the algorithm cannot adapt in real time to changes in target size and easily loses the target. Combining the frame difference method, the target is first detected by frame differencing, which yields the target window and center; Mean Shift tracking then follows, and a threshold r on the relative change of \hat\rho(y) decides whether the target template must be re-acquired. This makes Mean Shift tracking fully automatic and able to adapt to changes in target size. Experiments show that the method tracks accurately and in real time.
Keywords: target detection; frame difference method; mathematical morphology; Mean Shift algorithm

Region labeling and discrimination
After morphological processing of the image, small interfering regions have been removed and small gaps and holes filled, but some relatively large black holes may still remain. The reason is that a target causing background change often overlaps partially between two consecutive frames, so change detection tends to produce large black holes inside the connected white regions. To fill these larger black holes, the area of every connected black region is computed first; if the area of a black region is below a given threshold, the region is turned white. After this processing, the areas of the connected white regions are computed; a white region whose area exceeds a given threshold is taken to be a detected moving target region.

1.2 Target extraction
When a target is detected in the current frame, the approximate motion region is determined, but this region still contains shadow. Shadow severely degrades target extraction and the accuracy of subsequent tracking, so it must be removed effectively. Shadow detection methods fall into two main classes: feature-based and geometric-model-based. The former distinguishes shadow by its geometric properties, luminance, and color. Most color-based methods rest on the assumption that the background differs only in luminance, not in chromaticity, between shadowed and unshadowed states. Reference [5] proposes a shadow detection method based on the HSV color space; compared with the RGB color space, HSV is better suited to detecting shadow edges. The shadow detection method adopted in this paper combines shadow removal with edge detection.
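A minimal OpenCV sketch of the frame-difference detection with morphological cleanup and area filtering described above; the threshold, kernel size, and area limit are illustrative assumptions.

```python
import cv2
import numpy as np

def detect_moving_regions(prev_gray, curr_gray, thresh=25, min_area=200):
    """Frame differencing followed by morphological cleanup."""
    diff = cv2.absdiff(curr_gray, prev_gray)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove specks
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
    # Keep only connected white regions larger than min_area
    n, labels, stats, cents = cv2.connectedComponentsWithStats(mask)
    boxes = [stats[i, :4] for i in range(1, n)
             if stats[i, cv2.CC_STAT_AREA] >= min_area]
    return boxes  # (x, y, w, h) for each detected moving region
```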
Automatic Selection of the Kernel Bandwidth in Mean-Shift Tracking
For convenience in what follows, we first give several definitions.

Definition 1. In a frame, the image region occupied by the target is called the target region, denoted F; the image region outside F is called the background region, denoted B. The center of the smallest circular region containing F is called the centroid of the target. A circular region T containing both the target region F and part of B is called the tracking window. Assume that the nonzero bins of the color histograms of F and B do not coincide, i.e. the target and background differ clearly in color.

Corresponding author: PENG Ningsong. Phone: +86-21-62934243, Fax: +86-21-62932035, E-mail: pengningsong@
Received 2004-05-02; Accepted 2004-10-09.
Peng NS, Yang J, Liu Z, Zhang FC. Automatic selection of kernel-bandwidth for Mean-Shift object tracking. Journal of Software, 2005, 16(9): 1542-1550. DOI: 10.1360/jos161542
Abstract: The classic Mean-Shift based tracking algorithm uses a fixed kernel bandwidth, which limits the …
$$J(x) = \sum_{a} g(a - x)\, w(a) \tag{1}$$

where k is the kernel function, w is the weight, and g is the shadow kernel of k [1,2]. The Mean-Shift vector ms computed at x points along the gradient direction of the convolution surface (1); repeatedly moving the kernel center along ms until convergence finds the nearby mode, i.e. the template matching position [1,2].
Tracking Mean Shift Clustered Point Clouds for 3D Surveillance

Mark A. Keck Jr., Dept. of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 USA, keck@
James W. Davis, Dept. of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 USA, jwdavis@cse.ohio-
Ambrish Tyagi, Dept. of Computer Science and Engineering, Ohio State University, Columbus, OH 43210 USA, tyagia@cse.ohio-

ABSTRACT
We present in this paper a method of tracking multiple objects (people) in 3D for application in video surveillance. The tracking method is designed to work on images with objects at low resolution and has two major contributions. First we propose a way to generate 3D point clouds that imposes multiple constraints (both geometric and appearance-based) to ensure minimal noise in the 3D data. Second, we incorporate a method to group the points into clouds (or clusters) that correspond to objects in the environment being imaged. We show that this method is more powerful than current 3D tracking techniques that try to fuse 2D tracking information into 3D tracks. A comparison to competing 3D tracking methods is shown, and performance and limitations are discussed.

Categories and Subject Descriptors
I.4.5 [Image Processing and Computer Vision]: Reconstruction; I.5.3 [Pattern Recognition]: Clustering

General Terms
Algorithms

Keywords
3D tracking, reconstruction, mean shift clustering.

1. INTRODUCTION
The human visual system has the striking ability to temporally associate, or track, objects. However, this has proven difficult in computer vision, and as such tracking has become a classic problem in the field. Tracking in the surveillance domain is often solved with one of a few existing methods. Kalman filters [3, 7, 13, 14] assume the object being tracked is driven by a linear process, and the filter, based on observations, finds the optimal state sequence of the hidden process. Often this scheme is extended to estimate state sequences of nonlinear processes with Extended Kalman filters (EKFs), at the cost of suboptimal estimation of state sequences. Kalman filters and EKFs have one simplifying assumption: they assume that the posterior is Gaussian distributed. This assumption does not hold for all processes, and motivated the creation of particle filters [9], which have been used extensively in tracking of contours and shapes.
More recently, kernel-based techniques for tracking have also become popular. Mean shift tracking [5, 6] is very attractive to the vision community because of its computational efficiency compared to the other techniques described; however, it lacks some of the robustness that the previous approaches have.

All of the aforementioned algorithms are more often than not applied only in the image plane (in 2D). There have been steps toward extracting 3D information from existing frameworks or using multiple cameras to improve tracking results (e.g. [2, 11, 19]), but these frameworks focus on combining results from 2D trackers into 3D information and therefore still suffer from all the problems of 2D tracking.

To get around these problems we propose a 3D reconstruction of the foreground objects and track those objects in three dimensions, circumventing the problems inherent in combining multiple 2D trackers. The work proposes a novel methodology on reasoning about 3D reconstruction and the grouping of these point clouds. Experimental results are shown and compared to other approaches.

The remainder of the paper is organized as follows: Sect. 2 discusses related work in more detail. Sect. 3 describes our method of matching points in two camera views and generating 3D point clouds by employing multiple geometric constraints, and Sect. 4 discusses how we group the point clouds generated. We go over experimental methods in Sect. 5 and discuss the findings. Concluding remarks are given in Sect. 6.

1.1 Notation
In this paper, we will use the following notational standards. Non-image matrices will be denoted with capital boldface (e.g. A) and vectors will be denoted with lowercase boldface (e.g. b). The i-th row of a matrix A will be denoted a_i. Image matrices will be denoted with calligraphic font, like I. Often in this paper we will refer to two cameras, and therefore two images.
objects are a single blob in all images,these systems fail(although it sounds like a special case,this is actually quite common).Other work exists that locates the positions of people on the ground plane using homographies[11].This can,with a large number of cameras,give accurate counts of the numberof pedestrians in a scene and reliably attain the positionof their feet.However,this approach suffers greatly when shadows are abundant(typical in natural scenes).We propose a semi-dense3D reconstruction(only recon-struct the surfaces of foreground objects)followed by an intelligent clustering of the resulting3D points,and then using the centroids of these3D clusters as the observation input to a Kalmanfilter fortracking.(a)(b)(c)(d)Figure2:(a)Background subtraction fromfirst view.(b)Background subtraction from second view.(c)Shadow removal fromfirst view.(d)Shadow re-moval from second view.3.3D POINT CLOUD GENERATIONIn this section we will discuss the steps necessary to gen-erate a3D reconstruction.From here forward,we assume that we have only two cameras.3.1Calibration&Structure from Motion Thefirst step in the system is to calibrate the cameras offline.We accomplish this using the algorithm from[15]. This results in knowledge of the intrinsic camera matrices which we denote as K and K′for cameras one and two respectively.Once the cameras are calibrated,we deploy them in an outdoor area.Example viewpoints from two cameras with overlapping views are displayed in Fig.1. The next step is to determine the relative orientation of the cameras.We use a basic structure and motion tech-nique similar to[17].However,because the cameras are as-sumed to be wide baseline an automatic matching approach like that of[18]cannot be robustly employed to recover the epipolar geometry.One could implement an approach simi-lar to[16]where features invariant to photometric and geo-metric changes are extracted in the image and then matched, but we manually selected anchor points in the scene to avoid noise in this stage affecting our results.This process ex-tracts the relative rotation and translation between the two cameras,which we will denote R and t respectively.To take advantage of more geometric information,we also assume that at least four of the correspondences among the views are known to be on the ground plane,so that the homography between ground planes,H,can be estimated such that for all homogeneous points˜x(where x=[x y]⊤are coordinates from thefirst image)s˜x′=H˜x(1) where s is an arbitrary scale factor.We estimate H using the standard least squares technique.3.2Shadow RemovalFor each camera,a mean background model is estimatedAlgorithm1Projective Shadow Removal1:procedure ShadowRemoval(H,I,I′,F,F′,T)2:Let L r,L g,L b denote the red,green,and blue planes of image L3:ˆI←Warp(I,H)⊲Warp view1to view2 4:ˆF←Warp(F,H)⊲Warp foreground1to view2 5:D←I′−ˆI6:M←F′∨ˆF⊲Create a mask7:C←D2r+D2g+D2b ×M8:S←C>T9:F′r←F′∧¬S10:return F′r11:end procedureover60images.To extract foreground objects from the im-ages,basic background subtraction using the mean back-ground model is performed:F=|I−B|>T(2) where F is the foreground image,B is the background im-age,I is the input,and T is a manually chosen threshold. Although this is a very simple background subtraction tech-nique,we will show that it is still effective(other methods can be used).Example foreground images are shown in Fig. 
2(a)and(b).Notice that in these images shadows appear strong in the foreground.Just as in many other applica-tions,shadows in the foreground of these images will cause problems later in the pipeline.To remedy this we attempt to remove shadows using a technique similar to that in[10].The input to the algorithm is the homography H between the two ground planes,the input images I and I′and the original foreground images F and F′and a threshold T.In this algorithm,the notation “×”is an element-wise multiplication.A description of the algorithm in Alg.1follows.As a preprocessing step,we use image warping to remove areas in each of the foreground images that can only be seen in one view(this allows us to reason on only foreground ob-jects seen in both views).We also balance the luminance across the images.We then warp both the original image and the foreground image from view one into view two.We get the image difference of image two and the warped image in the color space.We can then calculate the“distance”be-tween these two images by squaring each plane element-wise, and taking the square root.With this distance image,we want only to examine the elements that are in the foreground in either image,so we make a mask from the foreground of image two and the warped foreground of image one.We then mask the distance image with this andfind elements that are smaller than threshold T.Anything that passes this threshold is assumed to be a shadow because it has similar color in both the warped and original image,which means it is very likely to lie on the ground plane.Although in[10]the authors utilize an appearance model for shadows,we found comparable performance with this algorithm using a single threshold.An example of shadow removal is shown in Fig.2(c)and(d).3.3Image MatchingWith two foreground images(without shadows),we now find correspondences among the foreground objects.To do this we employ a version of epipolar search.Since we know(b)Figure3:(a)Matching via simple epipolar search.(b)Matching via our epipolar search strategy using two geometric constraints.the fundamental matrix F from our estimation of the mo-tion between cameras,for any point of interest˜x in image I we can determine the line l′in image I′on which the corresponding point˜x′lies with the following formula:l′=F˜x(3) Epipolar search is a common technique and a good refer-ence on epipolar geometry and search can be found in[8]. Typical implementations,however,often employ color his-tograms as a metric for matching points across views.Due to the low resolution of the foreground objects and noise, this constraint alone generates many false matches. 
However,if the objects move with respect to some domi-nant ground plane(which we assume exists to remove shad-ows)and the cameras are relatively far from this ground plane(which is often the case in surveillance),we can im-pose a second geometric constraint to limit our search in-stead of a constraint based solely on appearance.We as-sume that any point of interest on an object in I(because it is moving with respect to a dominant ground plane)will be transformed“near”the corresponding point in I′via the homography H.Knowing this will allow us to limit our search to a much smaller area,and in general it is impor-tant to note that in these types of applications geometric constraints are far more robust than appearance-based con-straints alone(although using appearance based features to fine-tune matches after this stage may help disambiguate multiple matches).The goal is then tofind the point along the epipolar line l′in I′that is within radiusρof point˜x′=H˜x that is most similar to˜x in appearance.The radiusρis in pixels, and depends on the distance from the camera to the ground plane.In our case,we chose the number20for all experi-ments.In Fig.3(a)we show an example of basic epipolar search without a second geometric constraint.In Fig.3(b) we show our approach to epipolar search to illustrate the utility of the added constraint which reduces the number of noisy matches significantly.It is also important to note that all of the points generated in this plot correspond to people. We apply this matching methodfirst using view one as the cue and searching in view two,then reverse the process using view two as the cue and searching in image one for matches.This results in a denser reconstruction,which is beneficial for clustering.Once matches are recovered from the images,we use basic linear triangulation to reconstruct the3D points from the matches.The linear solution for a pair of matching points m=[u1,v1]⊤and m′=[u2,v2]from image one and image two,respectively,can be formulated[17]as followsk1−u1k3k2−v1k3 b1−u2b3 b2−v2b3˜p=(u2k′3−k′1)t(v2k′3−k′2)t(4)where matrix B=K′R.We can rewrite Eqn.4as Z˜p=z. The linear solution to˜p is˜p=Z†z(5) where Z†denotes the pseudo-inverse(Z⊤Z)−1Z⊤.We perform this reconstruction for each of the matches found,and as a result get a set of3D points in space corre-sponding to our foreground objects.4.MEAN SHIFT CLUSTERINGPoint clouds like those generated in Sec.3.3cannot be in-put directly to a tracking system.We need a much more suc-cinct representation for observations of foreground objects. To do this we cluster the3D points using mean shift[4]. 
4. MEAN SHIFT CLUSTERING
Point clouds like those generated in Sect. 3.3 cannot be input directly to a tracking system; we need a much more succinct representation for observations of foreground objects. To do this we cluster the 3D points using mean shift [4]. Mean shift clustering allows one to group a set of points without knowing the number of clusters a priori. When using this clustering technique it is important to address two issues: the metric of the feature space and the shape of the kernel. For our purposes, fortunately, the feature space is Euclidean, since it is a reconstruction of real 3D objects (up to a uniform scale factor). We select the standard Epanechnikov kernel, which guarantees convergence [4]. The Epanechnikov kernel has the profile

k_E(x) = 1 − x,  0 ≤ x ≤ 1    (6)

The profile is 0 when x is outside the designated range. This yields the radially symmetric kernel

K_E(x) = (1/2) c_d⁻¹ (d + 2)(1 − ‖x‖²),  ‖x‖ ≤ 1    (7)

where x is a data point in d-dimensional space (in our case d = 3) and c_d is the volume of the d-dimensional unit sphere. This kernel as well evaluates to 0 when ‖x‖ > 1.

We also must select the bandwidth of our clustering technique. This depends on the result of the structure and motion parameters extracted in Sect. 3.1. Structure and motion return the relative orientation and translation up to an unknown scale, which in turn makes our metric reconstruction from Sect. 3.3 up to an unknown scale factor. This means that this parameter has to be adjusted so that distinct humans are not clustered together, but also so that a single human is not broken into smaller pieces in 3D.

We then randomly take half of the points (such as the matches shown in Fig. 3(b)), cluster these, and keep those clusters as the input to our tracking system. Each cluster then has a 3-vector as an observation (the (x, y, z) position of the cluster). A sample clustering is shown in Fig. 4. Note that all clusters shown in this image correspond to exactly one person, save the one highlighted by the yellow box; that cluster merged two people together.

We use the cluster centroids as input observations to a Kalman filter tracker in 3D. Kalman filters are represented in state-space form, which is a set of two equations: (1) the state equation (Eqn. 8), which models the underlying latent process, and (2) the observation equation (Eqn. 9), which relates the process to observable phenomena:

x_{k+1} = G x_k + w_k    (8)
y_k = H x_k + v_k    (9)

In tracking with Kalman filters, the state vector is often the position and velocity of the object being tracked, which we also use:

x = [x  y  z  Δx  Δy  Δz]ᵀ    (10)

We also use the standard constant-velocity matrices G and H:

```
G = [1 0 0 1 0 0        H = [1 0 0 0 0 0
     0 1 0 0 1 0             0 1 0 0 0 0
     0 0 1 0 0 1             0 0 1 0 0 0]
     0 0 0 1 0 0
     0 0 0 0 1 0
     0 0 0 0 0 1]
```

The observation and state noise covariance matrices are set experimentally. With our system identified, we can apply the Kalman recursions and filter observations.
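A small NumPy sketch of this constant-velocity filter and of the prediction used below for data association (Eq. 11); the noise covariances are illustrative placeholders, since the paper sets them experimentally.

```python
import numpy as np

# Constant-velocity model of Eqs. (8)-(10): state (x, y, z, dx, dy, dz)
G = np.eye(6)
G[:3, 3:] = np.eye(3)          # position += velocity each step
H = np.eye(3, 6)               # observe position only
Q = np.eye(6) * 1e-2           # state noise (placeholder value)
Rn = np.eye(3) * 1e-1          # observation noise (placeholder value)

def predict(x, P):
    return G @ x, G @ P @ G.T + Q

def update(x, P, y):
    S = H @ P @ H.T + Rn
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ (y - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P

def predicted_observation(x):
    return H @ G @ x                   # the estimate of Eq. (11)
```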
We still, however, have the problem of associating the proper observation from an incoming frame pair (a cluster) with the corresponding filter. A natural solution, since we are dealing with filtering trajectories, is to use the estimated position at time t for each filter i, found with

ẑᵢₜ = H G xᵢₜ₋₁    (11)

and compare this estimate with the incoming observations using the Euclidean distance. We then associate an observation with the filter corresponding to the smallest distance. As there are possible conflicts, we first choose the smallest value over all observations, associate that with the proper filter, then remove all other distance measurements generated by that filter, and repeat the process until all observations are either assigned or deemed unassignable (i.e. they are too far away from any available filter).

For those observations which are not assigned: if they are near the "edge" of the images (in this case "edge" does not refer to the normal edges of images, but to the edge of the area that is seen in both views), then a new filter is generated and seeded with that starting position. If the observation is not near the edge, it is ignored (assumed to be noise).

5. EXPERIMENTS
To evaluate the method outlined above, we applied it to five different outdoor sequences. Each sequence was captured with a Matrox Quad digitizer so that two channels could be recorded simultaneously. The sequences were captured at the full 640×480 resolution at a full 30 Hz. In each sequence, sets of corresponding points (~35) were manually extracted from sequence pairs to estimate the epipolar geometry.

We selected sequences for validation and testing on which traditional trackers, as well as 3D trackers based on fusing 2D information [2, 19], would fail. Resulting video frames from the tracked sequences are shown in Figs. 5, 6, and 7. First, in Fig. 5, we validate the tracking approach with a simple sequence: the pedestrian is tracked accurately through the wide-baseline views, and we show frames from both views.

In Fig. 6 we show the tracker functioning in an environment with a high volume of pedestrians. As a reminder, the approach only tracks the pedestrians that can be seen in both views. In this case we can see that the 3D tracker performs well.
objects)in every single view.However,this comes at a price.First of all,the algorithm is obviously much slower than algorithms based on fusing 2D information to get3D tracks.There are two very time-consuming processes in the system.Thefirst such process is shadow removal.It requires multiple image warps,which is very time consuming.We first tried to implement more standard techniques to remove shadows,such as those based on normalized color space,but got very poor results,and turned to this method for its ro-bustness since our matching method will return extremely poor results without shadow removal.The second major bottleneck in the system is the match-ing process itself.For every point in every foreground edge a search along an epipolar line in the corresponding image must occur tofind a match.This is time consuming and in future work could have complexity reduced by adding more cameras.Furthermore,it is evident in sequences we have not shown that shadows are not being completely removed.This is be-cause the shadow removal system will not perform well when the shadow cannot be seen in both views.This will allow more noise points in the reconstruction stage,making clus-ters more noisy and affecting the location of the3D centroid somewhat.Again,in future work this issue could be resolved by adding more cameras.6.CONCLUSIONIn this paper we presented a technique for tracking hu-mans using two cameras.The technique employs a spe-cialized shadow removal technique that uses the geometry of the scene and a specialized epipolar matching technique that utilizes two geometric constraints(epipolar and homo-graphic)due to the nature of geometric constraints being more robust than those based on appearance.Point clouds of foreground objects are generated and then clustered and the centers of these clusters are used as observations to a Kalmanfilter tracking system.The approach was then tested and results were shown.It is apparent from these results that other3D trackers based on fusing2D results and information cannot handle cases where people overlap in multiple camera views.Future work will be focused on adding more cameras to speed up the system and to achieve more accurate results.7.ACKNOWLEDGMENTSThis research was supported in part by the National Sci-ence Foundation under grant No.IIS-0428249.8.REFERENCES[1]S.Arulampalam,S.Maskell,N.Gordon,andT.Clapp.A tutorial on particlefilters for on-linenon-linear/non-gaussian bayesian tracking.IEEETransactions on Signal Processing,50(2):174–188,Feb.2002.[2]J.Black and T.J.Ellis.Multi camera image tracking.Image and Vision Computing,2005.[3]K.J.Bradshaw,I.D.Reid,and D.W.Murray.Theactive recovery of3d motion trajectories and their use in prediction.Pattern Analysis and MachineIntelligence,19(3):219–234,1997.[4]aniciu and P.Meer.Mean shift:A robustapproach toward feature space analysis.IEEE Trans.on Pattern Analysis and Machine Intelligence,May2002.[5]aniciu,V.Ramesh,and P.Meer.Real-timetracking of non-rigid objects using mean shift.In IEEE Conf.on Computer Vision and Pattern Recognition,pages142–151,Hilton Head,SC,USA,2000.[6]aniciu,V.Ramesh,and P.Meer.Kernel-basedobject tracking.IEEE Trans.on Pattern Analysis and Machine Intelligence,May2003.[7]F.Dellaert and C.Thorpe.Robust car tracking usingkalmanfiltering and bayesian templates.In Conference on Intelligent Transportation Systems,1997.[8]R.I.Hartley and A.Zisserman.Multiple ViewGeometry in Computer Vision.Cambridge University Press,ISBN:0521540518,second edition,2004.[9]M.Isard and 
[10] K. Jeong and C. Jaynes. Moving shadow detection using a combined geometric and color classification approach. In IEEE Wkshp. on Motion and Video Computing, Breckenridge, CO, USA, Jan. 2004.
[11] S. Khan and M. Shah. A multiview approach to tracking people in crowded scenes using a planar homography constraint. In European Conference on Computer Vision, 2006.
[12] L. Lu, X.-T. Dai, and G. Hager. A particle filter without dynamics for robust 3D face tracking. In Conf. on Computer Vision and Pattern Recognition Workshop, volume 5, 2004.
[13] O. Masoud and N. Papanikolopoulos. A novel method for tracking and counting pedestrians in real-time using a single camera. IEEE Transactions on Vehicular Technology, 50.
[14] R. Rosales and S. Sclaroff. Improved tracking of multiple humans with trajectory prediction and occlusion modeling. Santa Barbara, CA, USA, 1998.
[15] R. Tsai. A versatile camera calibration technique for high accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, Aug. 1987.
[16] M. Vergauwen et al. Wide-baseline 3D reconstruction from digital stills. In Int. Wkshp. on Visualization and Animation of Reality-based 3D Models, Engadin, Switzerland, Feb. 2003.
[17] Z. Zhang. A new multistage approach to motion and structure estimation by gradually enforcing geometric constraints. In ACCV (2), pages 567–574, 1998.
[18] Z. Zhang, R. Deriche, O. D. Faugeras, and Q.-T. Luong. A robust technique for matching two uncalibrated images through the recovery of the unknown epipolar geometry. Artificial Intelligence, 78(1-2):87–119, 1995.
[19] Q. Zhou and J. K. Aggarwal. Object tracking in an outdoor environment using fusion of features and cameras. Image and Vision Computing, 2005.

[Figure 5: Results from test sequence one. (a) Frame 1/Cam 1 (b) Frame 50/Cam 1 (c) Frame 100/Cam 1 (d) Frame 1/Cam 2 (e) Frame 50/Cam 2 (f) Frame 100/Cam 2.]
[Figure 6: Results from test sequence two. (a) Frame 1/Cam 1 (b) Frame 50/Cam 1 (c) Frame 100/Cam 1 (d) Frame 1/Cam 2 (e) Frame 50/Cam 2 (f) Frame 100/Cam 2.]