Feature Extraction Techniques CMU at TRECVID 2004
Moving Object Detection
Building on a review and analysis of related work at home and abroad, this thesis studies two problems in moving object detection and extraction: how to detect and extract foreground regions, and how to detect and remove shadows. The main contributions are as follows: 1. Foreground regions are extracted by background subtraction, with the background modeled by a Gaussian mixture model; an improved K-means algorithm is introduced into the modeling process, which speeds up background modeling and improves its quality. 2. For background updating, a statistical-average-based update algorithm is adopted, which updates faster than traditional background update methods. 3. To address the inaccuracy of Gaussian-shadow-model-based shadow detection in certain situations, a shadow detection and removal algorithm based on body color vector matching is proposed.
The algorithm first applies a brightness test to the extracted foreground, removing foreground regions whose pixel brightness exceeds that of the corresponding background region.
It then computes the orientation-angle dividing line of each foreground region to pre-judge whether the region contains shadow, marks the regions that do, computes the body color vector of each marked region, and matches it against the body color vectors in a shadow database to locate the shadow regions precisely.
Experimental results show that the body-color-vector-matching algorithm is robust for detecting and removing the shadows of moving objects across a variety of scenes, speeds up shadow detection and removal, and overcomes the inaccuracy of the Gaussian-shadow-model-based method in certain situations.
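A minimal sketch of the background-subtraction pipeline described above (the GMM modeling, K-means initialization, and body-color-vector matching are omitted; a simple running average stands in for the statistical-average background update, with illustrative parameter values):

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    # Statistical-average background update: blend the new frame
    # into the background model with learning rate alpha.
    return (1.0 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25.0):
    # Background subtraction: pixels that differ strongly from the
    # background model are marked as foreground.
    return np.abs(frame.astype(float) - bg) > thresh

# Toy demo: a flat background and a bright 2x2 "moving object".
bg = np.zeros((4, 4))
frame = bg.copy()
frame[1:3, 1:3] = 200.0
mask = foreground_mask(bg, frame)
bg = update_background(bg, frame)
```

In a real system the update would only blend in pixels classified as background, so the moving object does not bleed into the model.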
Author: Xin Guojiang. Major: Computer Application Technology. Degree: Master's, Hunan University. Supervisor: Zou Beiji. Year: 2006. Language: Chinese. Classification: TP391.4. Keywords: shadow detection; body color vector matching; Gaussian mixture model; K-means algorithm; statistical average; orientation-angle dividing line.
Feature Extraction
ASM (Active Shape Model)
Point features: insensitive to illumination changes and to noise, gray-level invariant, with low feature dimensionality; contrasted with descriptors that are noise-sensitive and have high feature dimensionality.
Training (shape modeling)
1. Select appropriate feature points. 2. Build the statistical shape model. 3. Match the model to a new set of points.
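Step 2 above, building the statistical shape model, is essentially PCA over aligned landmark vectors. A minimal numpy sketch (the data and the 98% variance threshold are illustrative only):

```python
import numpy as np

def shape_model(shapes, var_keep=0.98):
    # shapes: (n_samples, 2*n_landmarks) flattened, pre-aligned landmarks.
    mean = shapes.mean(axis=0)
    X = shapes - mean
    # PCA via SVD of the centred data; rows of Vt are the shape modes.
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    var = s ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_keep)) + 1
    return mean, Vt[:k]  # mean shape + principal deformation modes

rng = np.random.default_rng(0)
base = rng.normal(size=8)                        # 4 landmarks, (x, y) pairs
shapes = base + 0.01 * rng.normal(size=(20, 8))  # 20 noisy training shapes
mean, modes = shape_model(shapes)
```

New shapes are then expressed as the mean plus a bounded linear combination of the retained modes, which is what constrains the fit during matching.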
Fitting (shape matching)

Experimental Results
PCA
The EigenFace method uses PCA to obtain the principal components of the face distribution. Concretely, it performs an eigenvalue decomposition of the covariance matrix of all face images in the training set; the resulting eigenvectors are the "eigenfaces".
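A minimal eigenfaces sketch of the decomposition just described (random data stands in for training faces; the SVD of the centred data yields the covariance-matrix eigenvectors directly):

```python
import numpy as np

def eigenfaces(images, k=3):
    # images: (n_samples, n_pixels) flattened, grey-level face images.
    mean_face = images.mean(axis=0)
    X = images - mean_face
    # SVD of the centred data: rows of Vt are eigenvectors of the
    # covariance matrix, i.e. the "eigenfaces".
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return mean_face, Vt[:k]

def project(face, mean_face, basis):
    # Represent a face by its coordinates in eigenface space.
    return (face - mean_face) @ basis.T

rng = np.random.default_rng(1)
images = rng.normal(size=(10, 64))  # 10 fake 8x8 "faces"
mean_face, basis = eigenfaces(images)
coeffs = project(images[0], mean_face, basis)
```

Recognition then compares these low-dimensional coefficient vectors instead of raw pixels.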
LBP (Local Binary Pattern) is an operator used to describe the local texture features of images, first proposed in 1994. Its variants offer rotation invariance, and the basic operator is gray-level invariant.
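The basic 3x3 LBP code can be sketched as follows; the clockwise neighbour ordering is one common convention (rotation invariance requires the uniform/rotation-invariant variants, not this basic form):

```python
import numpy as np

def lbp_pixel(patch):
    # Basic 3x3 LBP: threshold the 8 neighbours against the centre
    # pixel and read them off as an 8-bit code.
    c = patch[1, 1]
    order = [(0, 0), (0, 1), (0, 2), (1, 2),
             (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch[r, col] >= c else 0 for r, col in order]
    return sum(b << i for i, b in enumerate(bits))

patch = np.array([[5, 9, 1],
                  [3, 4, 7],
                  [2, 8, 6]])
code = lbp_pixel(patch)  # comparisons: 1,1,0,1,1,1,0,0 -> 59
```

Because only the sign of the difference to the centre matters, adding a constant brightness offset to the whole patch leaves the code unchanged, which is the gray-level invariance mentioned above.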
Deep Learning Resources

Below are tool libraries collected by language and application area; the list is continuously updated.

C

General-purpose machine learning
- Recommender - a C library for product recommendation, using collaborative filtering.

Computer vision
- CCV - C-based/Cached/Core Computer Vision Library, a modern computer vision library.
- VLFeat - an open-source library of computer vision algorithms, with a Matlab toolbox.

C++

Computer vision
- OpenCV - the most widely used vision library, with C++, C, Python, and Java interfaces; supports Windows, Linux, Android, and Mac OS.
- DLib - has C++ and Python interfaces for face recognition and object detection.
- EBLearn - an object-oriented C++ library implementing various machine learning models.
- VIGRA - a cross-platform computer vision and machine learning library for data of arbitrary dimensionality, with Python bindings.

General-purpose machine learning
- MLPack - a scalable C++ machine learning library.
- DLib - designed for easy embedding into other systems.
- encog-cpp
- Vowpal Wabbit (VW) - a fast out-of-core learning system.
- sofia-ml - a suite of fast incremental algorithms.
- Shogun - the Shogun Machine Learning Toolbox.
- Caffe - a deep learning framework with a clean structure, good readability, and high speed.
- CXXNET - a lean framework whose core is under 1,000 lines of code.
- XGBoost - a gradient boosting library optimized for parallel computation.
- CUDA - a fast C++/CUDA implementation of convolutional networks [DEEP LEARNING].
- Stan - a probabilistic programming language implementing full Bayesian statistical inference with Hamiltonian Monte Carlo sampling.
- BanditLib - a simple multi-armed bandit library.
- Timbl - implements several memory-based algorithms; IB1-IG (a k-NN classifier) and IGTree (decision trees) are widely used in NLP.

Natural language processing
- MIT Information Extraction Toolkit - C, C++, and Python tools for named entity recognition and relation extraction.
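As a flavor of what a bandit library like BanditLib implements, here is a minimal epsilon-greedy loop in Python (the arm probabilities and parameters are illustrative, not taken from any of the libraries above):

```python
import random

def epsilon_greedy(reward_fn, n_arms, steps=1000, eps=0.1, seed=0):
    # Epsilon-greedy multi-armed bandit: with probability eps pull a
    # random arm (explore), otherwise pull the best arm so far (exploit).
    rng = random.Random(seed)
    counts = [0] * n_arms
    values = [0.0] * n_arms
    for _ in range(steps):
        if rng.random() < eps:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = reward_fn(arm, rng)
        counts[arm] += 1
        # Incremental mean update of the arm's value estimate.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts

# Two Bernoulli arms with success probabilities 0.2 and 0.8.
values, counts = epsilon_greedy(
    lambda arm, rng: 1.0 if rng.random() < (0.2, 0.8)[arm] else 0.0, 2)
```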
Speech Emotion Recognition Based on Multi-task Deep Feature Extraction and MKPCA Feature Fusion
Journal of Taiyuan University of Technology, Vol. 54, No. 5, September 2023, pp. 782-788. Citation: LI Baoyun, ZHANG Xueying, LI Juan, et al. Speech emotion recognition based on multi-task deep feature extraction and MKPCA feature fusion [J]. Journal of Taiyuan University of Technology, 2023, 54(5): 782-788. Received 2022-03-04; revised 2022-04-10. Supported by the National Natural Science Foundation of China (61371193) and a Shanxi returned-scholar research grant (HGKY2019025).

LI Baoyun, ZHANG Xueying, LI Juan, HUANG Lixia, CHEN Guijun, SUN Ying (College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China)

Abstract: Speech emotion recognition lets computers understand the emotional information contained in human speech and is an important part of intelligent human-computer interaction. Feature extraction and fusion are key stages of a speech emotion recognition system and strongly affect recognition results. To address the problem that traditional acoustic features carry insufficient emotional information, a deep feature extraction model based on multi-task learning is proposed to optimize acoustic features; the resulting acoustic deep features both better represent the original features and carry more emotional information. Exploiting the complementarity between acoustic and spectrogram features, spectrogram features are extracted with a convolutional neural network, and multi-kernel principal component analysis (MKPCA) is then used to fuse the two feature sets and reduce their dimensionality; the fused features effectively improve recognition performance. In experiments on the EMODB and CASIA speech databases with a DNN classifier, the multi-kernel fusion of acoustic deep features and spectrogram features achieves the highest recognition rates, 92.71% and 88.25% respectively, improvements of 2.43% and 2.83% over direct feature concatenation.

Keywords: speech emotion recognition; multi-task learning; acoustic deep features; spectrogram features; multi-kernel principal component analysis

Speech emotion recognition aims to let computers understand the emotional information carried in human speech so that interaction becomes as vivid and natural as human conversation; it is widely applied in smart homes, intelligent healthcare, and other fields. A typical model comprises three modules: an emotional speech database, feature extraction, and a recognition algorithm [1]. Feature extraction is one of the key parts of the model, and as the variety of features grows, effective feature fusion has become an important way to improve recognition performance. Commonly used acoustic features fall into three classes [2]: prosodic features (energy, speech rate, zero-crossing rate), voice-quality features (formants), and Mel-frequency cepstral coefficients (MFCC). These are all shallow features and cannot characterize the deeper properties of the speech signal. Deep learning offers a new way to address this [3-4]. Deep neural networks (DNN) are a common tool for extracting deep features: with supervised training the hidden layers learn more emotional information, and a hidden layer is then extracted as the acoustic deep feature for subsequent recognition, effectively improving model performance. Reference [5] used a DNN to re-optimize directly extracted acoustic features, enriching the emotional information of traditional acoustic features and demonstrating the feasibility of optimizing acoustic features with deep learning.

To extract acoustic deep features that better represent the original features and carry more emotional information, this paper combines a DNN with the idea of multi-task learning [6-7] and proposes a multi-task deep feature extraction model. On top of the DNN framework, a classification task and a self-learning task are built and trained jointly. In the classification task the network labels are the emotion categories; in the self-learning task the labels are the input features themselves, and the features are reconstructed at the output layer. Through shared input and hidden layers, backpropagation makes the hidden layers carry more of the original feature information as well as more emotional information; after training, a hidden layer is extracted as the acoustic deep feature.

Meanwhile, the spectrogram, a two-dimensional representation of the speech signal containing its time-frequency information, is also a current research focus [8-9]. This paper trains a convolutional neural network (CNN) on spectrograms and extracts the global average pooling layer as the spectrogram feature.

Acoustic features and spectrogram features characterize the speech signal in different dimensions, and fusing them can effectively improve model performance [10-12]. Reference [13] extracted spectrogram features with a convolutional recurrent network, concatenated them with acoustic features, and fed them to a softmax classifier; the concatenated features outperformed either single feature, confirming the value of fusion. Direct concatenation, however, performs no fusion in feature space and suffers from excessive dimensionality. Multi-kernel learning [14] maps features into a high-dimensional space via kernel functions, letting acoustic and spectrogram features fuse in kernel space and combine their strengths; kernel PCA [15] reduces dimensionality, solving the dimensionality problem that fusion brings. This paper therefore combines the two into a multi-kernel principal component analysis (MKPCA) method: the features are mapped into a multi-kernel space built from several kernel functions, gaining stronger mapping ability, and PCA is then applied to obtain the fused features.

In summary, the proposed system (Figure 1: speech signal; spectrogram and acoustic features; CNN spectrogram features and multi-task DNN acoustic deep features; MKPCA fusion; DNN classifier; emotion labels such as happy, angry, sad) centers on deep feature extraction and feature fusion. It is validated on the EMODB and CASIA databases; the results show improved recognition performance and stronger classification ability. The components are detailed below.

1 Acoustic Deep Features Based on Multi-task Learning

1.1 Acoustic features

The OpenSMILE toolbox is used to extract the INTERSPEECH 2009 Emotion Challenge feature set (IS09) from the speech signal. The set comprises 32 feature classes, each with 12 statistics, 384 dimensions in total (Table 1).

Table 1. IS09 feature set
Acoustic features: energy, MFCC, zero-crossing rate, voicing probability, pitch, and their first-order deltas.
Statistics: maximum, minimum, range, position of maximum, position of minimum, mean, slope, offset, quadratic error, standard deviation, skewness, kurtosis.

The IS09 set consists of traditional acoustic features and only describes the speech signal at a shallow level, so a multi-task deep neural network is used to re-extract higher-level acoustic deep features from it.

1.2 Multi-task learning

Unlike single-task learning (a plain DNN is a single-task network), a multi-task structure consists of a shared module and task modules (Figure 2): the shared module holds the shared network parameters, and the task modules hold the different tasks the network must complete. By sharing layer parameters and training several tasks in parallel, a single network implements several functions.

1.3 Multi-task deep feature extraction model

On top of a DNN, a multi-task DNN is built with two tasks: a classification task and a self-learning task. The classification task uses emotion categories as labels and classifies through softmax. The self-learning task reconstructs the input: the labels are the input features themselves, the last hidden layer is linearly mapped back to the input dimensionality, and the mean squared error between output and label is computed. The network (Figure 3) has one input layer, three hidden layers, and two output layers; a joint loss backpropagates through both tasks so that the extracted hidden layer better represents the features and carries more emotional information. The overall flow:

1) The input is the IS09 set, x = {x1, x2, ..., xn}, where xn is a feature value and n the feature dimension.
2) x propagates forward through the shared hidden layers (Hidden1, Hidden2, Hidden3).
3) After Hidden3 the network splits into two output tasks.
a) The classification task feeds Hidden3 through a softmax to obtain predicted emotion probabilities; the cross-entropy loss between true and predicted labels,

    loss1 = (1/m) * sum_{i=1..m} [ -y_i ln(y'_i) - (1 - y_i) ln(1 - y'_i) ]    (1)

where y_i is the true emotion label, y'_i the predicted label, and m the number of emotion classes, is backpropagated so the hidden layers absorb more label information.
b) The self-learning task linearly maps Hidden3 up to dimension n to reconstruct x as x', with the mean-squared-error loss

    loss2 = (1/n) * sum_{i=1..n} (x_i - x'_i)^2    (2)

backpropagated so Hidden3 retains more of the input-feature information.
4) The essence of multi-task learning is one network implementing both tasks: the joint loss

    loss = alpha * loss1 + beta * loss2    (3)

where alpha and beta are the task weights, backpropagates through both tasks to fine-tune the neurons. After training, Hidden3 is extracted as the acoustic deep feature, which thus has two properties: more emotional information and better representation of the original features.

2 Speech Emotion Recognition with MKPCA Feature Fusion

Acoustic features are a one-dimensional representation of speech and spectrogram features a two-dimensional one; they describe emotion differently and are complementary, so fusing them can effectively improve recognition performance. An MKPCA method is proposed for the fusion.

2.1 Spectrogram features from a convolutional neural network

A spectrogram is the two-dimensional representation of speech after Fourier transform and captures its time-frequency characteristics; the Mel spectrogram adds Mel filters so that the representation better matches human perception of emotion. Spectrogram features are obtained by training a CNN on Mel spectrograms with the structure in Table 2.

Table 2. CNN structure
Input layer 40*300; Conv1 7*1; Pool1 2*4; Conv2 7*3; Pool2 2*4; Conv3 7*3; Pool3 2*4; Conv4 5*3; Pool4 2*4; global average pooling layer 128.

2.2 Feature fusion with multi-kernel principal component analysis

Multi-kernel learning combines several kernel functions into a multi-kernel space that fuses several features by mapping; KPCA maps the original data into a high-dimensional space with a kernel function and performs PCA there to reduce dimensionality. The two are combined into MKPCA to fuse the acoustic deep features and spectrogram features (Figure 4: preprocess the features; build the multi-kernel space K_mkpca = lambda1*K_poly + lambda2*K_rbf; compute the covariance matrix; solve for eigenvalues and eigenvectors; output eigenvectors at the preset dimension). The procedure:

1) Input the concatenation of the acoustic deep features and spectrogram features (the IS09MT-MSP feature) and preprocess the data.
2) Construct the multi-kernel mapping space. Kernels differ in character: the linear kernel performs no real mapping of the feature space, and the sigmoid kernel satisfies Mercer's condition only for special parameter values, so the globally oriented polynomial kernel and the locally oriented radial basis (RBF) kernel are combined. The polynomial kernel is

    K_poly(z_i, z_j) = (a * z_i . z_j^T + c)^d    (4)

with three parameters a, c, d, where z_i and z_j belong to the input feature space z. The RBF kernel is

    K_rbf(z_i, z_j) = exp(-||z_i - z_j||^2 / (2*sigma^2))    (5)

with one parameter, sigma. The multi-kernel matrix is

    K_mkpca = lambda1 * K_poly + lambda2 * K_rbf    (6)

where lambda1 and lambda2 are the weights of K_poly and K_rbf, with lambda1 + lambda2 = 1.
3) Finally, principal component analysis is performed in the mapped space: the covariance matrix of the multi-kernel matrix is computed, its eigenvalues and eigenvectors are solved, and the data are projected at the chosen dimension to obtain the fused, reduced features.

3 Experimental Setup and Results

3.1 Emotional speech databases

Experiments are run on two public databases in different languages. EMODB is a German emotional speech database recorded by the Technical University of Berlin: 535 samples, 7 emotions (anger, fear, disgust, happiness, neutral, sadness, boredom). CASIA is a Chinese emotional speech database recorded by the Chinese Academy of Sciences: 1200 samples, 6 emotions (anger, fear, happiness, neutral, sadness, surprise).

3.2 Parameter settings

The environment is Python 3.7 with TensorFlow 1.14. Five-fold cross validation splits train and test sets 4:1, and accuracy is the evaluation metric.
1) Multi-task DNN: learning rate 0.001, batch size 32, 50 epochs; the shared hidden layers 1 and 2 have [512, 512] neurons; to find the best-performing deep feature, the feature extraction layer (hidden layer 3) is compared at five sizes [50, 100, 150, 200, 250]; the task weights chosen experimentally are alpha = 0.8, beta = 0.2.
2) Spectrogram feature network: learning rate 0.0003, batch size 32, 50 epochs.
3) MKPCA: the poly parameters a and c each take 40 values uniformly in [2^-8, 2^8] and d is in {1, 2, 3}; the rbf parameter sigma takes 40 values uniformly in [2^-8, 2^8]; the best combination is found by grid search; the kernel weights chosen experimentally are lambda1 = 0.4, lambda2 = 0.5.
4) DNN classifier: learning rate 0.001, batch size 64, 50 epochs, hidden layers [512, 512, 300].

3.3 Results

Feature names (Table 3): IS09 = acoustic features from OpenSMILE; IS09DNN = acoustic deep features from a plain DNN; IS09MT = acoustic deep features from the multi-task DNN; MSP = spectrogram features from the CNN; IS09MT-MSP = direct concatenation of IS09MT and MSP; IS09MT-MSP-MKPCA = the concatenated feature fused by MKPCA.

3.3.1 Acoustic deep features. Five hidden-layer sizes are compared, alongside the plain-DNN features, to verify the benefit of multi-task learning. With the DNN classifier the original IS09 features score 82.80% (EMODB) and 75.91% (CASIA). Figure 5 plots accuracy against feature dimension for IS09DNN and IS09MT on both databases. IS09MT beats the original IS09 features on both databases, by up to 4.12% and 5.26%, showing that acoustic deep features effectively optimize the acoustic features and improve their emotion-classification ability. IS09MT also beats IS09DNN, demonstrating the effectiveness of multi-task learning; the added self-learning task increases computation somewhat, but this small cost buys higher accuracy. On EMODB, IS09MT peaks at 86.92% with 150 dimensions, 1.62% above IS09DNN; on CASIA it peaks at 81.17% with 200 dimensions, 1.09% above IS09DNN. These dimensions are adopted for the proposed acoustic deep features.

3.3.2 MKPCA fusion of acoustic deep and spectrogram features. Four fused dimensionalities are compared to validate the MKPCA algorithm. MSP alone scores 80.18% (EMODB) and 80.17% (CASIA); the directly concatenated IS09MT-MSP feature scores 90.28% and 85.42%.

Table 4. IS09MT-MSP-MKPCA recognition rate (%)
Dimension:  100    150    200    250
EMODB:      91.96  92.33  92.71  92.14
CASIA:      88.00  88.08  88.25  88.17

The MKPCA-fused features peak at 200 dimensions with 92.71% and 88.25%, clearly above the single features and 2.43% and 2.83% above direct concatenation, showing that MKPCA effectively exploits the strengths of the two different feature types. The confusion matrices (Figure 6) show that the vast majority of samples are classified correctly, with only a few confusions: on EMODB, happiness and anger are confused most, both being high-valence emotions; on CASIA, sadness and fear are confused most, both being low-valence emotions.

3.3.3 Comparison. Table 5 compares the proposed method with other approaches; the cited works also use acoustic features and spectrograms.

Table 5. Recognition rate compared with other methods (%)
Reference [10], direct concatenation + reliefF: EMODB 90.21, CASIA -
Reference [12], parallel convolutional recurrent network: EMODB 86.44, CASIA 58.25
Reference [13], direct concatenation + softmax: EMODB -, CASIA 78.91
Reference [14], MKL: EMODB 86.00, CASIA 88.00
This paper, multi-task deep features + MKPCA: EMODB 92.71, CASIA 88.25

The proposed model achieves the best recognition rates on both databases, showing that the multi-task deep feature extraction model and the MKPCA fusion algorithm make effective use of the emotional information in speech and improve the final classification performance.

4 Conclusion

Given the importance of feature extraction and fusion in speech emotion recognition, a multi-task deep neural network is first proposed to optimize acoustic features: training a classification task and a self-learning task together yields acoustic deep features with more emotional information that better represent the original features. The complementarity of acoustic and spectrogram features is then exploited by fusing them with MKPCA. On the EMODB and CASIA databases the multi-kernel fused features reach peak recognition rates of 92.71% and 88.25%, 2.43% and 2.83% above direct concatenation, confirming that the proposed method effectively improves recognition performance.

References:
[1] ZHANG X Y, SUN Y, ZHANG W, et al. The key technology of speech emotion recognition [J]. Journal of Taiyuan University of Technology, 2015, 46(6): 629-636.
[2] ZHANG X Y, ZHANG T, SUN Y, et al. Cascading classification emotion speech recognition based on PAD model [J]. Journal of Taiyuan University of Technology, 2018, 49(5): 731-735.
[3] WEI P, ZHAO Y. A novel speech emotion recognition algorithm based on wavelet kernel sparse classifier in stacked deep auto-encoder model [J]. Personal and Ubiquitous Computing, 2019, 23(3-4): 521-529.
[4] ZHANG L, WANG L, DANG J, et al. Gender-aware CNN-BLSTM for speech emotion recognition [C] // International Conference on Artificial Neural Networks. Springer, Cham, 2018: 782-790.
[5] SUN L, ZOU B, FU S, et al. Speech emotion recognition based on DNN-decision tree SVM model [J]. Speech Communication, 2019, 115: 29-37.
[6] YAO Z, WANG Z, LIU W, et al. Speech emotion recognition using fusion of three multi-task learning-based classifiers: HSF-DNN, MS-CNN and LLD-RNN [J]. Speech Communication, 2020, 120: 11-19.
[7] LI Y, ZHAO T, KAWAHARA T. Improved end-to-end speech emotion recognition using self attention mechanism and multitask learning [C] // Interspeech 2019, 2019: 2803-2807.
[8] ZHANG S, ZHANG S, HUANG T, et al. Speech emotion recognition using deep convolutional neural network and discriminant temporal pyramid matching [J]. IEEE Transactions on Multimedia, 2017, 20(6): 1576-1590.
[9] LUO D, ZOU Y, HUANG D. Investigation on joint representation learning for robust feature extraction in speech emotion recognition [C] // Interspeech 2018, 2018: 152-156.
[10] ER M B. A novel approach for classification of speech emotions based on deep and acoustic features [J]. IEEE Access, 2020, 8: 221640-221653.
[11] HU D S, ZHANG X Y, ZHANG J, et al. Speech emotion recognition based on main and auxiliary network feature fusion [J]. Journal of Taiyuan University of Technology, 2021, 52(5): 769-774.
[12] JIANG P X, HONG L. Parallelized convolutional recurrent neural network with spectral features for speech emotion recognition [J]. IEEE Access, 2019, 7: 90368-90377.
[13] PENG W Y, TANG X Y. Speech emotion recognition of merged features based on improved convolutional neural network [C] // IEEE 2nd International Conference on Information Communication and Signal Processing (ICICSP), 2019: 301-305.
[14] WANG Z M, LIU G, SONG H. Speech emotion recognition method based on multi-core learning feature fusion [J]. Computer Engineering, 2019, 45(8): 248-254.
[15] CHAROENDEE M, SUCHATO A, PUNYABUKKANA P. Speech emotion recognition using derived features from speech segment and kernel principal component analysis [C] // 14th International Joint Conference on Computer Science and Software Engineering (JCSSE), Nakhon Si Thammarat, Thailand, 2017: 1-6.
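The MKPCA fusion of Section 2.2 can be sketched with numpy as below. The kernel parameters and weights are illustrative stand-ins (the paper grid-searches a, c, d, and sigma and reports lambda1 = 0.4, lambda2 = 0.5; here the weights are normalized to sum to 1):

```python
import numpy as np

def mkpca(Z, lam1=0.4, lam2=0.6, a=1.0, c=1.0, d=2, sigma=1.0, k=2):
    # Multi-kernel matrix (Eq. 4-6): weighted sum of a polynomial
    # kernel (global character) and an RBF kernel (local character).
    K_poly = (a * (Z @ Z.T) + c) ** d
    sq = np.sum(Z ** 2, axis=1)
    K_rbf = np.exp(-(sq[:, None] + sq[None, :] - 2 * Z @ Z.T)
                   / (2 * sigma ** 2))
    K = lam1 * K_poly + lam2 * K_rbf
    # Kernel PCA: centre the kernel matrix, then project the data
    # onto the top-k eigenvectors.
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    w, V = np.linalg.eigh(Kc)
    idx = np.argsort(w)[::-1][:k]
    alphas = V[:, idx] / np.sqrt(np.maximum(w[idx], 1e-12))
    return Kc @ alphas  # fused, dimension-reduced features

rng = np.random.default_rng(0)
Z = rng.normal(size=(10, 6))  # stand-in for concatenated IS09MT-MSP features
F = mkpca(Z, k=2)
```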
[sphinx] Training a Chinese Acoustic Model
I. Training acoustic models with CMUSphinx

The CMUSphinx toolkit ships with several high-quality acoustic models, including American English, French, and Chinese models. These models are already optimized for best performance; most command-and-control systems can use them directly, and even some large-vocabulary applications can use them as-is. Beyond that, CMUSphinx provides tools to adapt existing models when higher accuracy is needed. Adaptation works well when you need a different recording environment (close-range, far from the microphone, or over a telephone call), or a different accent, such as American versus British English or Indian English. Adaptation can even meet the requirement of supporting a new language on short notice: you only need to build a mapping from the acoustic model's phoneset to the target phoneset based on the dictionary.
In some cases, however, the existing models are unusable, for example in handwriting recognition or in detection of other languages. In those cases you need to train your own acoustic model. The following tutorial shows how to get started.
II. Starting the training

Before training, make sure you have sufficient data:
- a single-speaker command-and-control application needs at least 1 hour of recordings;
- a many-speaker command-and-control application needs 200 speakers, 5 hours each;
- single-speaker dictation needs 10 hours of that speaker's recordings;
- multi-speaker dictation needs 200 speakers, 50 hours each.

You also need phonetic knowledge of the language, and enough time, say a month, to train the model. If you lack sufficient data, time, or experience, you are advised to adapt an existing model to meet your needs instead.
Data preparation

The trainer needs to know which sound units to learn parameters for, and every such sequence must appear in your training set. This information is stored in the transcript file; the dictionary then maps every word to its sequence of sound units. So besides the speech data you need transcripts and two dictionaries: one mapping each word to its pronunciation, and one listing the non-speech units, called the filler dictionary.

Before training, prepare the following two directories:

etc/
    your_db.dic                 - phonetic dictionary
    your_db.phone               - phoneset file
    your_db.lm.DMP              - language model
    your_db.filler              - list of fillers
    your_db_train.fileids       - list of files for training
    your_db_train.transcription - transcription for training
    your_db_test.fileids        - list of files for testing
    your_db_test.transcription  - transcription for testing
wav/
    speaker_1/file_1.wav        - recording of a speech utterance
    speaker_2/file_2.wav

The fileids files (your_db_train.fileids and your_db_test.fileids) list the names of the speech data files.
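For illustration, the per-utterance files pair up as follows. The contents are invented, but the `<s> ... </s> (file_id)` line format follows the CMUSphinx training tutorial:

```
# etc/your_db_train.fileids -- one recording per line, path without extension
speaker_1/file_1
speaker_2/file_2

# etc/your_db_train.transcription -- one utterance per line,
# ending with the matching file id in parentheses
<s> hello world </s> (file_1)
<s> good morning </s> (file_2)
```

Every word in the transcription must appear in the phonetic dictionary, and every file id must match a .wav file under wav/.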
Feature extraction_rule-based-for ENVI EX
Feature Extraction on rule-based for ENVI EX

Background

Feature Extraction is an effective means of extracting information from high-resolution imagery, or from multispectral imagery, based on spatial, spectral, and texture characteristics. It can extract several target feature types at once, such as roads, buildings, vehicles, bridges, rivers, and fields. Feature Extraction is designed for optimizable, user-friendly, and repeatable information extraction across many kinds of image data, so that you spend less time on processing details and more on analyzing the results.
Feature Extraction classifies imagery in an object-based way. An object is a self-defined region of interest with specific spatial, spectral (brightness and color), and texture characteristics. Traditional remote sensing classification is pixel-based, classifying by the spectral values of individual pixels; this works well for hyperspectral imagery but is less satisfactory for panchromatic and multispectral imagery. For high-spatial-resolution panchromatic or multispectral imagery, object-based classification therefore offers flexibility with respect to the objects being classified.
Feature Extraction workflow

The workflow has several stages: segment the whole image into regions of pixels; compute the various attribute values for each region; build objects; then classify the objects (rule-based or supervised) or extract the target features. If any step turns out to be incorrect or unsuitable, you can go back, revise the settings, and re-run.
Find object ----- Extract object

Extracting objects with rule-based classification

Rule-based object extraction lets you build rules from object attributes. For many object types this rule-based approach often outperforms supervised classification. Rule-based classification relies on the analyst's knowledge of the target objects and on iterative refinement. For example, roads are elongated, buildings are likely to be rectangular, vegetation has high NDVI values, and trees are more textured than grass.
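The example rules above translate directly into attribute thresholds. A toy sketch (the attribute names and thresholds are invented for illustration and are not ENVI's actual rule syntax):

```python
def classify_object(attrs):
    # Toy rule set mirroring the text: vegetation has high NDVI
    # (trees more textured than grass), roads are elongated,
    # buildings are roughly rectangular.
    if attrs["ndvi"] > 0.4:
        return "tree" if attrs["texture"] > 0.5 else "grass"
    if attrs["elongation"] > 3.0:
        return "road"
    if attrs["rectangularity"] > 0.8:
        return "building"
    return "other"

road = {"ndvi": 0.1, "texture": 0.2, "elongation": 8.0, "rectangularity": 0.3}
label = classify_object(road)  # classified as "road"
```

In ENVI EX these thresholds would be tuned interactively against the segmented objects rather than fixed in advance.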
Feature Parameter Extraction for Speech Recognition (Graduation Thesis)
Graduation Design. Topic: Research on Feature Parameter Extraction Based on Speech Recognition. Major: Electronic Information Engineering Technology, Class of 2010, Department of Electronic Engineering, Guilin University of Aerospace Technology. 1 April 2013. Task issued: 10 November 2012; design submission deadline: 10 June 2013.

Main content: This thesis first analyzes the basic principles and methods of speech recognition; it then discusses speech signal preprocessing, endpoint detection, and the speech feature parameters Mel-frequency cepstral coefficients (MFCC) and LPC cepstral coefficients (LPCC); finally, it studies the extraction of MFCC and LPCC and analyzes the simulation results.
Main technical indicators: extraction methods for MFCC and LPC cepstral coefficients; analysis of speech signal preprocessing and endpoint detection methods; Matlab simulation.
Deliverables: the thesis bound into a volume, with all graduation documents submitted. Schedule: preparation (November to December 2012); research and system development (January to March 2013); writing (April to May 2013); submission and defense preparation (May to June 2013).

Purpose and significance: Speech signal processing is an emerging interdisciplinary subject, the product of combining phonetics with digital signal processing.
It is closely connected with cognitive science, psychology, linguistics, computer science, pattern recognition, and artificial intelligence.
The development of speech signal processing depends on the development of these disciplines, and progress in speech signal processing technology in turn promotes these fields.
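The MFCC pipeline studied in the thesis (pre-emphasis, framing, windowing, FFT, Mel filterbank, log, DCT) can be sketched in numpy. The frame sizes and filter counts below are common defaults, not the thesis's exact settings:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    # Pre-emphasis boosts high frequencies.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # 25 ms frames with a 10 ms step, Hamming-windowed.
    flen, fstep = int(0.025 * sr), int(0.010 * sr)
    n = 1 + (len(sig) - flen) // fstep
    frames = np.stack([sig[i * fstep: i * fstep + flen] for i in range(n)])
    frames = frames * np.hamming(flen)
    # Power spectrum of each frame.
    pspec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular Mel filterbank between 0 and sr/2.
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(pspec @ fb.T + 1e-10)
    # DCT-II of the log filterbank energies gives the cepstral coefficients.
    k = np.arange(n_ceps)[:, None] * (2 * np.arange(n_mels)[None, :] + 1)
    dct = np.cos(np.pi * k / (2.0 * n_mels))
    return logmel @ dct.T

t = np.linspace(0.0, 1.0, 16000, endpoint=False)
ceps = mfcc(np.sin(2 * np.pi * 440.0 * t))  # one MFCC vector per frame
```

Endpoint detection would normally run before this step, so that only speech frames are featurized.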
ECG Feature Extraction Techniques - A Survey
Abstract—ECG feature extraction plays a significant role in diagnosing most cardiac diseases. One cardiac cycle in an ECG signal consists of the P-QRS-T waves. A feature extraction scheme determines the amplitudes and intervals in the ECG signal for subsequent analysis; the amplitude and interval values of the P-QRS-T segment characterize the functioning of every human heart. Recently, numerous research efforts and techniques have been developed for analyzing the ECG signal, mostly based on fuzzy logic methods, artificial neural networks (ANN), genetic algorithms (GA), support vector machines (SVM), and other signal analysis techniques; all of these have their advantages and limitations. This paper discusses various techniques and transformations proposed in the literature for extracting features from an ECG signal, and also provides a comparative study of the methods proposed by researchers.

Keywords—Artificial Neural Networks (ANN), Cardiac Cycle, ECG signal, Feature Extraction, Fuzzy Logic, Genetic Algorithm (GA), and Support Vector Machines (SVM).

I. INTRODUCTION

The investigation of the ECG has been used extensively for diagnosing many cardiac diseases. The ECG is a realistic record of the direction and magnitude of the electrical activity generated by depolarization and re-polarization of the atria and ventricles. One cardiac cycle in an ECG signal consists of the P-QRS-T waves; Figure 1 shows a sample ECG signal. The majority of the clinically useful information in the ECG is found in the intervals and amplitudes defined by its features (characteristic wave peaks and time durations). The development of precise and rapid methods for automatic ECG feature extraction is of chief importance, particularly for the examination of long recordings [1].

The ECG feature extraction system provides fundamental features (amplitudes and intervals) for subsequent automatic analysis. In recent times, a number of techniques have been proposed to detect these features [2] [3] [4]. Earlier methods of ECG signal analysis were based on the time domain, but this is not always adequate to study all the features of ECG signals, so a frequency representation of the signal is also required. Deviations from the normal electrical patterns indicate various cardiac disorders; cardiac cells in the normal state are electrically polarized [5].

ECG is essentially responsible for patient monitoring and diagnosis, and the features extracted from the ECG signal play a vital role in diagnosing cardiac disease. It is therefore necessary that the feature extraction system perform accurately and quickly. The purpose of feature extraction is to find as few properties as possible within the ECG signal that allow successful abnormality detection and efficient prognosis.

Figure 1. A sample ECG signal showing the P-QRS-T wave
The improvement of precise and rapid methods for automatic ECG feature extraction is of chief importance, particularly for the examination of long recordings [1].The ECG feature extraction system provides fundamental features (amplitudes and intervals) to be used in subsequent automatic analysis. In recent times, a number of techniques have been proposed to detect these features [2] [3] [4]. The previously proposed method of ECG signal analysis was based on time domain method. But this is not always adequate to study all the features of ECG signals. Therefore the frequency representation of a signal is required. The deviations in the normal electrical patterns indicate various cardiac disorders. Cardiac cells, in the normal state are electrically polarized [5].ECG is essentially responsible for patient monitoring and diagnosis. The extracted feature from the ECG signal plays a vital in diagnosing the cardiac disease. The development of accurate and quick methods for automatic ECG feature extraction is of major importance. Therefore it is necessary that the feature extraction system performs accurately. The purpose of feature extraction is to find as few properties as possible within ECG signal that would allow successfulabnormality detection and efficient prognosis.Figure.1 A Sample ECG Signal showing P-QRS-T WaveIn recent year, several research and algorithm have been developed for the exertion of analyzing and classifying the ECG signal. The classifying method which have been proposed during the last decade and under evaluation includes digital signal analysis, Fuzzy Logic methods, Artificial Neural Network, Hidden Markov Model, Genetic Algorithm, Support Vector Machines, Self-Organizing Map, Bayesian and other method with each approach exhibiting its own advantages and disadvantages. This paper provides an over view on various techniques and transformations used for extracting the feature from ECG signal. 
In addition the future enhancement gives a general idea for improvement and development of the feature extraction techniques.The remainder of this paper is structured as follows. Section 2 discusses the related work that was earlier proposed in literature for ECG feature extraction. Section 3 gives a general idea of further improvements of the earlier approaches in ECGECG Feature Extraction Techniques - A SurveyApproachS.Karpagachelvi, Dr.M.Arthanari, Prof. & Head, M.Sivakumar,Doctoral Research Scholar, Dept. of Computer Science and Engineering, Doctoral Research Scholar,Mother Teresa Women's University, Tejaa Shakthi Institute of Technology for Women, Anna University – Coimbatore, Kodaikanal, Tamilnadu, India. Coimbatore- 641 659, Tamilnadu, India. Tamilnadu, Indiaemail : karpagachelvis@ email: arthanarimsvc@ email : sivala@feature detection, and Section 4 concludes the paper with fewer discussions.II.L ITERATURE R EVIEWECG feature extraction has been studied from early time and lots of advanced techniques as well as transformations have been proposed for accurate and fast ECG feature extraction. This section of the paper discusses various techniques and transformations proposed earlier in literature for extracting feature from ECG.Zhao et al. [6] proposed a feature extraction method using wavelet transform and support vector machines. The paper presented a new approach to the feature extraction for reliable heart rhythm recognition. The proposed system of classification is comprised of three components including data preprocessing, feature extraction and classification of ECG signals. Two diverse feature extraction methods are applied together to achieve the feature vector of ECG data. The wavelet transform is used to extract the coefficients of the transform as the features of each ECG segment. Concurrently, autoregressive modeling (AR) is also applied to get hold of the temporal structures of ECG waveforms. 
Then at last the support vector machine (SVM) with Gaussian kernel is used to classify different ECG heart rhythm. The results of computer simulations provided to determine the performance of the proposed approach reached the overall accuracy of 99.68%.A novel approach for ECG feature extraction was put forth by Castro et al. in [7]. Their proposed paper present an algorithm, based on the wavelet transform, for feature extraction from an electrocardiograph (ECG) signal and recognition of abnormal heartbeats. Since wavelet transforms can be localized both in the frequency and time domains. They developed a method for choosing an optimal mother wavelet from a set of orthogonal and bi-orthogonal wavelet filter bank by means of the best correlation with the ECG signal. The foremost step of their approach is to denoise (remove noise) the ECG signal by a soft or hard threshold with limitation of 99.99 reconstructs ability and then each PQRST cycle is decomposed into a coefficients vector by the optimal wavelet function. The coefficients, approximations of the last scale level and the details of the all levels, are used for the ECG analyzed. They divided the coefficients of each cycle into three segments that are related to P-wave, QRS complex, and T-wave. The summation of the values from these segments provided the feature vectors of single cycles. Mahmoodabadi et al. in [1] described an approach for ECG feature extraction which utilizes Daubechies Wavelets transform. They had developed and evaluated an electrocardiogram (ECG) feature extraction system based on the multi-resolution wavelet transform. The ECG signals from Modified Lead II (MLII) were chosen for processing. The wavelet filter with scaling function further intimately similar to the shape of the ECG signal achieved better detection. The foremost step of their approach was to de-noise the ECG signal by removing the equivalent wavelet coefficients at higher scales. 
Then, QRS complexes are detected, and each complex is used to locate the peaks of the individual waves, including the onsets and offsets of the P and T waves present in one cardiac cycle. Their experimental results revealed that the proposed approach for ECG feature extraction achieved a sensitivity of 99.18% and a positive predictivity of 98%.

A mathematical-morphology approach to ECG feature extraction was proposed by Tadejko and Rakowski in [8]. The primary focus of their work is to evaluate the classification performance of an automatic electrocardiogram (ECG) classifier for the detection of abnormal beats, with a new concept for the feature extraction stage. The obtained feature sets were based on ECG morphology and RR intervals. The configuration adopted the well-known Kohonen self-organizing map (SOM) for the examination of signal features and clustering. A classifier was developed with SOM and learning vector quantization (LVQ) algorithms using the data from the records recommended by the ANSI/AAMI EC57 standard. In addition, their work compares two strategies for classifying annotated QRS complexes: one based on the original ECG morphology features, and the proposed new approach based on preprocessed ECG morphology features, where mathematical-morphology filtering is used for the preprocessing of the ECG signal.

Sufi et al. in [9] formulated a new ECG obfuscation method built on feature extraction and corruption. They present an ECG obfuscation method which uses a cross-correlation-based template matching approach to identify all ECG features, followed by corruption of those features with added noises. It is extremely difficult to reconstruct the obfuscated features without knowledge of the templates used for feature matching and of the noise. The three templates and three noises for the P wave, QRS complex, and T wave comprise the key, which is only 0.4%-0.9% of the original ECG file size.
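The cross-correlation template matching used by Sufi et al. to locate features can be sketched as follows; the template and signal here are toy stand-ins, not their actual ECG templates.

```python
import numpy as np

def normalized_xcorr(signal, template):
    """Normalized cross-correlation of a sliding template over a 1-D signal;
    returns one score in [-1, 1] per window position."""
    signal = np.asarray(signal, dtype=float)
    t = np.asarray(template, dtype=float)
    t = (t - t.mean()) / (t.std() + 1e-12)
    n, m = len(signal), len(t)
    scores = np.empty(n - m + 1)
    for i in range(n - m + 1):
        w = signal[i:i + m]
        w = (w - w.mean()) / (w.std() + 1e-12)   # zero-mean, unit-variance window
        scores[i] = np.dot(w, t) / m
    return scores

# toy example: plant a "QRS-like" template inside an otherwise flat signal
template = np.array([0.0, -1.0, 5.0, -2.0, 0.0])
sig = np.zeros(50)
sig[20:25] = template             # true feature location: index 20
scores = normalized_xcorr(sig, template)
loc = int(np.argmax(scores))      # -> 20
```

Once a feature is located this way, the obfuscation step of [9] would corrupt the samples at that location with the key noise.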
Distribution of the key among the authorized doctors is efficient and fast because of its small size. Experiments carried out with an extremely large number of noise combinations showed that the security strength of the presented method is very high.

Saxena et al. in [10] described an approach for effective feature extraction from ECG signals. Their paper deals with an efficient composite method developed for data compression, signal retrieval, and feature extraction of ECG signals. After signal retrieval from the compressed data, it was found that the network not only compresses the data, but also improves the quality of the retrieved ECG signal with respect to the elimination of high-frequency interference present in the original signal. With the implementation of an artificial neural network (ANN), the compression ratio increases as the number of ECG cycles increases. Moreover, the features extracted by amplitude, slope, and duration criteria from the retrieved signal match the features of the original signal. Their experimental results at every stage are steady and consistent, and show that the composite method can be used for efficient data management and feature extraction of ECG signals in many real-time applications.

A feature extraction method using the discrete wavelet transform (DWT) was proposed by Emran et al. in [11]. They used a DWT to extract the relevant information from the ECG input data in order to perform the classification task. Their proposed work includes the following modules: data acquisition, pre-processing, beat detection, feature extraction, and classification. In the feature extraction module, the DWT is used to address the problem of non-stationary ECG signals; it is derived from a single generating function, called the mother wavelet, by translation and dilation operations.
Using the DWT in feature extraction can lead to good frequency resolution in all frequency ranges, as it has a varying window size: broad at lower frequencies and narrow at higher frequencies. The DWT characterization delivers features that are stable under the morphology variations of ECG waveforms.

Tayel and Bouridy in [12] put forth a technique for ECG image classification by extracting features using wavelet transformation and neural networks. Features are extracted from the wavelet decomposition of the ECG image intensities and then further processed using artificial neural networks. The features are: mean, median, maximum, minimum, range, standard deviation, variance, and mean absolute deviation. The ANN was trained on the main features of 63 ECG images of different diseases, and the test results showed that the classification accuracy of the introduced classifier was up to 92%.

Alan and Nikola in [13] showed that chaos theory can be successfully applied to ECG feature extraction. They discussed numerous chaos methods, including phase space and attractors, correlation dimension, spatial filling index, central tendency measure, and approximate entropy, and created a new feature extraction environment called ECG chaos extractor to apply these methods. A new semi-automatic program for ECG feature extraction was implemented and presented in their article: a graphical interface is used to specify the ECG files employed in the extraction procedure, as well as for method selection and saving of results, and the program extracts the features from the ECG files.

An algorithm was presented by Chouhan and Mehta in [14] for the detection of QRS complexes. The recognition of QRS complexes forms the basis for almost all automated ECG analysis algorithms.
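The eight statistical descriptors listed above for Tayel and Bouridy (mean, median, maximum, minimum, range, standard deviation, variance, and mean absolute deviation) are straightforward to compute; a small sketch:

```python
import numpy as np

def statistical_features(x):
    """The eight descriptors listed above, computed over any coefficient
    array (e.g. a wavelet decomposition of the ECG image intensities)."""
    x = np.asarray(x, dtype=float).ravel()
    return np.array([
        x.mean(),                        # mean
        np.median(x),                    # median
        x.max(),                         # maximum
        x.min(),                         # minimum
        x.max() - x.min(),               # range
        x.std(),                         # standard deviation
        x.var(),                         # variance
        np.mean(np.abs(x - x.mean())),   # mean absolute deviation
    ])

feats = statistical_features([1.0, 2.0, 3.0, 4.0])
# mean=2.5, median=2.5, max=4, min=1, range=3, var=1.25, MAD=1
```

In [12] one such 8-vector per (wavelet-decomposed) image is what the ANN is trained on.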
The presented algorithm utilizes a modified definition of the slope of the ECG signal as the feature for QRS detection. A succession of transformations of the filtered and baseline-drift-corrected ECG signal is used to extract a new modified slope feature. In the presented algorithm, a filtering procedure based on moving averages [15] provides a smooth, spike-free ECG signal, which is appropriate for slope feature extraction. The first step is to extract the slope feature from the filtered and drift-corrected ECG signal by processing and transforming it in such a way that the extracted feature signal is significantly enhanced in the QRS region and suppressed in the non-QRS region. The proposed method has a detection rate and positive predictivity of 98.56% and 99.18%, respectively.

Xu et al. in [16] described an algorithm using the Slope Vector Waveform (SVW) for ECG QRS complex detection and RR interval evaluation. In their method, variable-stage differentiation is used to achieve the desired slope vectors for feature extraction, and non-linear amplification is used to improve the signal-to-noise ratio. The method allows a fast and accurate search for the R location, the QRS complex duration, and the RR interval, and yields excellent ECG feature extraction results. In order to obtain QRS durations, feature extraction rules are needed.

A method for the automatic extraction of both time-interval and morphological features from the electrocardiogram (ECG), to classify ECGs into normal and arrhythmic, was described by Alexakis et al. in [17]. The method utilizes a combination of artificial neural network (ANN) and linear discriminant analysis (LDA) techniques for feature extraction. Five ECG features, namely RR, RTc, T wave amplitude, T wave skewness, and T wave kurtosis, were used in their method. These features are obtained with the assistance of automatic algorithms; the onset and end of the T wave were detected using the tangent method.
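A minimal sketch of the slope-feature idea behind the detectors of Chouhan and Mehta and of Xu et al.: differentiate, amplify non-linearly (here by squaring), smooth with a moving average, and threshold. The specific transformations, window length, and threshold of the published algorithms are not reproduced; these are illustrative choices.

```python
import numpy as np

def slope_feature(ecg, win=5):
    """First difference -> squaring (non-linear amplification) -> moving
    average, which enhances steep QRS slopes and suppresses slow P/T waves."""
    ecg = np.asarray(ecg, dtype=float)
    slope = np.diff(ecg, prepend=ecg[0])          # discrete slope
    energy = slope ** 2                           # amplify steep segments
    kernel = np.ones(win) / win
    return np.convolve(energy, kernel, mode="same")

def detect_qrs_regions(ecg, thr_ratio=0.3):
    """Mark samples whose smoothed slope energy exceeds a fraction of the max."""
    f = slope_feature(ecg)
    return f > thr_ratio * f.max()

# toy beat: a slow hump (P/T-like) plus a sharp spike (QRS-like)
t = np.linspace(0, 1, 200)
ecg = 0.2 * np.sin(2 * np.pi * t)
ecg[100:103] += [1.0, 2.0, 1.0]                  # steep QRS-like deflection
mask = detect_qrs_regions(ecg)                   # True only around the spike
```

The squaring step plays the role of the "non-linear amplification" mentioned for the SVW method: it boosts the steep QRS slopes relative to the gentle P and T slopes.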
The three feature combinations used had very similar performance when considering the average performance metrics.

A modified combined wavelet transform technique was developed by Saxena et al. in [18] to analyze multi-lead electrocardiogram signals for cardiac disease diagnostics. Two wavelets were used: a quadratic spline wavelet (QSWT) for QRS detection, and the Daubechies six-coefficient (DU6) wavelet for P and T detection. A procedure was evolved using electrocardiogram parameters with a point scoring system for the diagnosis of various cardiac diseases. The consistency and reliability of the identified and measured parameters were confirmed when both diagnostic criteria gave the same results. Table 1 shows a comparison of different ECG signal feature extraction techniques.

A robust ECG feature extraction scheme was put forth by Olvera in [19]. The proposed method utilizes a matched filter to detect different signal features in a human-heart electrocardiogram signal. The detection of the ST segment, which is a precursor of possible cardiac problems, was more difficult to achieve using the matched filter, due to noise and amplitude variability. By improving on the methods used, namely employing a different form of the matched filter and better threshold detection, the matched-filter ECG feature extraction could be made more successful. The detection of different features in the ECG waveform was much harder than anticipated, but this was not due to the implementation of the matched filter; the more complex part was devising the detection method to extract the feature of interest from each ECG signal.

Jen et al. in [20] formulated an approach using neural networks to determine the features of the ECG signal; they presented an integrated system for ECG diagnosis.
The integrated system comprises a cepstrum-coefficient method for feature extraction from long-term ECG signals and artificial neural network (ANN) models for classification. Utilizing the proposed method, one can identify the characteristics hidden inside an ECG signal, and then classify the signal as well as diagnose its abnormalities. To explore the performance of the proposed method, various types of ECG data from the MIT/BIH database were used for verification. The experimental results showed that the accuracy of diagnosing cardiac disease was above 97.5%. In addition, the proposed method successfully extracted the corresponding feature vectors, distinguished the differences, and classified the ECG signals.

Correlation analysis for abnormal-ECG-signal feature extraction was explained by Ramli and Ahmad in [21]. Their work investigated techniques to extract the important features from the 12-lead electrocardiogram (ECG) signals. They chose lead II for the entire analysis, due to its representative characteristics for identifying common heart diseases. The analysis technique chosen is cross-correlation analysis, which measures the similarity between two signals and extracts the information present in them. Their test results suggested that the proposed technique can effectively extract features which differentiate between the types of heart diseases analyzed, as well as the normal heart signal.

Ubeyli et al. in [22] described an approach for feature extraction from the ECG signal. They developed automated diagnostic systems employing diverse and composite features for electrocardiogram (ECG) signals, analyzed them, and determined their accuracies. The classification accuracies of a mixture of experts (ME) trained on composite features and a modified mixture of experts (MME) trained on diverse features were also compared in their work.
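The cepstrum-coefficient feature extraction used by Jen et al. can be sketched as follows: the real cepstrum is the inverse FFT of the log magnitude spectrum, and the first few coefficients serve as a compact feature vector. The coefficient count here is an arbitrary illustrative choice.

```python
import numpy as np

def real_cepstrum(x, n_coeffs=12):
    """Real cepstrum: IFFT of the log magnitude spectrum; return the first
    n_coeffs coefficients as a compact feature vector."""
    x = np.asarray(x, dtype=float)
    spectrum = np.abs(np.fft.fft(x))
    ceps = np.fft.ifft(np.log(spectrum + 1e-12)).real   # small eps avoids log(0)
    return ceps[:n_coeffs]

# toy "beat": a sum of two sinusoids standing in for one ECG cycle
t = np.linspace(0, 1, 256, endpoint=False)
beat = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
feats = real_cepstrum(beat)
```

Such per-beat cepstral vectors would then be the inputs to the ANN classifier of [20].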
The inputs of these automated diagnostic systems were composed of diverse or composite features, chosen according to the network structures. The accuracy rates achieved by their proposed approach were higher than those of the ME trained on composite features.

Fatemian et al. [25] proposed an approach for ECG feature extraction. They suggested a new wavelet-based framework for the automatic analysis of the single-lead electrocardiogram (ECG) for application in human recognition. Their system utilizes a robust preprocessing stage which enables it to handle noise and outliers, so that it can be applied directly to the raw ECG signal. In addition, the proposed system is capable of managing ECGs regardless of the heart rate (HR), which renders assumptions about the individual's stress level unnecessary. The substantial reduction of the template gallery size decreases the storage requirements of the system appreciably. Additionally, the categorization process is sped up by eliminating the need for dimensionality reduction techniques such as PCA or LDA. Their experimental results revealed that the proposed technique outperformed other conventional methods of ECG feature extraction.

III. FUTURE ENHANCEMENT

The electrocardiogram (ECG) is a noninvasive record of the variation of the bio-potential signal of the human heartbeat. ECG detection, which provides information on the heart and cardiovascular condition, is essential to enhance patients' quality of life and to provide appropriate treatment. ECG features can be extracted in the time domain [23] or in the frequency domain [24]. The features extracted from the ECG signal play a vital role in diagnosing cardiac disease, so the development of accurate and fast methods for automatic ECG feature extraction is of major importance. Feature extraction methods implemented in previous research include the discrete wavelet transform, the Karhunen-Loeve transform, the Hermitian basis, and other methods.
Every method has its own advantages and limitations. Future work will primarily focus on feature extraction from the ECG signal using more statistical data. In addition, future enhancements will look at utilizing different transformation techniques that provide higher accuracy in feature extraction. The parameters that must be considered while developing an algorithm for feature extraction from an ECG signal are the simplicity of the algorithm and its accuracy in providing the best feature extraction results.

Table I. Comparison of different feature extraction techniques from an ECG signal, where H, M, L denote High, Medium, and Low, respectively.

Approach               Simplicity   Accuracy   Predictivity
Zhao et al.            H            H          H
Mahmoodabadi et al.    M            H          H
Tadejko and Rakowski   L            M          M
Tayel and Bouridy      M            M          H
Jen et al.             H            H          H
Alexakis et al.        H            M          M
Ramli and Ahmad        M            M          M
Xu et al.              M            H          H
Olvera                 H            M          M
Emran et al.           H            M          L

IV. CONCLUSION

The examination of the ECG has been used comprehensively for diagnosing many cardiac diseases. Various techniques and transformations have been proposed in the literature for extracting features from the ECG. This paper provides an overview of the ECG feature extraction techniques and algorithms proposed in the literature. A feature extraction technique or algorithm developed for the ECG must be highly accurate and should ensure fast extraction of features from the ECG signal. This paper also presented a comparative table evaluating the performance of the different algorithms proposed for ECG signal feature extraction. Future work will concentrate mainly on developing an algorithm for accurate and fast feature extraction; moreover, additional statistical data will be utilized for evaluating the performance of an algorithm in ECG signal feature detection. Improving the accuracy of diagnosing cardiac disease as early as possible is necessary for patient monitoring systems.
Therefore, our future work also looks at improvements in diagnosing cardiac disease.

REFERENCES

[1] S. Z. Mahmoodabadi, A. Ahmadian, and M. D. Abolhasani, “ECG Feature Extraction using Daubechies Wavelets,” Proceedings of the Fifth IASTED International Conference on Visualization, Imaging and Image Processing, pp. 343-348, 2005.
[2] Juan Pablo Martínez, Rute Almeida, Salvador Olmos, Ana Paula Rocha, and Pablo Laguna, “A Wavelet-Based ECG Delineator: Evaluation on Standard Databases,” IEEE Transactions on Biomedical Engineering, vol. 51, no. 4, pp. 570-581, 2004.
[3] Krishna Prasad and J. S. Sahambi, “Classification of ECG Arrhythmias using Multi-Resolution Analysis and Neural Networks,” IEEE Transactions on Biomedical Engineering, vol. 1, pp. 227-231, 2003.
[4] Cuiwei Li, Chongxun Zheng, and Changfeng Tai, “Detection of ECG Characteristic Points using Wavelet Transforms,” IEEE Transactions on Biomedical Engineering, vol. 42, no. 1, pp. 21-28, 1995.
[5] C. Saritha, V. Sukanya, and Y. Narasimha Murthy, “ECG Signal Analysis Using Wavelet Transforms,” Bulgarian Journal of Physics, vol. 35, pp. 68-77, 2008.
[6] Qibin Zhao and Liqing Zhang, “ECG Feature Extraction and Classification Using Wavelet Transform and Support Vector Machines,” International Conference on Neural Networks and Brain, ICNN&B ’05, vol. 2, pp. 1089-1092, 2005.
[7] B. Castro, D. Kogan, and A. B. Geva, “ECG feature extraction using optimal mother wavelet,” The 21st IEEE Convention of the Electrical and Electronic Engineers in Israel, pp. 346-350, 2000.
[8] P. Tadejko and W. Rakowski, “Mathematical Morphology Based ECG Feature Extraction for the Purpose of Heartbeat Classification,” 6th International Conference on Computer Information Systems and Industrial Management Applications, CISIM ’07, pp. 322-327, 2007.
[9] F. Sufi, S. Mahmoud, and I. Khalil, “A new ECG obfuscation method: A joint feature extraction & corruption approach,” International Conference on Information Technology and Applications in Biomedicine, ITAB 2008, pp. 334-337, May 2008.
[10] S. C. Saxena, A. Sharma, and S. C. Chaudhary, “Data compression and feature extraction of ECG signals,” International Journal of Systems Science, vol. 28, no. 5, pp. 483-498, 1997.
[11] Emran M. Tamil, Nor Hafeezah Kamarudin, Rosli Salleh, M. Yamani Idna Idris, Noorzaily M. Noor, and Azmi Mohd Tamil, “Heartbeat Electrocardiogram (ECG) Signal Feature Extraction Using Discrete Wavelet Transforms (DWT).”
[12] Mazhar B. Tayel and Mohamed E. El-Bouridy, “ECG Images Classification Using Feature Extraction Based On Wavelet Transformation And Neural Network,” ICGST International Conference on AIML, June 2006.
[13] Alan Jovic and Nikola Bogunovic, “Feature Extraction for ECG Time-Series Mining based on Chaos Theory,” Proceedings of the 29th International Conference on Information Technology Interfaces, 2007.
[14] V. S. Chouhan and S. S. Mehta, “Detection of QRS Complexes in 12-lead ECG using Adaptive Quantized Threshold,” IJCSNS International Journal of Computer Science and Network Security, vol. 8, no. 1, 2008.
[15] V. S. Chouhan and S. S. Mehta, “Total Removal of Baseline Drift from ECG Signal,” Proceedings of the International Conference on Computing: Theory and Applications, ICTTA-07, pp. 512-515, ISI, March 2007.
[16] Xiaomin Xu and Ying Liu, “ECG QRS Complex Detection Using Slope Vector Waveform (SVW) Algorithm,” Proceedings of the 26th Annual International Conference of the IEEE EMBS, pp. 3597-3600, 2004.
[17] C. Alexakis, H. O. Nyongesa, R. Saatchi, N. D. Harris, C. Davies, C. Emery, R. H. Ireland, and S. R. Heller, “Feature Extraction and Classification of Electrocardiogram (ECG) Signals Related to Hypoglycaemia,” Conference on Computers in Cardiology, pp. 537-540, IEEE, 2003.
[18] S. C. Saxena, V. Kumar, and S. T. Hamde, “Feature extraction from ECG signals using wavelet transforms for disease diagnostics,” International Journal of Systems Science, vol. 33, no. 13, pp. 1073-1085, 2002.
[19] Felipe E. Olvera, “Electrocardiogram Waveform Feature Extraction Using the Matched Filter,” 2006.
[20] Kuo-Kuang Jen and Yean-Ren Hwang, “ECG Feature Extraction and Classification Using Cepstrum and Neural Networks,” Journal of Medical and Biological Engineering, vol. 28, no. 1, 2008.
[21] A. B. Ramli and P. A. Ahmad, “Correlation analysis for abnormal ECG signal features extraction,” 4th National Conference on Telecommunication Technology, NCTT 2003 Proceedings, pp. 232-237, 2003.
[22] Elif Derya Ubeyli, “Feature extraction for analysis of ECG signals,” 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS 2008, pp. 1080-1083, 2008.
[23] Y. H. Hu, S. Palreddy, and W. Tompkins, “A Patient Adaptable ECG Beat Classifier Using A Mixture Of Experts Approach,” IEEE Transactions on Biomedical Engineering, vol. 44, pp. 891-900, 1997.
[24] Costas Papaloukas, Dimitrios I. Fotiadis, Aristidis Likas, and Lampros K. Michalis, “Automated Methods for Ischemia Detection in Long Duration ECGs,” 2003.
[25] S. Z. Fatemian and D. Hatzinakos, “A new ECG feature extractor for biometric recognition,” 16th International Conference on Digital Signal Processing, pp. 1-6, 2009.

AUTHORS PROFILE

Karpagachelvi S.: She received the BSc degree in Physics from Bharathiar University in 1993 and a Masters in Computer Applications from Madras University in 1996. She has 12 years of teaching experience and is currently a PhD student in the Department of Computer Science at Mother Teresa University.

Dr. M. Arthanari: He obtained his Doctorate in Mathematics from Madras University in 1981. He has 35 years of teaching experience and 25 years of research experience, and holds a patent in Computer Science approved by the Govt. of India.

Sivakumar M.: He has more than 10 years of experience in the software industry, including Oracle Corporation. He received his Bachelor degree in Physics and Masters in Computer Applications from Bharathiar University, India. He holds a patent for an invention in embedded technology, and is certified by various professional bodies: ITIL, IBM Rational ClearCase Administrator, OCP (Oracle Certified Professional 10g), and ISTQB.
Stanford University computer vision lecture slides
Fall 2004, Pattern Recognition for Vision

Applications: Morphable Models. 2D morphable models for facial animation ("Air", "Badge"), video databases, and visual speech processing.

Pattern Recognition, Classifiers: Fisher Discriminant Analysis. In the feature space $(x_1, x_2)$, find the projection that maximizes the separation of the class means relative to the within-class variances:

    maximize $(m_1 - m_2)^2 / (\sigma_1^2 + \sigma_2^2)$

Pattern Recognition, Classifiers: Linear Support Vector Machines.

Vision, Feature Extraction I: Windowed Fourier Transform. The signal $f$ is analyzed through a sliding window $g$:

    $\tilde{f}(\omega, t) = \int_{-\infty}^{\infty} f(u)\, g(u - t)\, e^{-j\omega u}\, du$

Vision, Feature Extraction I: Template Matching. The correlation result $c$ of an image $f$ with a template $g$ is

    $c(x) = \int_{-\infty}^{\infty} f(x')\, g(x' - x)\, dx' = \int_{-\infty}^{\infty} f(x')\, h(x - x')\, dx'$,

i.e., correlation with the template $g$ is convolution with the flipped template $h(x) = g(-x)$.
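The discrete analogue of the template matching identity above (correlation with $g$ equals convolution with the flipped template $h(x) = g(-x)$) can be checked numerically:

```python
import numpy as np

# Discrete analogue of the slide's identity: correlating a signal f with a
# template g equals convolving f with the flipped template h(x) = g(-x).
f = np.array([0.0, 1.0, 3.0, 2.0, 0.0, 1.0])    # "image" (1-D signal)
g = np.array([1.0, 3.0, 2.0])                   # template

c_corr = np.correlate(f, g, mode="full")        # c(x) = sum_x' f(x') g(x' - x)
c_conv = np.convolve(f, g[::-1], mode="full")   # same result via convolution
assert np.allclose(c_corr, c_conv)

# the peak of the valid correlation marks where the template matches best
peak = int(np.argmax(np.correlate(f, g, mode="valid")))   # -> 1
```

This is why template matching can be run through any fast convolution routine: flip the template once, then convolve.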
Feature Extraction based on Sparsity Embedding with Manifold Information for Face Recognition

Shi-Qiang Gao(1), Xiao-Yuan Jing(1,2,*), Chao Lan(1), Yong-Fang Yao(1), Zai-Juan Sui(1)
(1) School of Automation, Nanjing University of Posts and Telecommunications, Nanjing, 210003, China
(2) State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China
(*) Corresponding author, Email: jingxy_2000@

Abstract—In the past few years, manifold learning and sparse representation have been widely used for feature extraction and dimensionality reduction. The sparse representation technique shows that one sample can be linearly recovered by the others in a data set. Based on this, sparsity preserving projections (SPP) has recently been proposed, which simply minimizes the sparse reconstructive errors among training samples in order to preserve the sparse reconstructive relations among the data. However, SPP does not investigate the inherent manifold structure of the data set, which may be helpful for the recognition task. Motivated by this, we present in this paper a novel feature extraction approach named sparsity embedding with manifold information (SEMI), which not only preserves the sparse reconstructive relations, but also maintains the manifold structure of the reconstructed data. Specifically, for a sparse reconstructed sample, we minimize both its difference from the corresponding original sample, as SPP does, and its distance to the original intra-class samples. Provided that this sample lies on a different submanifold from samples of other classes, we additionally maximize, in the objective function, its distance to the original inter-class samples.
Experimental results on the two public ORL and AR face databases demonstrate that SEMI outperforms related methods in classification performance.

Keywords: sparse representation; supervised manifold learning; feature extraction; dimensionality reduction; face recognition

I. INTRODUCTION

Feature extraction has been widely used for face and palm recognition [1][2]. As a feature extraction and dimensionality reduction technique, manifold learning has attracted more and more research interest. It tends to preserve the manifold structure of a given data set in a low-dimensional subspace. Traditional manifold learning algorithms only define a nonlinear mapping on the training data, and cannot deal with new test data. Linear manifold embedding methods have been developed to address this issue, such as locality preserving projections (LPP) [3] and neighborhood preserving embedding (NPE) [4]. LPP seeks a linear embedded space where the local relationships of samples are preserved, while NPE seeks an embedded space where the reconstructive relations of samples by their k nearest neighbors are preserved. During training, none of the abovementioned manifold learning methods considers the class label information, which has been used to extract face features in [5].

To take advantage of class separability, some supervised manifold learning methods have been proposed. Local discriminant embedding (LDE) maintains the intrinsic neighbor relations of the intra-class samples by setting affinity weights [6]. Marginal Fisher analysis (MFA) constructs intra-class compactness and inter-class separability graphs by using the class label information [7]. As described in [8][9], the discriminant information can be extracted to enhance the classification performance.

As another technique of pattern recognition, sparse representation shows that one sample can be recovered by the others.
Recently, a new sparse representation method, namely sparsity preserving projections (SPP), has been proposed, which tries to preserve the sparse reconstructive relations among the data by minimizing the sparse reconstructive errors [10]. However, in real-world applications like face recognition, face images often lie on a manifold structure, and preserving the manifold structure may enhance the recognition performance.

Motivated by this, we propose a novel feature extraction method named sparsity embedding with manifold information (SEMI), which aims at preserving both the sparse reconstructive relations and the manifold structure of the data set. Explicitly, to preserve the sparse relations as SPP does, we minimize the difference between the original data and their corresponding reconstructed samples. Then, presuming that a reconstructed sample is close not only to the corresponding original sample but also to its neighbors, SEMI minimizes the distance between the original data and the reconstructed intra-class samples. Furthermore, provided that samples from different classes lie on separate submanifolds, SEMI also maximizes the distance between the original data and the reconstructed inter-class samples. By this means, SEMI preserves not only the sparse reconstructive relations but also the manifold structure of the data set.

The rest of this paper is organized as follows: in Section 2, we briefly review two feature extraction methods, i.e., LDE and SPP; in Section 3, we present the sparsity embedding with manifold information (SEMI) approach.
Experimental results on the public ORL and AR face databases are provided in Section 4, and conclusions are given in Section 5.

II. RELATED WORK

A. Local Discriminant Embedding (LDE)

LDE tries not only to preserve the neighbor relations of intra-class samples, but also to enhance the separability between inter-class neighbor samples. Suppose a sample set $X = [x_1, x_2, \ldots, x_n] \in R^{m \times n}$, and let $W$ and $W'$ be the affinity matrices defined as follows:

    $W_{ij} = \exp(-\|x_i - x_j\|^2 / t)$ if $i \in N_k^+(j)$ or $j \in N_k^+(i)$, and $W_{ij} = 0$ otherwise,    (1)

    $W'_{ij} = \exp(-\|x_i - x_j\|^2 / t)$ if $i \in N_k^-(j)$ or $j \in N_k^-(i)$, and $W'_{ij} = 0$ otherwise,    (2)

where $i \in N_k^+(j)$ indicates that sample $x_i$ belongs to the $k$ nearest intra-class neighbors of $x_j$, and $i \in N_k^-(j)$ indicates that $x_i$ belongs to the $k$ nearest inter-class neighbors of $x_j$. The objective function of LDE then takes the form

    $\max_v \sum_{i=1}^n \sum_{j=1}^n \|v^T x_i - v^T x_j\|^2 W'_{ij}$  s.t.  $\sum_{i=1}^n \sum_{j=1}^n \|v^T x_i - v^T x_j\|^2 W_{ij} = 1$.    (3)

Using Lagrange multipliers to solve (3), the projective vectors $v$ are the generalized eigenvectors corresponding to the $l$ largest eigenvalues in

    $[X(D - W)X^T]^{-1} X(D' - W')X^T \cdot v = \lambda \cdot v$,    (4)

where $D = diag\{D_{ii}\}_{n \times n}$ and $D' = diag\{D'_{ii}\}_{n \times n}$ are two diagonal matrices, with $D_{ii} = \sum_{j=1}^n W_{ij}$ and $D'_{ii} = \sum_{j=1}^n W'_{ij}$.

B. Sparsity Preserving Projections (SPP)

SPP aims to seek an embedded space in which the sparse reconstructive relations among the data are preserved. By solving an L1-norm problem, the SPP method first calculates the sparse matrix $A = [a_1, a_2, \ldots, a_n]$ as described in Section 3, where $a_i \in R^n$ is the sparse reconstructive coefficient vector. It then extracts the data features by minimizing the reconstructive errors between the original samples and the corresponding reconstructed data. Hence, SPP seeks the projective vector $v$ by

    $v^* = \arg\min_v \sum_{i=1}^n \|v^T x_i - v^T X a_i\|^2$.    (5)
Equation (5) can be transformed into an equivalent maximization problem as follows:

    $\max_v \frac{v^T X \beta X^T v}{v^T X X^T v}$,  where  $\beta = A + A^T - A^T A$.    (6)

We employ Lagrange multipliers to solve (6), and the solution is the eigenvector of the matrix $(X X^T)^{-1} X \beta X^T$ associated with the $l$ largest eigenvalues.

III. SPARSITY EMBEDDING WITH MANIFOLD INFORMATION

In this section, we first give the calculation of the sparse representation of the data, and then present the novel feature extraction approach, SEMI.

A. Sparse Representation of the Data

Let $X = [x_1, x_2, \ldots, x_n]$ be the overall data set of size $n$, and $x_i$ the $i$-th original sample. Each sample $x_i$ can be linearly recovered from the remaining $n - 1$ samples:

    $x_i = a_{i1} x_1 + a_{i2} x_2 + \cdots + 0 \cdot x_i + \cdots + a_{in} x_n$,    (7)

where $a_{ij}$ denotes the reconstructive coefficient associated with the $j$-th sample for the $i$-th reconstructed sample. Hence, the overall reconstructed sample set $X'$ can be calculated by

    $X' = X A$,    (8)

where $A = [a_1, a_2, \ldots, a_n]$ and $a_i = [a_{i1}, a_{i2}, \ldots, a_{in}]^T$ is the coefficient vector associated with $x_i$. The sparse representation technique then calculates $a_i$ by

    $\min_a \|a\|_0$  s.t.  $x_i = X a$,    (9)

where $\|\cdot\|_0$ denotes the L0 norm of a vector, i.e., the number of its non-zero components. Solving (9) is NP-hard, and reference [11] shows that if the solution is sparse enough, we can obtain it by solving the corresponding L1-norm problem

    $\min_a \|a\|_1$  s.t.  $x_i = X a$.    (10)

It has been shown that formula (10) can be solved by standard linear programming algorithms [12].

B. Description of SEMI

The sparsity preserving projections (SPP) method merely preserves the sparse reconstructive relations among the data, but does not study the manifold structure of the data set.
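Problem (10) is solved in the paper by linear programming. As a self-contained stand-in, the greedy orthogonal matching pursuit sketch below approximates the L0 problem (9); it recovers an exactly sparse code when the dictionary columns are orthonormal. The dictionary and sparsity level here are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def omp_sparse_code(X, x, n_nonzero=3, tol=1e-10):
    """Greedy (OMP) approximation to min ||a||_0 s.t. x = X a:
    repeatedly pick the column most correlated with the residual,
    then re-fit the selected columns by least squares."""
    X = np.asarray(X, dtype=float)
    x = np.asarray(x, dtype=float)
    n = X.shape[1]
    support, residual = [], x.copy()
    a = np.zeros(n)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) <= tol:
            break
        idx = int(np.argmax(np.abs(X.T @ residual)))
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(X[:, support], x, rcond=None)
        a[:] = 0.0
        a[support] = coef
        residual = x - X @ a
    return a

# toy dictionary with orthonormal columns; x is an exact 2-sparse combination
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((10, 6)))
x = 2.0 * Q[:, 1] - 1.5 * Q[:, 4]
a = omp_sparse_code(Q, x, n_nonzero=2)    # recovers a[1]=2.0, a[4]=-1.5
```

In SEMI/SPP, one such coefficient vector $a_i$ per sample (with a zero $i$-th entry, per equation (7)) forms the columns of the sparse matrix $A$.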
Hence, it is worthwhile to investigate both the sparse reconstructive relations and the manifold structure of the data set simultaneously. Suppose that $X = [x_1, x_2, \ldots, x_n] \in R^{m \times n}$ is the overall data matrix, $y_i$ is the class label of $x_i$, and $A = [a_1, a_2, \ldots, a_n]$ is the sparse matrix previously obtained. We separately compute the inter-class weight matrix $H^b = [H^b_{ij}]_{n \times n}$ and the intra-class weight matrix $H^w = [H^w_{ij}]_{n \times n}$, defined as follows:

    $H^b_{ij} = \exp(-\|x_i - x_j\|^2 / t)$ if $y_i \ne y_j$, and $H^b_{ij} = 0$ otherwise,    (11)

    $H^w_{ij} = \exp(-\|x_i - x_j\|^2 / t)$ if $y_i = y_j$ and $i \ne j$, and $H^w_{ij} = 0$ otherwise,    (12)

where $t$ is a parameter, usually set to the variance of the overall sample set. Based on (11) and (12), we can evaluate the total distance $S_b$ between the original data and the reconstructed inter-class samples, and the total distance $S_w$ between the original data and the reconstructed intra-class samples:

    $S_b = \sum_{i=1}^n \sum_{j=1}^n \|v^T x_j - v^T X a_i\|^2 H^b_{ij}$,    (13)

    $S_w = \sum_{i=1}^n \sum_{j=1}^n \|v^T x_j - v^T X a_i\|^2 H^w_{ij}$.    (14)

The total reconstructive error $E$ between the original data and the corresponding reconstructed samples is defined, as in SPP, by

    $E = \sum_{i=1}^n \|v^T x_i - v^T X a_i\|^2$.    (15)

Equations (14) and (15) can be written in the unified form

    $S = \sum_{i=1}^n \sum_{j=1}^n \|v^T x_j - v^T X a_i\|^2 H_{ij}$,    (16)

where

    $H_{ij} = \exp(-\|x_i - x_j\|^2 / t)$ if $y_i = y_j$, and $H_{ij} = 0$ otherwise.    (17)

Based on (13) and (16), we seek an optimization criterion that maximizes $S_b$ and simultaneously minimizes $S$, as the objective function of SEMI:

    $\max_v \frac{\sum_{i=1}^n \sum_{j=1}^n \|v^T x_j - v^T X a_i\|^2 H^b_{ij}}{\sum_{i=1}^n \sum_{j=1}^n \|v^T x_j - v^T X a_i\|^2 H_{ij}}$.    (18)
The numerator can be written in matrix form as

    Σ_{i,j} ||v^T x_j − v^T X a_i||² H^b_{ij} = v^T X (D^b + A D^b A^T − H^b A^T − A H^b) X^T v,    (19)

and the denominator as

    Σ_{i,j} ||v^T x_j − v^T X a_i||² H_{ij} = v^T X (D + A D A^T − H A^T − A H) X^T v,    (20)

where H = [H_{ij}]_{n×n}, and D^b = diag{D^b_{ii}} and D = diag{D_{ii}} are two diagonal matrices of size n with D^b_{ii} = Σ_j H^b_{ij} and D_{ii} = Σ_j H_{ij}. Hence, our optimization problem can be formulated as

    v* = arg max_v  [v^T X (D^b + A D^b A^T − H^b A^T − A H^b) X^T v] / [v^T X (D + A D A^T − H A^T − A H) X^T v],    (21)

where v denotes the optimal transform vector. For convenience, we define two matrices:

    L^b = D^b + A D^b A^T − H^b A^T − A H^b,    (22)
    L   = D + A D A^T − H A^T − A H.    (23)

Then, by using Lagrange multipliers, (21) can be reformulated as a generalized eigen-problem:

    [X L X^T]^{-1} [X L^b X^T] v = λ v,    (24)

and the solution v is the eigenvector associated with the largest eigenvalue. The algorithmic procedure of SEMI is summarized in Fig. 1. For simplicity, we assume that the overall sparse reconstructive coefficient matrix A has already been obtained.

IV. EXPERIMENT

In this section, we first introduce the ORL and AR face databases used in our experiments. We then compare the recognition performance of the proposed SEMI with related methods, including the unsupervised manifold learning methods PCA, LPP, and NPE, the supervised manifold learning methods LDE and MFA, and the sparsity preserving method SPP.

A. Introduction of the Databases

The ORL face database contains 40 individuals with 10 images per person, taken under different positions, lighting conditions, and facial expressions. Demo images of one subject are shown in Fig. 2. In our experiment, we crop each image to size 46×56 and rescale its gray levels to [0, 1].
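The weight matrices (11) and (17), the matrices L^b and L of (22)-(23), and the eigen-solution (24) can be sketched numerically as follows. The data, labels, "sparse" matrix A, and the heat-kernel parameter t below are random stand-ins, not a real sparse code; a small ridge is added to keep the denominator matrix invertible.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 10
X = rng.standard_normal((m, n))                    # columns are samples x_i
y = np.array([0] * 5 + [1] * 5)                    # two classes
A = rng.standard_normal((n, n)) * (rng.random((n, n)) < 0.3)
np.fill_diagonal(A, 0)                             # a_ii = 0, as in Eq. (7)

sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)   # ||x_i - x_j||^2
t = sq.mean()                                             # heat-kernel parameter (assumed)
W = np.exp(-sq / t)
Hb = W * (y[:, None] != y[None, :])                # inter-class weights, Eq. (11)
H = W * (y[:, None] == y[None, :])                 # unified weights,    Eq. (17)

def L_of(Hm):
    # L = D + A D A^T - H A^T - A H, Eqs. (22)-(23); columns of A are the a_i
    D = np.diag(Hm.sum(axis=1))
    return D + A @ D @ A.T - Hm @ A.T - A @ Hm

Sb = X @ L_of(Hb) @ X.T                            # numerator matrix,   Eq. (19)
Sw = X @ L_of(H) @ X.T + 1e-8 * np.eye(m)          # denominator matrix, Eq. (20), ridged

# Generalized eigen-problem (24): top eigenvector of Sw^{-1} Sb
vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
v = np.real(vecs[:, np.argmax(np.real(vals))])
```

The top eigenvector maximizes the Rayleigh quotient (18), which can be checked against random probe directions.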
We in turn choose the first 2-5 representative images per subject for training and the rest for testing.

Figure 2. Demo images of one subject in the ORL database.

The AR face database contains 119 individuals, each with 26 images of size 60×60. We rescale the gray levels to [0, 1]; all image samples of one subject are shown in Fig. 3. The major differences between them are expression, illumination, position, pose, and sampling time. To effectively evaluate the impact of the different variations on the recognition results, we in turn choose the following 2-10 representative images of every subject as training samples: (1), (14), (2), (5), (8), (11), (17), (19), (23), and (25). The remainder are used as testing samples.

Figure 3. Demo images of one subject in the AR database.

B. Experimental Results

To test the effectiveness of SEMI, we compare its recognition rate with related methods. The number of nearest neighbors k is set to the value yielding the best recognition results. In all compared methods, we first perform PCA on the data to reduce the dimension and avoid the singularity problem of the inverse matrix. Recognition rates versus different numbers of training samples on the ORL database are shown in Fig. 4, and the average results are given in Table I. Fig. 4 shows that SEMI has the highest recognition rate in all cases. As shown in Table I, the average rate of SEMI exceeds that of all compared methods by at least 6.26% (= 89.12% − 82.86%).

Figure 4. Recognition rates versus different numbers of training samples on ORL.

TABLE I. AVERAGE RECOGNITION RATES (%) ON ORL

Method  PCA    LPP    NPE    LDE    MFA    SPP    SEMI
Result  80.36  80.50  80.79  82.43  82.86  80.37  89.12

Fig. 5 shows the recognition rates versus different numbers of training samples on the AR database, and Table II gives the average results. From Fig. 5, we can see that SEMI generally achieves the highest recognition rate. Table II demonstrates that SEMI boosts the average recognition rate by at least 3.25% (= 83.74% − 80.49%).

Figure 5.
Recognition rates versus different numbers of training samples on AR.

TABLE II. AVERAGE RECOGNITION RATES (%) ON AR

Method  PCA    LPP    NPE    LDE    MFA    SPP    SEMI
Result  77.89  78.64  80.47  80.18  80.43  80.49  83.74

V. CONCLUSIONS

In this paper, we propose a novel feature extraction method named sparsity embedding with manifold information (SEMI), which aims to preserve both the sparse reconstructive relations and the manifold structure of the data set. First, SEMI reconstructs each sample as a sparse superposition of the remaining samples. Then, as SPP does, SEMI minimizes the difference between the reconstructed samples and the corresponding original data. In addition, to maintain the manifold structure of the data set, for each reconstructed sample SEMI minimizes its distance to the original samples of the same class and simultaneously maximizes its distance to the original samples of different classes. By this means, SEMI not only preserves the sparse reconstructive relations among the data but also maintains the manifold structure in the embedded space. Experimental results on the ORL and AR face databases show that, compared with related methods including PCA, LPP, NPE, MFA, LDE, and SPP, the proposed SEMI achieves the highest classification performance.

ACKNOWLEDGMENT

The work described in this paper was fully supported by the NSFC under Project No. 60772059, the New Century Excellent Talents of the Education Ministry under Project No. NCET-09-0162, the Doctoral Foundation of the Education Ministry under Project No. 20093223110001, the Qin-Lan Engineering Academic Leader program of Jiangsu Province, the Foundation of Jiangsu Province Universities under Project No. 09KJB510011, and the Foundation of NJUPT under Project Nos. NY207027 and NY208051.

REFERENCES

[1] Y. F. Yao, X. Y. Jing, and H. S. Wong, "Face and palmprint feature level fusion for single sample biometrics recognition," Neurocomputing, 70(7), pp. 1582-1586, 2007.
[2] X. Y. Jing, C. Lu, and D.
Zhang, "An uncorrelated Fisherface approach for face and palmprint recognition," Springer-Verlag, LNCS 3832, pp. 682-687, 2006.
[3] X. He and P. Niyogi, "Locality preserving projections," Proc. Advances in Neural Information Processing Systems (NIPS), 2003.
[4] X. He, D. Cai, S. Yan, and H. Zhang, "Neighborhood preserving embedding," Proc. International Conference on Computer Vision (ICCV), pp. 1208-1213, 2005.
[5] S. Li, Y. F. Yao, X. Y. Jing, H. Chang, S. Q. Gao, D. Zhang, and J. Y. Yang, "Face recognition based on nonlinear DCT discriminant feature extraction using improved kernel DCV," IEICE Trans. Information and Systems, E92-D(12), pp. 2527-2530, 2010.
[6] H. T. Chen, H. W. Chang, and T. L. Liu, "Local discriminant embedding and its variants," Proc. Computer Vision and Pattern Recognition (CVPR), vol. 2, pp. 846-853, 2005.
[7] S. C. Yan, D. Xu, B. Y. Zhang, H. J. Zhang, Q. Yang, and S. Lin, "Graph embedding and extensions: a general framework for dimensionality reduction," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 29, pp. 40-51, 2007.
[8] X. Y. Jing, D. Zhang, Y. F. Yao, and M. Li, "A novel face recognition approach based on kernel discriminative common vectors (KDCV) feature extraction and RBF neural network," Neurocomputing, 71(13), pp. 3044-3048, 2008.
[9] S. Li, X. Y. Jing, L. S. Bian, S. Q. Gao, Q. Liu, and Y. F. Yao, "Facial image recognition based on a statistical uncorrelated near clear discriminant approach," IEICE Trans. Information and Systems, E93-D(4), pp. 934-937, 2010.
[10] L. Qiao, S. Chen, and X. Tan, "Sparsity preserving projections with applications to face recognition," Pattern Recognition, vol. 43, pp. 331-341, 2010.
[11] D. Donoho, "Compressed sensing," IEEE Trans. Information Theory, vol. 52, pp. 1289-1306, 2006.
[12] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM Review, vol. 43, pp. 129-159, 2001.
Hierarchical Feature Extraction
Conclusion
• Build a pyramid and combine two different descriptors
• For each scene class, map multiple descriptors into their respective vocabularies and concatenate them into a final one
• Make use of a spatial pyramid model
• SVM with a histogram intersection kernel
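The histogram intersection kernel named in the last bullet can be sketched in a few lines; this is a minimal illustration on toy histograms, with the pyramid weighting of the full method omitted.

```python
import numpy as np

def hist_intersection(h1, h2):
    """K(h1, h2) = sum_i min(h1_i, h2_i) for non-negative histograms."""
    return float(np.minimum(h1, h2).sum())

h1 = np.array([0.2, 0.5, 0.3])      # toy normalized histograms
h2 = np.array([0.4, 0.4, 0.2])
K12 = hist_intersection(h1, h2)     # 0.2 + 0.4 + 0.2 = 0.8
```

The kernel is symmetric and, for L1-normalized histograms, K(h, h) = 1, which is why it pairs naturally with histogram features in an SVM.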
Hierarchical Feature Extraction
Scale
Images at different scales
Hierarchical Feature Extraction
Input image GIST descriptors
Multiple descriptors
Grey-scale map
Thank You !
Hierarchical Feature Extraction Scheme with Special Vocabulary Generation for Natural Scene Classification
Luo, Tian July 20, 2014
Layout
• Introduction
• Related work
• Methodology
Hierarchical Feature Extraction
Special Vocabulary Generation
Spatial Pyramid Matching
Feature Extraction (or Selection)
Mapping from image space to feature space…
Benign ROIs
Parzen Density Estimation
Histogram Bins
Parzen Density Est.
Bad estimation when only limited data are available!
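The contrast the slides draw between histogram bins and Parzen windows can be sketched with a 1-D Gaussian-kernel density estimate; the sample data and the bandwidth h below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
samples = rng.normal(0.0, 1.0, 50)     # limited data, as the slide warns
xs = np.linspace(-4, 4, 801)
h = 0.4                                # kernel bandwidth (assumed)

def parzen(x, data, h):
    """Gaussian Parzen-window density estimate at points x."""
    u = (x[:, None] - data[None, :]) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return k.sum(axis=1) / (len(data) * h)

p = parzen(xs, samples, h)
area = float(p.sum() * (xs[1] - xs[0]))   # numeric integral, ~1 for a valid density
```

Unlike a coarse histogram, the Parzen estimate is smooth and still integrates to (approximately) one.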
[Diagram: candidate features X1, X2, ..., XN and class C; the subset S is selected to maximize the divergence criterion.]
Generalized Divergence:
{X1, X2, X3,…, X8}
J = Σ_x [P(X, Cancerous) − P(X, Benign)] · log [ P(X, Cancerous) / P(X, Benign) ]

(evaluated on the slide for β = 0 and β = 0.5)
If the features are “biased” towards a class, J is large. A good set of features should have small J.
Results: J with respect to β
First feature selected: GLDM ASM
Second feature selected: …
Mutual Information Based Feature Selection (MIFS)
1. Select the first feature with the highest I(C; X).
2. Select the next feature with the highest I(C; X) − β Σ_{Xs ∈ S} I(X; Xs).
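The greedy loop above can be sketched as follows. The selection criterion I(C;X) − β Σ I(X;Xs) is the Battiti-style MIFS score; the discrete toy features, the value β = 1.5, and the counts-based MI estimator are all illustrative assumptions, not part of the slides.

```python
import numpy as np
from collections import Counter

def mutual_info(a, b):
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * np.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
               for (x, y), c in pab.items())

rng = np.random.default_rng(3)
C = rng.integers(0, 2, 200)                    # binary class labels
F = {
    "informative": C.copy(),                   # perfectly predicts the class
    "redundant":   C.copy(),                   # duplicate of "informative"
    "noise":       rng.integers(0, 2, 200),    # independent of the class
}

beta, selected, cands = 1.5, [], list(F)
while len(selected) < 2:
    def score(f):
        return mutual_info(C, F[f]) - beta * sum(mutual_info(F[f], F[s]) for s in selected)
    best = max(cands, key=score)
    selected.append(best)
    cands.remove(best)
```

With a redundancy penalty this strong, the second pick skips the duplicate feature even though it is individually informative, which is the point of MIFS.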
An In-Depth Guide to Medical Image Processing (Course Slides)
The Importance of Medical Image Processing
An in-depth look at why medical image processing matters in modern medicine, and its potential impact on the early detection, prevention, and treatment of disease.
Medical Image Acquisition Technologies
Introduces and compares common medical image acquisition technologies, including X-ray, CT, and MRI, and discusses their applications in diagnosing different diseases.
Introduces algorithms and techniques for medical image segmentation, including threshold-based, region-growing, graph-cut, and neural-network methods, for extracting structures and lesions of interest.
Feature Extraction for Medical Images
Discusses feature extraction methods for medical images, including morphological, texture, and shape analysis, for automated diagnosis and quantitative analysis.
Image Registration Techniques for Medical Images
Through an in-depth exploration of the principles and applications of medical image processing, this course will … methods.
Introduction to Medical Image Processing
Covers the basic concepts and principles of medical image processing, and explores how digital image processing techniques can be applied in medicine to improve diagnosis and treatment.
Image Enhancement Techniques for Medical Images
Explores image enhancement techniques for medical images, including noise removal, contrast enhancement, and edge enhancement, to improve image quality and visualization.
Segmentation Techniques for Medical Images
Feature Extraction I
Fall 2004
Pattern Recognition for Vision
General Remarks: why talk about FT, WFT, and WT?
Fourier Transform (FT), Windowed FT (WFT), and Wavelet Transform (WT)
• used in many computer vision applications
For f(t) periodic with period T, f(t) ∈ L²([−T/2, T/2]), the complex exponentials s_k(t) = e^{j2πkt/T} form an orthogonal basis:

    ⟨s_k, s_l⟩ = ∫_{−T/2}^{T/2} e^{−j2πkt/T} e^{j2πlt/T} dt = T δ_{kl}.

The Fourier series coefficients and the reconstruction are

    c_k = ⟨s_k, f⟩,    f(t) = (1/T) Σ_{k=−∞}^{∞} c_k s_k(t).

Inner products and L² norms used throughout (* denotes complex conjugation):

    ⟨u, v⟩ ≡ Σ_{n=1}^{N} u*_n v_n,    L² norm: ||u||² ≡ ⟨u, u⟩ = Σ_n |u_n|²,

    ⟨f(n), g(n)⟩ ≡ Σ_{n=−∞}^{∞} f*(n) g(n),    ||f(n)||² ≡ ⟨f(n), f(n)⟩,

    ⟨f(t), g(t)⟩ ≡ ∫_{−∞}^{∞} f*(t) g(t) dt,    ||f(t)||² ≡ ⟨f(t), f(t)⟩.

Parseval's relation: the Fourier transform preserves inner products,

    ⟨f, g⟩ = ⟨f̂, ĝ⟩ = ∫ f̂*(ω) ĝ(ω) dω.
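The Parseval relation above can be checked numerically with the discrete Fourier transform, using NumPy's unitary ("ortho") normalization so that the DFT preserves inner products exactly; the random signals are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
f = rng.standard_normal(256)
g = rng.standard_normal(256)

F = np.fft.fft(f, norm="ortho")   # unitary DFT: preserves inner products
G = np.fft.fft(g, norm="ortho")

lhs = np.vdot(f, g)               # time-domain inner product <f, g>
rhs = np.vdot(F, G)               # frequency-domain inner product <F, G>
```

With the default (non-unitary) normalization, the frequency-domain inner product would instead come out scaled by the transform length N.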
Feature Extraction of EEG Signals for Imagery Movement Based on the Mu/Beta Rhythm
Journal of Clinical Rehabilitative Tissue Engineering Research, Vol. 14, No. 43, October 22, 2010.

Feature extraction of electroencephalogram for imagery movement based on Mu/Beta rhythm
Huang Si-juan, Wu Xiao-ming (School of Bioscience and Bioengineering, South China University of Technology, Guangzhou 510006, Guangdong Province, China)

Abstract
BACKGROUND: Different movements produce different electroencephalogram (EEG) signals. A brain-computer interface (BCI) exploits these EEG characteristics to connect the brain with external devices through modern signal processing techniques and external connections. The speed of EEG signal processing is important for online BCI research.
OBJECTIVE: To investigate a rapid and accurate method for extracting and classifying EEG features for imagery movement.
METHODS: Using the event-related synchronization and event-related desynchronization that occur during imagery movement, the 2003 BCI competition dataset was processed. The Mu/Beta rhythm was obtained by bandpass filtering and wavelet packet analysis. Features were then formed from the average energy of leads C3 and C4 and classified with the MATLAB function classify.
RESULTS AND CONCLUSION: Appropriate parameters were obtained from the training data and used to identify the training and testing data, with classification accuracies of 87.857% and 88.571%, respectively.

Huang SJ, Wu XM. Feature extraction of electroencephalogram for imagery movement based on Mu/Beta rhythm. Zhongguo Zuzhi Gongcheng Yanjiu yu Linchuang Kangfu. 2010;14(43):8061-8064.
Supported by: the Science and Technology Development Program of Guangdong Province, No. 2009B030801004. Received: 2010-05-17; Accepted: 2010-07-13.
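The band-energy feature described in the abstract can be sketched as follows: estimate the average energy of the Mu (8-13 Hz) and Beta (14-30 Hz) bands of one channel from the FFT magnitude spectrum. The sampling rate, band edges, and the synthetic "C3" trace are illustrative assumptions, not the paper's data.

```python
import numpy as np

fs = 128.0                                   # sampling rate (assumed)
t = np.arange(0, 4.0, 1 / fs)
rng = np.random.default_rng(5)
# Synthetic channel with strong 10 Hz (Mu-band) activity plus noise
c3 = np.sin(2 * np.pi * 10 * t) + 0.2 * rng.standard_normal(t.size)

spec = np.abs(np.fft.rfft(c3)) ** 2
freqs = np.fft.rfftfreq(t.size, d=1 / fs)

def band_energy(lo, hi):
    """Mean spectral energy in [lo, hi] Hz."""
    band = (freqs >= lo) & (freqs <= hi)
    return float(spec[band].mean())

mu_energy = band_energy(8, 13)
beta_energy = band_energy(14, 30)
```

For a trace dominated by 10 Hz activity, the Mu-band energy clearly exceeds the Beta-band energy, which is the kind of contrast the C3/C4 features rely on.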
Feature extracting method, subject recognizing method and image processing apparatus
Patent title: Feature extracting method, subject recognizing method and image processing apparatus
Inventor: Tsutomu Kawano
Application No.: US10289981, filed 2002-11-07
Publication No.: US20030095698A1, published 2003-05-22
Applicant: KONICA CORPORATION

Abstract: A feature extracting method for a radiation image formed by radiation image signals, each corresponding to an amount of radiation having passed through a radiographed subject, comprises: plural different feature extracting steps, each with its own feature extracting condition for extracting a feature value; a feature value evaluating step that evaluates the combination of the plural feature values; and a controlling step that, based on the evaluation result, selects at least one feature extracting step, changes its feature extracting condition, and conducts the selected step again so as to extract a feature value from the radiation image under the changed condition.
Feature extraction for automatic speech recognition
Patent title: Feature extraction for automatic speech recognition
Inventors: Brian E. D. Kingsbury, Steven Greenberg, Nelson H. Morgan
Application No.: US09318592, filed 1999-05-25
Publication No.: US06308155B1, published 2001-10-23
Applicant: INTERNATIONAL COMPUTER SCIENCE INSTITUTE
Agent: Fish & Richardson P.C.

Abstract: An automatic speech recognition apparatus and method with a front-end feature extractor that improves recognition performance under adverse acoustic conditions. The feature extractor is characterized by critical-bandwidth spectral resolution, an emphasis on slow changes in the spectral structure of the speech signal, and adaptive automatic gain control. In one embodiment, the feature extractor includes a feature generator that computes short-term parameters of the speech signal, a filter system that filters the time sequences of those parameters, and a normalizer that normalizes the filtered parameters with respect to one or more of their previous values. Accordingly, the feature extractor computes short-term parameters of the speech signal, filters the time sequences of those parameters, and normalizes the filtered parameters with respect to one or more previous values; the filtering and normalizing steps are preferably performed independently of one another.
Feature Extraction
Feature extraction is a critical step in data analysis and machine learning tasks. By extracting relevant information from the raw data, it helps to reduce the dimensionality of the data and improve the performance of machine learning models.

There are various techniques for feature extraction, such as principal component analysis, independent component analysis, and the wavelet transform. These techniques aim to extract the most informative features from the data while reducing redundancy and noise.

One common approach to feature extraction is to analyze the statistical properties of the data, such as the mean, variance, and correlation. By understanding the underlying patterns in the data, it is possible to identify important features that can discriminate between different classes or categories.
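The statistical properties just mentioned (mean, variance, pairwise correlation) can be computed in a few lines; the random data matrix below is a toy stand-in.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((100, 4))        # 100 samples, 4 raw features

means = X.mean(axis=0)                   # per-feature mean
variances = X.var(axis=0)                # per-feature variance
corr = np.corrcoef(X, rowvar=False)      # 4x4 feature-correlation matrix
```

Highly correlated feature pairs (off-diagonal entries near ±1) are candidates for removal, since they carry largely redundant information.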
An Example-Based Feature Extraction Workflow

Feature extraction is a key step in machine learning: its goal is to distill the most representative information from the raw data for model training. In practice, it must be designed and optimized around domain knowledge and the characteristics of the data. This article describes an example-based feature extraction workflow that makes better use of the information in the data.

Step 1: Data preprocessing. Before extracting features, preprocess the raw data: clean it, remove noise, and normalize it. These steps ensure the reliability and consistency of the data and reduce errors in feature extraction.

Step 2: Choose a feature extraction method. Different data types and application scenarios call for different methods. For example, image data can use features based on color, texture, or shape, while text data can use term frequencies or TF-IDF. Evaluate candidate methods against the characteristics of the data and the target task.

Step 3: Extract the features. Once a method is chosen, apply it to the data. Because data sets differ greatly, the same method may perform differently on different data, so the features should be tuned and optimized against the actual data. Here, an example-based approach can improve the quality of the extracted features.

Step 4: Feature selection. After obtaining the feature vectors, select among the features. Feature selection reduces the influence of redundant and noisy features and improves the model's generalization ability and accuracy. Common approaches include filter, wrapper, and embedded methods.

Step 5: Model training and evaluation. Feed the selected feature vectors into a model for training and prediction. Model selection and tuning are likewise key steps in machine learning; to assess model performance, use cross-validation and a held-out test set.

Conclusion: this article has presented an example-based feature extraction workflow that exploits the information in the data. In practice, the quality of feature extraction often determines model performance and accuracy, so it should be tuned with domain knowledge and the characteristics of the data.
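The five steps above can be sketched end to end in pure NumPy: normalize, extract (PCA via SVD), train a nearest-centroid classifier, and evaluate. The two-cluster toy data is invented and cleanly separable, so this is an illustration of the pipeline shape, not a realistic benchmark; feature selection is skipped because only two features remain after PCA.

```python
import numpy as np

rng = np.random.default_rng(7)
# Step 0: toy data -- two well-separated Gaussian classes in 10-D
X = np.vstack([rng.standard_normal((50, 10)) + 3,
               rng.standard_normal((50, 10)) - 3])
y = np.array([0] * 50 + [1] * 50)

# Step 1: preprocessing -- z-score normalization
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

# Steps 2-3: feature extraction -- project onto the top-2 principal components
U, s, Vt = np.linalg.svd(Xn, full_matrices=False)
Z = Xn @ Vt[:2].T

# Step 4: feature selection -- skipped (only 2 features remain)

# Step 5: train a nearest-centroid classifier and evaluate
centroids = np.array([Z[y == c].mean(axis=0) for c in (0, 1)])
dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
pred = np.argmin(dists, axis=1)
accuracy = float((pred == y).mean())
```

In a real workflow, the evaluation in step 5 would use cross-validation or a held-out test set rather than the training data.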
Carnegie Mellon
Detector features
• Face detector (*.faceinfo)
• Detecting faces in the images
• VOCR detector (*.vocrinfo and *.mpg.txt)
• Detecting and recognizing VOCR
Closed caption alignment and Shot mapping
• Closed caption alignment (*.wordtime)
• Each word in the closed caption file is assigned an approximate time in milliseconds
Multi-modal Features
[Diagram: multi-concept combination for the feature tasks (1. Boat/Ship, 2. Madeleine Albright, 3. Bill Clinton, ...) via intermediate concepts (Concept 1-4) and low-level features such as SFFT and MFCC.]
• Failure analysis
A Text Retrieval Approach
• Search for shots with names appearing in transcript
• Vector-based IR model with TF*IDF weighting over shot transcripts
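The TF*IDF name search can be sketched in a few lines of stdlib Python. The per-shot "transcripts" below are invented stand-ins for the closed-caption/ASR text, and the scoring is a bare-bones TF*IDF sum, not the full ranking model.

```python
import math
from collections import Counter

# Toy per-shot transcripts (invented)
shots = {
    "shot1": "clinton met albright at the white house",
    "shot2": "the boat left the harbor this morning",
    "shot3": "president clinton spoke about the economy",
}

docs = {sid: txt.split() for sid, txt in shots.items()}
N = len(docs)
df = Counter(w for words in docs.values() for w in set(words))
idf = {w: math.log(N / df[w]) for w in df}      # inverse document frequency

def score(query, words):
    """Sum of tf * idf over query terms for one shot's transcript."""
    tf = Counter(words)
    return sum(tf[w] * idf.get(w, 0.0) for w in query.split())

ranked = sorted(docs, key=lambda sid: score("clinton", docs[sid]), reverse=True)
```

Shots whose transcripts mention the queried name score above zero; shots that never mention it fall to the bottom of the ranking.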
Feature Extraction Techniques CMU at TRECVID 2004
Ming-yu Chen and Jun Yang
School of Computer Science Carnegie Mellon University
Outline
• Low level features
• Generic high level feature extractions
• Uni-modal
• Multi-modal
• Multi-concept
• Specialized approaches for person finding
• Failure analysis
• Kinetic energy (*.kemotion)
• Capture the pixel difference between frames
• Mpeg motion (*.mpgmotion)
• Mpeg motion vector extracted from p-frame
• Optical flow (*.opmotion)
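The kinetic-energy feature in the first bullet (mean squared pixel difference between consecutive frames) can be sketched directly; the synthetic frame stacks below stand in for decoded video.

```python
import numpy as np

rng = np.random.default_rng(8)
static = np.tile(rng.random((32, 32)), (5, 1, 1))   # 5 identical frames: no motion
moving = rng.random((5, 32, 32))                    # 5 uncorrelated frames: high "motion"

def kinetic_energy(frames):
    """Mean squared difference between consecutive frames."""
    diffs = np.diff(frames, axis=0)
    return float((diffs ** 2).mean())

ke_static = kinetic_energy(static)   # exactly 0 for identical frames
ke_moving = kinetic_energy(moving)
```

A static shot yields zero energy while any frame-to-frame change raises it, which is what makes this a cheap motion descriptor.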
Top result for TRECVID tasks
• Uni-modal runs obtain 2 of the best CMU results
• Multi-modal runs obtain 3, including Boat/Ship, which is the best result overall
• Multi-concept runs obtain 6
• Detector features
• Face detection
• VOCR detection
Image features
• 5 by 5 grids per key-frame per shot
• Color histogram (*.hsv, *.hvc, *.rgb)
[Results table: per-task average precision scores for the uni-modal, multi-modal, and multi-concept runs; values range from 0.001 to 0.517.]
Outline
• Low-level features • Generic high-level feature extractions
• Specialized approaches for person finding
• Temporal Mismatch Drawback
• Faces do not always temporally co-occur with names
• Causes false alarms and misses
Expand the Text Retrieval Results
Boat/Ship               → Boat, Water_Body, Sky, Cloud
Train                   → Car_Crash, Man_Made_scene, Smoke, Road
Beach                   → Sky, Water_Body, Nature_Non-Vegetation, Cloud
Basket Scored           → Crowd, People, Running, Non-Studio_Setting
Airplane Takeoff        → Airplane, Sky, Smoke, Space_Vehicle_Launch
People Walking/Running  → Walking, Running, People, Person
Physical Violence       → Gun_Shot, Building, Gun, Explosion
Road                    → Car, Road_Traffic, Truck, Vehicle_Noise
Low level features overview
• Low level features
• CMU distributed 16 feature sets available to all TRECVID participants
• Development set: /trec04/devFeat/
• Test set: /trec04/testFeat/
• These features were used for all our submissions
• We encourage people to compare against these features, to eliminate confusion about better features vs. better algorithms
• Model 1: Gaussian model (trained using Maximum Likelihood) • Model 2: Linear model (different gradients set on two sides)
Multi-concepts
• Learning Bayesian networks from 168 common annotation concepts
• Select concepts to combine with the target concept
• Computed every 20 ms (512-sample windows at a 44100 Hz sampling rate)
• FFT (*.FFT): short-time Fourier transform
• MFCC (*.MFCC): Mel-frequency cepstral coefficients
• SFFT (*.SFFT): simplified FFT
[Diagram: uni-modal features (Video OCR, face, kinetic motion, optical motion, Gabor texture, Canny edge, HSV/HVC/RGB color) mapped against the 168 common-annotation concepts for feature tasks such as 10. Road.]
Generic high level feature extraction
Uni-Modal Features
[Diagram: uni-modal features (structural info: timing; textual info: transcript; audio info) fused by SVM-based combination.]
• 5 by 5 row-wise grids, 125-bin color histogram per grid cell
• HSV, HVC, and RGB color spaces
• 3125 dimensions (5 × 5 × 125)
• Example file: 19980202_CNN.hsv (rows of comma-separated bin values, e.g. 0.000000, 0.036541, 0.009744, ...)
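The gridded color histogram above can be sketched as follows: split the key-frame into a 5×5 grid, histogram each cell, and concatenate into one feature vector. For brevity this uses a single channel with 8 bins per cell rather than the 125-bin color histogram of the slides; the random frame is a stand-in for a decoded key-frame.

```python
import numpy as np

rng = np.random.default_rng(9)
frame = rng.random((100, 100))        # stand-in single-channel key-frame
G, B = 5, 8                           # 5x5 grid, 8 bins per cell (slides use 125)
ch, cw = frame.shape[0] // G, frame.shape[1] // G

cells = [frame[i * ch:(i + 1) * ch, j * cw:(j + 1) * cw]
         for i in range(G) for j in range(G)]
# Normalize each cell's histogram so it sums to 1, then concatenate row-wise
feat = np.concatenate([np.histogram(c, bins=B, range=(0, 1))[0] / c.size
                       for c in cells])
```

With B = 125 and three color spaces this yields the 3125-dimensional vectors described on the slide.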
• Texture (*.texture_5x5_bhs)
• Six oriented Gabor filters
• Edge (*.cannyedge)
• Canny edge detector, 8 orientations
Audio features & Motion features