Bayesian video reranking
Gene Silencing of the Papaya Ringspot Virus Replicase Gene (PRSV-Nib) to Obtain Virus-Resistant Papaya
Chinese Journal of Tropical Crops, 2024, 45(4): 837-846
WU Qinghua1,2, JIA Ruizong2*, GUO Jingyuan2, YANG Muzhi2, HU Yujuan2, HAO Zhigang2, ZHAO Hui2**, GUO Anping2**
1. College of Tropical Crops, Hainan University, Haikou, Hainan 570228, China; 2. Hainan Key Laboratory for Biosafety Monitoring and Molecular Breeding in Off-Season Reproduction Regions / Sanya Research Institute, Chinese Academy of Tropical Agricultural Sciences / Institute of Tropical Bioscience and Biotechnology, Chinese Academy of Tropical Agricultural Sciences, Sanya, Hainan 572024, China
Abstract: Papaya is an economically important tropical fruit. Papaya ringspot virus (PRSV) is a major viral disease of papaya that often causes severe yield losses and quality deterioration. Since the first transgenic papaya was released in 1998, breeding strategies based on pathogen-derived resistance (PDR) have been applied successfully and widely. However, the conflict between resistance that depends on sequence homology and the growing diversity caused by virus mutation has become a new challenge for papaya breeders. In this study, an RNAi strategy targeting the replicase gene (nuclear inclusion b, Nib) was used to obtain new papaya germplasm with broad-spectrum resistance to PRSV. Using the team's established transformation system (embryogenic callus induction, Agrobacterium-mediated transformation and shoot regeneration), 52 regenerated shoots passed resistance screening, and 24 transgene-positive plants were identified by specific PCR. In a T0 field trial under natural infection, the transgenic papaya lines showed clearly higher disease resistance than the non-transgenic control, with line NibB5-2 performing best. hiTAIL-PCR located the insertion site of NibB5-2 at position 1976766 of supercontig_30 on chromosome 2. In a T1 inoculation test, no virus accumulation or disease symptoms were observed, preliminarily confirming good virus resistance and providing a new approach for papaya disease-resistance breeding.
Keywords: papaya; Papaya ringspot virus; Nib gene; RNA-mediated virus resistance. CLC number: S436.67; document code: A.
Received 2022-12-16; revised 2023-02-15. Supported by the Hainan Major Science and Technology Program (No. ZDKJ202002), the Hainan Key R&D Program (No. ZDYF2022XDNY257), and the Yazhou Bay Science and Technology City Elite Talent Project (No. SCKJ-JYRC-2022-67).
CV research directions and survey
Computer vision (CV) is a discipline spanning many subfields, including image classification, object detection, image segmentation, object tracking, image denoising, image enhancement, stylization, 3D reconstruction, and image retrieval.
1. Image classification: multi-class image classification, fine-grained image classification, multi-label image classification, instance-level image classification, unsupervised image classification, and so on.
2. Object detection: object localization as covered in Andrew Ng's machine learning course; the key idea is replacing fully connected layers with convolutional layers.
3. Image segmentation: image segmentation with deep learning, including fully convolutional pixel-labeling networks, encoder-decoder architectures, multi-scale and pyramid-based methods, recurrent networks, visual attention models, and generative models in adversarial settings.
4. Object tracking: hybrid tracking algorithms based on filtering theory, motion models, feature matching and other techniques, as well as deep-learning-based tracking algorithms.
5. Image denoising: comparative studies of how different deep learning techniques affect denoising, including CNNs for additive white Gaussian noise, CNNs for real noisy images, CNNs for blind denoising, and CNNs for hybrid noisy images.
6. Image enhancement: improving the visual quality of an image, or extracting more information from it, through transformation, filtering and enhancement operations, for example super-resolution.
7. Stylization: changing the visual appearance of an image by applying an artistic style to it.
8. 3D reconstruction: recovering a 3D scene from 2D images.
9. Image retrieval: content-based image retrieval (CBIR), which extracts image features and matches them by similarity to retrieve images (see the sketch below).
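A minimal sketch of the CBIR idea in item 9: represent each image by a simple color-histogram feature and rank the collection by similarity to a query. The feature choice and similarity measure are illustrative assumptions, not a specific published system; random arrays stand in for real images so the script runs as-is.

```python
# Toy content-based image retrieval: histogram features + cosine similarity.
import numpy as np

def color_histogram(image, bins=8):
    """Concatenate per-channel histograms of an HxWx3 uint8 image."""
    hist = [np.histogram(image[..., c], bins=bins, range=(0, 255))[0]
            for c in range(3)]
    hist = np.concatenate(hist).astype(np.float64)
    return hist / (hist.sum() + 1e-12)          # normalize to a distribution

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(0)
database = [rng.integers(0, 256, (64, 64, 3), dtype=np.uint8) for _ in range(20)]
query = rng.integers(0, 256, (64, 64, 3), dtype=np.uint8)

db_features = [color_histogram(img) for img in database]
q_feature = color_histogram(query)
ranked = sorted(range(len(database)),
                key=lambda i: cosine_similarity(q_feature, db_features[i]),
                reverse=True)
print("top-5 retrieved image indices:", ranked[:5])
```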
Overall, CV is a vibrant field covering a very wide range of research directions. With the development of deep learning, research and applications in CV have made great progress.
Important international conferences in computer science
1.2 Important international academic conferences in computer science and technology: Class A conferences; Class B conferences. 1.3 Important international academic conferences in automation: Class A conferences; Class B conferences.
Authoritative journals and conferences related to data mining:
[Journals] 1. ACM Transactions on Knowledge Discovery from Data (TKDD); 2. IEEE Transactions on Knowledge and Data Engineering (TKDE); 3. Data Mining and Knowledge Discovery; 4. Knowledge and Information Systems; 5. Data & Knowledge Engineering
[Conferences] 1. SIGMOD: ACM Conference on Management of Data (ACM); 2. VLDB: International Conference on Very Large Data Bases (Morgan Kaufmann/ACM); 3. ICDE: IEEE International Conference on Data Engineering (IEEE Computer Society); 4. SIGKDD: ACM Knowledge Discovery and Data Mining (ACM); 5. WWW: International World Wide Web Conferences (W3C); 6. CIKM: ACM International Conference on Information and Knowledge Management (ACM); 7. PKDD: European Conference on Principles and Practice of Knowledge Discovery in Databases (Springer-Verlag LNAI)
Further resources listed: journals ACM TODS, VLDB Journal, ACM TOIS; conferences ICDM, SDM; tools Weka, RapidMiner (YALE), IlliMine, AlphaMiner, Potter's Wheel A-B-C; data mining course and group pages (University of Houston, University of Central Florida, University of Dortmund, MIT OCW, Tsinghua), a Google co-op search engine for data mining, KDD oral presentation videos, and a data mining events feed.
A surveillance camera tampering detection model
LIU Xiaonan, SHAO Peinan (The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China)
Abstract: To reduce false detections caused by special scenes in camera tampering detection, this paper proposes SCG (Siamese with Convolutional Gated Recurrent Unit), a model based on a Siamese architecture that uses the latent similarity between video clips to distinguish special scenes from tampering events. An improved ConvGRU network is integrated into the Siamese architecture so that the model fully exploits the temporal correlation between frames of surveillance video, and non-local blocks embedded between the GRU cells let the network build spatial dependence responses over the image. Compared with a tampering detection model using a conventional GRU module, the model with the improved ConvGRU module raises accuracy by 4.22%. In addition, a residual attention module is introduced to improve the feature extraction network's sensitivity to changes in the image foreground; compared with the model without the attention module, this raises accuracy by a further 2.49%.
Keywords: Siamese; ConvGRU; non-local block; camera tampering; tampering detection. CLC number: TP391; document code: A; article ID: 1009-2552(2021)01-0090-07; DOI: 10.13274/ki.hdzj.2021.01.016
Author: LIU Xiaonan (1994-), female, master's student; research interests: computer vision and deep learning.
Introduction: Surveillance video is an important source of information and evidence in criminal investigations, but offenders may well interfere with or even destroy cameras to conceal suspicious activity, so effectively detecting camera tampering events has significant practical value. In current camera tampering detection methods, false detections in special scenes remain a major challenge, for example illumination changes, weather changes, crowd movement, or large objects passing by. For surveillance video with little or no background texture, or that is dark or of low quality, most detection methods misclassify the video as defocus tampering; when the lens is occluded with a textured object whose grey level and brightness are very close to the image background, the occluder cannot be distinguished from the background. In addition, an object slowly covering the lens is also a challenging detection problem. This paper builds a detection model with deep neural networks, uses an improved ConvGRU (Convolutional Gated Recurrent Unit) to extract the temporal features of the video and the global spatial dependencies of the images, and combines it with a Siamese architecture to propose the SCG model.
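A minimal PyTorch sketch of the arrangement described above: a Siamese pair of weight-shared encoders turns two video clips into temporal features whose similarity is scored to separate "special scenes" from tampering. For brevity this uses a small frame CNN plus an ordinary GRU; the paper's improved ConvGRU, non-local blocks and residual attention module are not reproduced here, and all shapes are illustrative.

```python
import torch
import torch.nn as nn

class ClipEncoder(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.cnn = nn.Sequential(                      # per-frame spatial features
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gru = nn.GRU(32, feat_dim, batch_first=True)   # temporal aggregation

    def forward(self, clip):                           # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        frames = self.cnn(clip.flatten(0, 1)).view(b, t, -1)
        _, h = self.gru(frames)                        # last hidden state
        return h[-1]                                   # (B, feat_dim)

class SiameseTamperingDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = ClipEncoder()                   # shared weights = Siamese
        self.head = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, clip_a, clip_b):
        fa, fb = self.encoder(clip_a), self.encoder(clip_b)
        return torch.sigmoid(self.head(torch.abs(fa - fb)))  # tampering score

model = SiameseTamperingDetector()
a = torch.randn(2, 8, 3, 64, 64)                       # reference clip
b = torch.randn(2, 8, 3, 64, 64)                       # current clip
print(model(a, b).shape)                               # torch.Size([2, 1])
```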
Bayesian super-resolution reconstruction of compressed video
XU Zhongqiang, ZHU Xiuchang (College of Telecommunication and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003, China)
Vol. 27, No. 5, Oct. 2007
Abstract (fragment): ...algorithm is proposed under the Bayesian framework. Its performance is also analyzed. Simulations...
Keywords (fragment): ...; noise; motion estimation
1 Introduction
Super-resolution (SR) reconstruction builds a high-resolution (HR) image by exploiting the motion between, and additional information contained in, a sequence of low-resolution (LR) images; it was first proposed by Tsai. Since then a number of more practical...
INSA de Lyon
Keywords: text extraction, image enhancement, binarization, OCR, video indexing
1 Introduction
Image retrieval and its extension to videos is a research area which has gained a lot of attention in recent years. Various methods and techniques have been presented which allow querying big databases with multimedia contents (images, videos, etc.) using features extracted by low-level image processing methods and distance functions designed to resemble human visual perception as closely as possible. Nevertheless, query results returned by these systems do not always match the results desired by a human user. This is largely due to the lack of semantic information in these systems. Systems trying to extract semantic information from low-level features have already been presented [10], but they are error prone and depend very much on large databases of pre-defined semantic concepts and their low-level representation. Another method to add more semantics to the query process is relevance feedback, which uses interactive user feedback to steer the query process; see [20] and [8] for survey papers on this subject. Systems mixing features from different domains (image and text) are an interesting alternative to mono-domain features [4]. However, keywords are not available for all images and depend heavily on the indexer's point of view on a given image (the so-called polysemy of images), even if they are closely related to the semantic information of certain video sequences (see Figure 1). In this paper we focus on text extraction and recognition in videos. The text is automatically extracted from the videos in the database and stored together with a link to the video sequence and the frame number. This is a complementary approach to basic keywords. The user submits a request by providing a keyword, which is robustly matched against the previously extracted text in the database. Videos containing the keyword are presented to the user. This can also be merged with image features such as color or texture. <Figure 1 about here> Extraction of text from images and videos is a very young research subject, which nevertheless attracts a large number of researchers. The first algorithms, introduced by the document processing community for the extraction of text from...
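A small sketch of the querying step described above: text previously extracted from video frames is stored with a (video id, frame number) link, and a user keyword is matched approximately against it, so that OCR errors in the extracted text still allow retrieval. The index contents and the matching threshold are invented for illustration; only the Python standard library is used.

```python
from difflib import SequenceMatcher

# (video id, frame number, extracted text) records, as produced by a text
# extraction + OCR stage.
index = [
    ("news_01.mpg", 1200, "eleclion resulls 2004"),   # OCR-corrupted "election results"
    ("news_01.mpg", 3410, "weather forecast"),
    ("sports_07.mpg", 220, "champions league final"),
]

def fuzzy_contains(keyword, text, threshold=0.75):
    """True if some word of `text` approximately matches `keyword`."""
    return any(SequenceMatcher(None, keyword.lower(), w).ratio() >= threshold
               for w in text.lower().split())

def query(keyword):
    return [(vid, frame) for vid, frame, text in index
            if fuzzy_contains(keyword, text)]

print(query("election"))   # -> [('news_01.mpg', 1200)] despite the OCR errors
```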
Bacterial community structure diversity and environmental factors in Haizhou Bay
DOI: 10.16605/ki.1007-7847.2022.12.0239. Received 2022-12-27; revised 2023-06-08; published online 2023-10-19. Supported by the China Agriculture Research System for Shrimp and Crab (CARS-48) and the Jiangsu Agricultural Public Service Subsidy Program (2021). First author: GU Ying (1997-), female, from Lianyungang, Jiangsu, master's student. *Corresponding author: SUN Miaomiao (1983-), male, from Lianyungang, Jiangsu, PhD, senior engineer, working mainly on aquaculture and the prevention and control of aquatic animal diseases; E-mail: *******************.
GU Ying1, FU Guanghui1, YAO Yongqi2, LIANG Baogui2, YE Renzhi1, WANG Chao1, LU Lu1, SUN Miaomiao1* (1. Lianyungang Marine and Fishery Development Promotion Center, Lianyungang 222000, Jiangsu, China; 2. College of Marine Sciences, Shanghai Ocean University, Shanghai 201306, China)
Abstract: To investigate the diversity of the bacterial community structure in the shallow waters of Haizhou Bay and the factors influencing it, surface and bottom water samples were collected at six sites and analyzed with high-throughput sequencing to relate community diversity and distribution to environmental factors. The results show that the abundance and diversity of the bacterial community in bottom water were higher than in surface water; the two communities differed somewhat in evolutionary direction, but the difference between them was not significant. A total of 2491 operational taxonomic units (OTUs) were found, belonging to 32 phyla, 74 classes, 116 orders, 200 families and 364 genera. At the phylum level, the dominant groups were Proteobacteria, Bacteroidetes and Cyanobacteria, among others. Ammonium (NH4-N) and chlorophyll (Chl) were the main environmental factors shaping the bacterial community structure. The study confirms that the diversity of the bacterial community in the shallow waters of Haizhou Bay is related to environmental factors, providing a theoretical basis for the sustainable development of this ecosystem.
How Yelp, the American "Dianping", uses deep learning to score food photos
Yelp's database already stores tens of millions of photos, and users now upload roughly a hundred thousand more every day, at a still accelerating pace. In fact, we find that the upload growth rate of photos exceeds their view rate. These photos reflect the content and quality of local businesses and carry very rich information. One important aspect of these photos is the type of content they show. In August 2015 we launched a new system that classifies photos associated with traditional restaurants into categories such as food, drinks, exterior, interior and menu. Since then we have launched similar systems for coffee shops, bars and similar businesses, to help users find the photos they are looking for as quickly as possible. Recently we have been studying how to further improve user satisfaction by showing them more attractive pictures, that is, by improving our photo ranking system.
Understanding photo quality. Comparing the quality of photos looks like a highly subjective task. Many factors influence which photos people like or dislike, and the conclusion also varies with the individual user doing the search. To give Yelp users a better experience, the photo understanding team had to take on a very challenging job: determine which properties make a photo more appealing, and develop an algorithm that can judge photos reliably based on those properties.
First we tried to build a click-through-rate predictor for photos, using click data mined from our logs. Our hypothesis was that photos clicked more often should clearly be better than the others. In practice this idea did not work as well as expected, for several reasons. First, people often click on photos that are blurry or full of text simply to see what they contain. Also, because photos are displayed on Yelp in many different ways, it is hard to compare metrics for particular photos fairly.
We then tried several different computer vision techniques to find intrinsic properties of photos that could be used directly for quality scoring. For example, one property that matters a lot to photographers is depth of field, which measures how much of the photo is in focus. A shallow depth of field is very effective at separating the subject of a photo from its background, and photos uploaded to Yelp are no exception.
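An illustrative stand-in for one photo-quality cue of the kind discussed above (how much of the image is in focus): the variance of a Laplacian response is a common sharpness proxy, with blurry photos scoring low. This is not Yelp's actual model, just a sketch of turning a visual property into a ranking score; the two synthetic "photos" are random arrays.

```python
import numpy as np

LAPLACIAN = np.array([[0, 1, 0],
                      [1, -4, 1],
                      [0, 1, 0]], dtype=np.float64)

def sharpness_score(gray):
    """Variance of the Laplacian of a 2-D grayscale image (higher = sharper)."""
    h, w = gray.shape
    resp = np.zeros((h - 2, w - 2))
    for dy in range(3):                       # tiny explicit 3x3 convolution
        for dx in range(3):
            resp += LAPLACIAN[dy, dx] * gray[dy:dy + h - 2, dx:dx + w - 2]
    return float(resp.var())

rng = np.random.default_rng(1)
sharp = rng.random((100, 100))                         # high-frequency content
blurry = np.cumsum(np.cumsum(sharp, 0), 1) / 1e4       # smooth, low-frequency ramp
photos = {"sharp_photo": sharp, "blurry_photo": blurry}
ranked = sorted(photos, key=lambda k: sharpness_score(photos[k]), reverse=True)
print(ranked)   # the sharp photo ranks above the blurry one
```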
An adaptive-threshold shot segmentation algorithm for news video
WANG Guoying, KOU Hongzhao, LI Tao (Institute of Electronic Technology, PLA Information Engineering University, Zhengzhou 450004, Henan, China)
Computer Engineering and Design (计算机工程与设计)
Keywords (fragment): news video; video shot; segmentation algorithm; histogram; automated threshold; flash light detection
Abstract (fragment): ...its advantage is that the threshold can be determined automatically according to the complexity of the shot content, which to some extent avoids the poor adaptability of fixed-threshold algorithms; the algorithm also considers how to eliminate the influence of camera flashes, common in news video, on shot detection. Experimental results show that the algorithm segments news video shots well.
Multimodality Image Registration by Maximization of Mutual Information
Frederik Maes, André Collignon, Dirk Vandermeulen, Guy Marchal, and Paul Suetens (IEEE Transactions on Medical Imaging, vol. 16, no. 2, April 1997)
Abstract: A new approach to the problem of multimodality medical image registration is proposed, using a basic concept from information theory, mutual information (MI), or relative entropy, as a new matching criterion. The method presented in this paper applies MI to measure the statistical dependence or information redundancy between the image intensities of corresponding voxels in both images, which is assumed to be maximal if the images are geometrically aligned. Maximization of MI is a very general and powerful criterion, because no assumptions are made regarding the nature of this dependence and no limiting constraints are imposed on the image content of the modalities involved. The accuracy of the MI criterion is validated for rigid body registration of computed tomography (CT), magnetic resonance (MR), and photon emission tomography (PET) images by comparison with the stereotactic registration solution, while robustness is evaluated with respect to implementation issues, such as interpolation and optimization, and image content, including partial overlap and image degradation. Our results demonstrate that subvoxel accuracy with respect to the stereotactic reference solution can be achieved completely automatically and without any prior segmentation, feature extraction, or other preprocessing steps, which makes this method very well suited for clinical applications.
Index Terms: matching criterion, multimodality images, mutual information, registration.
The remainder of the paper covers the theory of mutual information, the registration algorithm (joint histogram binning with nearest-neighbour, trilinear and partial-volume interpolation, and Powell's multidimensional direction-set optimization), experiments on CT/MR/PET brain datasets evaluating accuracy against a stereotactic reference and robustness to interpolation, subsampling, partial overlap, noise, intensity inhomogeneity and geometric distortion, a discussion, and appendixes relating the MI criterion to other voxel-based similarity measures. Surviving captions include "Fig. 1. Joint histogram of the overlapping volume of the CT and MR brain images of dataset A", "Fig. 3. Graphical illustration of NN, TRI, and PV interpolation in 2-D", "Table II. Datasets used in the experiments" and "Table III. Reference and MI registration parameters for datasets A, B, and C".
Patent: VIDEO ENHANCEMENTS FOR LIVE SHARING OF MEDICAL IMAGES
Inventors: Harish P. HIRIYANNAIAH, Muhammad Zafar Javed SHAHID. Application No. US14463127, filed 2014-08-19; publication No. US20160055305A1, published 2016-02-25. Applicant: eagleyemed, Inc., Santa Clara, CA, US.
Abstract: In a telemedicine application there is live sharing of a video stream of medical images from a first site to a second site as well as a two-way conferencing capability. Live streaming of medical images in a live interactive session imposes many limitations on the video streaming process not found in conventional video conferencing. The network conditions are heterogeneous and low latency is required to support live streaming of medical images to a remote site and two-way conferencing in which a doctor or clinician at the remote site can provide real-time analysis or guidance on how to adjust the location of an imaging device. A suite of video enhancements is disclosed to improve the capability to sustain live video streaming of medical images in a telemedicine environment including a two-way conference between doctors or clinicians.
Patent: MULTIVARIATE RATE CONTROL FOR TRANSCODING VIDEO CONTENT
Inventors: JOHN, Sam; ADSUMILLI, Balineedu; GADDE, Akshay. Application No. EP20731311.5, filed 2020-05-19; publication No. EP3939288A1, published 2022-01-19. Applicant: Google LLC, 1600 Amphitheatre Parkway, Mountain View, CA 94043, US. Agent: Grant, David Michael.
Abstract: A learning model is trained for rate-distortion behavior prediction against a corpus of a video hosting platform and used to determine optimal bitrate allocations for video data given video content complexity across the corpus of the video hosting platform. Complexity features of the video data are processed using the learning model to determine a rate-distortion cluster prediction for the video data, and transcoding parameters for transcoding the video data are selected based on that prediction. The rate-distortion clusters are modeled during the training of the learning model, such as based on rate-distortion curves of video data of the corpus of the video hosting platform and based on classifications of such video data. This approach minimizes total corpus egress and/or storage while further maintaining uniformity in the delivered quality of videos by the video hosting platform.
Research on the movie scoring system of bilibili based on a nonparametric rank sum test
ZHANG Keqi, LI Qiumin (The Science Education Article Collects, No. 6, 2021, Sum No. 522)
Abstract: Using crawler code, this paper collects all the movies (about 1000) in the current bilibili movie column list and crawls all the rating data under each movie (about 650,000 records); each rating record contains the rating time and the user's ID. The collected data are analyzed with the Mann-Whitney rank sum test from nonparametric statistics. The results show that the proportion of first-time raters in the bilibili movie column has a significant effect on the score. Referring to the way the American IMDb Bayesian weighted statistical algorithm counts only the ratings of "old users", the paper makes suggestions for the bilibili scoring system so that scores can provide a more objective and comprehensive reference for viewers.
Keywords: Mann-Whitney rank sum test; movie score; bilibili; crawler
1 Introduction
The Outline of the Twelfth Five-Year Plan for National Economic and Social Development calls for vigorously developing key cultural industries such as film and television production and accelerating the construction of cinemas in small and medium-sized cities in central and western China.
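A sketch of the two statistical tools referred to in the abstract above: the Mann-Whitney rank sum test comparing ratings from first-time raters against repeat raters, and an IMDb-style Bayesian weighted average that shrinks a title's mean rating toward a global mean when it has few votes. The sample data and the prior weight m are made up for illustration; they are not the paper's dataset.

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
first_time = rng.integers(7, 11, size=200)      # ratings by first-time raters (1-10)
repeat = rng.integers(4, 11, size=200)          # ratings by repeat raters

stat, p = mannwhitneyu(first_time, repeat, alternative="two-sided")
print(f"Mann-Whitney U={stat:.0f}, p={p:.3g}")  # small p -> the two groups differ

def weighted_rating(ratings, global_mean, m=100):
    """IMDb-style shrinkage: WR = v/(v+m)*R + m/(v+m)*C."""
    v, r = len(ratings), float(np.mean(ratings))
    return v / (v + m) * r + m / (v + m) * global_mean

all_ratings = np.concatenate([first_time, repeat])
print("weighted score of a 20-vote title:",
      round(weighted_rating(repeat[:20], all_ratings.mean()), 2))
```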
Research on video retrieval technology based on a background similarity model
In recent years video has become one of the main carriers of information, and the volume and complexity of video data keep growing. Against this background, developing video retrieval technology based on a background similarity model has become an important topic. This article discusses that topic and introduces how the technique is implemented and where it is applied.
1. The background similarity model. The background similarity model is a video retrieval technique based on background features. Its basic idea is to treat a video along a spatial and a temporal dimension and to compute the video's background features from the similarity between neighbouring frames. With these background features, applications such as video editing, video classification and video summarization become possible. Methods for computing background similarity fall into two classes: motion-based methods (such as optical flow) and texture-based methods (such as local binary patterns). Optical flow is fast to compute but sensitive to illumination and object displacement and therefore error prone; local binary patterns apply more broadly but are slower and need more computing resources.
2. Video retrieval based on the background similarity model. The technique has many application scenarios, such as video search, video recommendation, video summarization and video classification, with summarization and classification being its two main applications. Video summarization usually keeps only the main part of a video, to reduce network transmission and storage cost; the background similarity model can identify the subject of a video automatically from its background features, and during summarization frames are selected according to user needs and video characteristics. Video classification tags videos to make later search and management easier: a video's background features are compared against the other videos in the database to find highly similar samples, and on that basis videos can be assigned automatically to categories such as movies, sports or news.
3. Implementation. There are several ways to implement video retrieval based on the background similarity model; two common ones are introduced below.
3.1 Based on GoogLeNet. GoogLeNet is a deep convolutional neural network that is widely used in video analysis. GoogLeNet-based video retrieval proceeds in the following steps. First, the video is represented as vectors of features extracted by GoogLeNet...
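A rough sketch of the GoogLeNet-based retrieval step just described: frames are pushed through a GoogLeNet backbone to get one feature vector per frame, the vectors are averaged into a clip descriptor, and clips are compared by cosine similarity. Weights are left untrained here (weights=None) so the script runs offline, and random tensors stand in for decoded frames; a real system would load pretrained weights and real video frames.

```python
import torch
import torch.nn.functional as F
from torchvision.models import googlenet

backbone = googlenet(weights=None, aux_logits=False)
backbone.fc = torch.nn.Identity()        # keep the 1024-d pooled feature
backbone.eval()

@torch.no_grad()
def clip_descriptor(frames):             # frames: (T, 3, 224, 224)
    feats = backbone(frames)             # (T, 1024), one vector per frame
    return F.normalize(feats.mean(dim=0), dim=0)

clip_a = torch.randn(8, 3, 224, 224)     # stand-ins for decoded video frames
clip_b = torch.randn(8, 3, 224, 224)
similarity = torch.dot(clip_descriptor(clip_a), clip_descriptor(clip_b))
print(f"cosine similarity between clips: {similarity.item():.3f}")
```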
Research on deep-learning-based video popularity prediction and recommendation
In recent years, with the spread of the Internet and the rapid growth of video platforms, video has become an important part of daily life. In this age of information overload, predicting how popular a video will be and recommending suitable video content to users has become an urgent problem, and deep-learning-based popularity prediction and recommendation offers an effective solution.
Within a deep learning framework, popularity prediction and recommendation are usually realized by building a complex neural network model. The model extracts and analyses video features to predict how well a video will be received, and makes recommendations on the basis of user behaviour data. Concretely, video features may include the title, tags, cover image and duration, while user behaviour data may include viewing history, likes and comments.
First, the model can use a convolutional neural network (CNN) to extract visual features from the video. With several convolution and pooling layers, a CNN automatically learns features from the video and encodes them as high-dimensional vectors; such an encoding captures visual information like colour, texture and structure. To strengthen the representation, a recurrent neural network (RNN) can additionally model the frame sequence and thus account for temporal relations in the video.
Second, an autoencoder can be used to extract audio features. An autoencoder is an unsupervised learning algorithm that learns a compact representation of the data. Audio in a video may contain speech, music and other information that strongly influences user preference; by training an autoencoder, valuable features can be extracted from the audio to support popularity prediction and recommendation.
Once the visual and audio features are available, they can be fed into a multilayer perceptron (MLP) that learns the relation between the video's feature representation and user behaviour to predict popularity (see the sketch below). By training this model on existing video and user-behaviour data, an accurate prediction model can be built.
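A toy sketch of the fusion step described in the preceding paragraph: precomputed visual features (e.g. from a CNN over frames) and audio features (e.g. from an autoencoder) are concatenated and fed to a multilayer perceptron that regresses a popularity score. The feature dimensions, synthetic data and training loop are placeholders, not a reproduction of any specific system.

```python
import torch
import torch.nn as nn

visual_dim, audio_dim = 256, 64
model = nn.Sequential(
    nn.Linear(visual_dim + audio_dim, 128), nn.ReLU(),
    nn.Linear(128, 32), nn.ReLU(),
    nn.Linear(32, 1))                                  # predicted popularity

# Synthetic training set: 512 videos with precomputed features and a target
# popularity (e.g. log of view count).
x = torch.randn(512, visual_dim + audio_dim)
y = x[:, :5].sum(dim=1, keepdim=True) + 0.1 * torch.randn(512, 1)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
for epoch in range(200):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print(f"final training MSE: {loss.item():.4f}")
```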
The prediction model can provide the video platform with training suggestions and marketing strategies, so that user needs are met better. In addition, deep-learning-based popularity prediction and recommendation can be applied to personalized recommendation of video content.
Research on machine-learning-based video classification technology
With the development of the Internet, video applications have become more and more widespread. However, the Internet holds an enormous number of videos, and how to classify, search and recommend them has become an urgent problem in which machine learning has shown great potential. This article discusses machine-learning-based video classification.
1. Applications of video classification. Video classification automatically categorizes large numbers of videos and can be applied to video storage, video search and recommendation. By classifying videos automatically, it helps users obtain the information they need; for example, in video search a user can enter keywords and find videos of the corresponding category among the results.
2. State of research. The most common approach at present is based on visual features: computer vision techniques extract feature vectors from the video, which are matched against a previously trained model to classify the video. Visual features alone, however, cannot describe video content accurately, because video content is more complex than a single image. Researchers have therefore proposed new techniques such as classification based on semantic features, which uses natural language processing to turn a video's textual description into a feature vector for classification, and deep-learning-based classification, which extracts features from the video with deep neural networks and classifies them.
3. Methods and algorithms. The visual-feature approach is the most widely used video classification technique. It classifies a video from its visual information: visual features are first extracted from the video, and the feature vectors are then fed into a classifier. Algorithms used in this process include principal component analysis (PCA) and local binary patterns (LBP), among others; a compact sketch of this pipeline follows.
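A compact sketch of the visual-feature pipeline in the paragraph above: per-video feature vectors (here random stand-ins for histogram/LBP-style descriptors) are reduced with PCA and classified with an SVM. Real systems would replace the synthetic features with descriptors extracted from decoded frames.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_videos, feat_dim, n_classes = 300, 512, 3            # e.g. movie / sports / news
X = rng.normal(size=(n_videos, feat_dim))
y = rng.integers(0, n_classes, size=n_videos)
X += y[:, None] * 0.5                                  # make the classes separable

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), PCA(n_components=32), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
print(f"video-category accuracy: {clf.score(X_te, y_te):.2f}")
```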
The semantic-feature approach uses natural language processing to convert video descriptions into feature vectors. It can describe video content more accurately, but it requires processing large amounts of text; algorithms used include principal component analysis (PCA), naive Bayes and support vector machines (SVM). The deep-learning approach is an emerging video classification technique that uses deep neural networks to extract features from the video and classify them.
Generalization of no-reference video quality assessment
Data diversity: make sure the dataset contains all kinds of videos, with enough samples of each type, so that the model can learn more comprehensive features.
Data augmentation: use augmentation techniques such as random cropping, rotation and scaling to enlarge and diversify the dataset and improve the model's generalization.
Model generalization: deep learning models have strong representational power but easily overfit the training data; regularization techniques such as weight decay and dropout can be used to reduce overfitting.
Feature comparison:
- Cross-frame comparison: compare features of consecutive frames to detect dynamic changes and stability in the video.
- Similarity comparison: compare the extracted features with those of known high-quality videos to assess quality.
- Difference comparison: compare the extracted features with those of known low-quality videos to assess quality.
Quality assessment: subjective quality assessment by human observers yields accurate evaluation results.
Conclusions: experiments show that the method generalizes well across different scenes and adapts to different video content and quality variations; compared with traditional methods it is more accurate and stable and better meets the needs of practical applications.
Limitations and outlook: although the proposed method performs well in experiments, it still has limitations in practice, for example its handling of certain special scenes or complex videos needs further improvement.
Experimental results: generalization was tested on several datasets, including natural scenes, indoor scenes, high definition and standard definition; accuracy was evaluated on the different datasets with objective metrics such as PSNR; robustness was measured under different resolutions, compression ratios and coding formats.
Result analysis: comparative analysis.
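A small sketch of the two generalization aids listed in the outline above: frame-level data augmentation (random crop, rotation, flip) and dropout plus weight decay as regularization. The augmentation parameters and network shape are illustrative choices only, not tied to any particular VQA model.

```python
import torch
import torch.nn as nn
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # crop + rescale
    transforms.RandomRotation(degrees=10),
    transforms.RandomHorizontalFlip(),
])

quality_head = nn.Sequential(          # regressor on top of pooled frame features
    nn.Linear(2048, 256), nn.ReLU(),
    nn.Dropout(p=0.5),                 # dropout against overfitting
    nn.Linear(256, 1))
optimizer = torch.optim.AdamW(quality_head.parameters(),
                              lr=1e-4, weight_decay=1e-2)   # weight decay

frame = torch.rand(3, 256, 320)        # a decoded frame in [0, 1]
print(augment(frame).shape)            # torch.Size([3, 224, 224])
print(quality_head(torch.randn(4, 2048)).shape)             # torch.Size([4, 1])
```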
Product quality regulation and online movie ratings: based on the classical-estimation Bayesian averaging method and propensity score matching
Keywords: industry regulation; product quality regulation; new empirical industrial organization; film industry; posterior; PSM
1. Introduction and literature review
In recent years China's film industry has developed rapidly and box-office revenue has climbed steadily: the average box-office revenue of a domestic film rose from 55.4644 million yuan in 2009 to 243.0885 million yuan in 2019, a 3.38-fold increase over ten years at an annual growth rate of 15.92%①. Yet product evaluations fed back directly by consumers have been far less satisfactory: over the same period, online movie ratings showed a clearly fluctuating downward trend, with the average score slipping from 6.22 in 2009 to 5.43 in 2019. This inversion between ratings and box office, with films that sell well but are not well received, highlights the practical dilemma facing the development of China's film industry. On the one hand, as material living standards rise markedly, residents' demand for cultural consumption is growing rapidly; although the supply of film and television works is also expanding, it remains relatively scarce overall (Huang Liwei and Wu Manfang, 2018), so competition in the film product market is insufficient and the screening effect of the market is suppressed. On the other hand, constrained by the existing regulatory system, domestic films are rather uniform in subject matter and style and increasingly homogeneous (Lai Chun, 2017), making it ever harder to match the individualized preferences of young consumers in the Internet age.
A survey of deep-learning-based video quality assessment methods
YANG Wenbing; QIU Tian; ZHANG Zhipeng; SHI Bokai; ZHANG Mingwei
Journal: Modern Information Technology, 2024, 8(7)
Abstract: The Internet age is flooded with massive amounts of video of uneven quality; low-quality video greatly weakens the visual experience and puts great pressure on storage devices, so video quality assessment (VQA) is imperative. The development of deep learning offers new ideas for VQA. This survey first briefly introduces the theory of video quality assessment and traditional assessment methods; it then classifies deep-learning-based models by network type into 2D-CNN and 3D-CNN approaches and analyses their advantages and disadvantages; next it examines the performance of classic models on public datasets; finally it summarizes the shortcomings of the field and looks ahead to future trends. The study shows that public datasets are still insufficient, and that no-reference methods have the greatest potential for development but perform only moderately on public datasets, leaving much room for improvement.
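A brief sketch of how VQA performance on a public dataset is usually reported (the kind of comparison the abstract above refers to): Pearson (PLCC) and Spearman (SROCC) correlation between predicted scores and human mean opinion scores (MOS). The score vectors below are fabricated for illustration.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

mos = np.array([30.2, 45.1, 52.4, 61.0, 70.3, 78.8, 85.5])   # subjective scores
predicted = np.array([35.0, 40.2, 55.1, 58.7, 72.9, 75.0, 88.1])

plcc, _ = pearsonr(predicted, mos)
srocc, _ = spearmanr(predicted, mos)
print(f"PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")
```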
Pages: 9 (73-80)
Authors: YANG Wenbing; QIU Tian; ZHANG Zhipeng; SHI Bokai; ZHANG Mingwei
Affiliation: Joint Laboratory of Digital Optical Chips, Wuyi University and Institute of Semiconductors, Chinese Academy of Sciences
Language: Chinese. CLC numbers: TP391.4; TP18
Related literature:
1. A survey of stereoscopic video quality assessment methods
2. A network video quality assessment method based on visual perception
3. Zhang Yun's research team proposes a sparse-representation-based quality assessment method for 3D virtual-viewpoint video
4. A survey of deep-learning-based video quality assessment
Bayesian Video Search RerankingXinmei Tian∗Univ.of Sci.&Tech.of China Hefei,Anhui,230027China xinmei@Linjun Y angMicrosoft Research AsiaBeijing,100190Chinalinjuny@Jingdong WangMicrosoft Research AsiaBeijing,100190Chinai-jingdw@Yichen Y ang∗Zhejiang UniversityHangzhou,310027China starswing1987@ Xiuqing WuUniv.of Sci.&Tech.of China Hefei,Anhui,230027Chinawuxq@Xian-Sheng Hua Microsoft Research Asia Beijing,100190China xshua@ABSTRACTContent-based video search reranking can be regarded as a process that uses visual content to recover the“true”rank-ing list from the noisy one generated based on textual in-formation.This paper explicitly formulates this problem in the Bayesian framework,i.e.,maximizing the ranking score consistency among visually similar video shots while mini-mizing the ranking distance,which represents the disagree-ment between the objective ranking list and the initial text-based.Different from existing point-wise ranking distance measures,which compute the distance in terms of the indi-vidual scores,two new methods are proposed in this paper to measure the ranking distance based on the disagreement in terms of pair-wise orders.Specifically,hinge distance pe-nalizes the pairs with reversed order according to the de-gree of the reverse,while preference strength distance further considers the preference degree.By incorporating the pro-posed distances into the optimization objective,two rerank-ing methods are developed which are solved using quadratic programming and matrix computation respectively.Evalu-ation on TRECVID video search benchmark shows that the performance improvement up to21%on TRECVID2006 and61.11%on TRECVID2007are achieved relative to text search baseline.Categories and Subject DescriptorsH.3.3[Information Search and Retrieval]:Retrieval modelsGeneral TermsAlgorithms,Experimentation,Performance∗This work was performed when Xinmei Tian and Yichen Yang were visiting Microsoft Research Asia as research in-terns.Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on thefirst page.To copy otherwise,to republish,to post on servers or to redistribute to lists,requires prior specific permission and/or a fee.MM’08,October26–31,2008,Vancouver,British Columbia,Canada. 
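The abstract above contrasts existing point-wise ranking distances with the proposed pair-wise ones. The toy sketch below illustrates that contrast numerically; the two functions are a simplified reading of the abstract (the paper's exact hinge and preference strength definitions appear in Section 4), and the score lists are made up.

```python
import numpy as np

def pointwise_distance(r, r_init):
    """Sum of squared per-sample score differences."""
    return float(np.sum((np.asarray(r, float) - np.asarray(r_init, float)) ** 2))

def hinge_pairwise_distance(r, r_init):
    """Penalize each pair ordered one way initially but reversed in r."""
    r, r_init = np.asarray(r, float), np.asarray(r_init, float)
    dist = 0.0
    for i in range(len(r)):
        for j in range(len(r)):
            if r_init[i] > r_init[j]:                 # initial list prefers i over j
                dist += max(0.0, r[j] - r[i])         # charged only if reversed
    return dist

r_init = [1.0, 0.9, 0.8, 0.7, 0.6]       # initial text-based scores
r_same_order = [0.6, 0.5, 0.4, 0.3, 0.2] # different scores, identical ordering
r_reversed = [0.6, 0.7, 0.8, 0.9, 1.0]   # ordering fully reversed

for r in (r_same_order, r_reversed):
    print(pointwise_distance(r, r_init), hinge_pairwise_distance(r, r_init))
# The point-wise distance is nonzero even when the ordering is unchanged, and it
# even rates the fully reversed list as "closer" to the initial one; the
# pair-wise distance is zero unless some pair is actually reversed.
```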
Copyright2008ACM978-1-60558-303-7/08/10...$5.00.…...…...Video corpusText query:Figure1:Example of video search reranking.Firstly the text search engine returns the video shots relevant to the query“Soccer Match”and then the reranking process is applied to reorder the text search result by mining visual information.KeywordsVideo Search Reranking,Bayesian Reranking,Pair-wise 1.INTRODUCTIONMost of the currently available video search engines are based on“query by keyword”scenario,which are built on text search engines mainly using the associated textual infor-mation such as surrounding text from the web page,speech transcript,closed caption,and so on.However,the perfor-mance of text-based video search is yet unsatisfying,due to the mismatch between surrounding text and the asso-ciated video,as well as the low performance of automatic speech recognition(ASR),video text recognition and ma-chine translation(MT)techniques.Figure1shows a typical process of video search reranking, in which a list of baseline search results is returned through textual information only and visual information is applied to reorder the initial results,so as to refine the text based search results.As illustrated in Figure1,after a query,e.g.,“Soccer Match”,is submitted,an initial ranking list of video segments(i.e.,shots in general)is obtained by text search engine based on the relevance between the associated textualinformation and the query keywords.It is observed that text-based search often returns“inconsistent”results,which means some visually similar ones(and semantically close to each other at the same time in most cases)are scattered in the ranking list,and frequently some irrelevant results are filled between them.For instance,as shown in Figure1,four of the topfive results of the query“Soccer Match”are the relevant samples and visually similar while the other,the anchor person,is not similar.It is reasonably assumed that the visually similar samples should be ranked together.Such a visual consistency pattern within the relevant samples can be utilized to reorder the initial ranking list,e.g.,to assign the anchor person a lower ranking score.Such a process, which reorders the initial ranking list based on some visual pattern,is called content-based video search reranking,or video search reranking in brief.Video search reranking can be regarded as recovering the “true”ranking list from the initial noisy one by using visual information,i.e.,to refine the initial ranking list by incorpo-rating the text cue and visual cue.As for text cue,we mean that the initial text-based search result provides a baseline for the“true”ranking list.Though noisy,it still reflects par-tial facts of the“true”list thus needs to be preserved to some extent,i.e.,to keep the correct information in the initial list. 
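The introduction frames reranking as a trade-off between staying close to the noisy text-based scores and enforcing consistency among visually similar shots. The sketch below solves a toy version of that trade-off in closed form, plugging in a simple point-wise fidelity term for tractability: the objective 0.5*sum_ij w_ij (r_i - r_j)^2 + c*||r - r_init||^2 is minimized by r = c (L + cI)^{-1} r_init with graph Laplacian L = D - W. The Gaussian affinity, the value of c, and the five-shot example are illustrative assumptions; the pair-wise distances the paper actually advocates are introduced later.

```python
import numpy as np

def gaussian_affinity(features, sigma=1.0):
    d2 = ((features[:, None, :] - features[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(w, 0.0)            # no self-edges
    return w

def rerank(features, r_init, c=0.5, sigma=1.0):
    w = gaussian_affinity(features, sigma)
    laplacian = np.diag(w.sum(axis=1)) - w
    return np.linalg.solve(laplacian + c * np.eye(len(r_init)), c * r_init)

# Five shots: the first four are visually alike (e.g. soccer-field frames) and
# the fifth is visually isolated (e.g. an anchor-person shot).
features = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [3.0, 3.0]])
r_init = np.array([1.0, 0.9, 0.8, 0.2, 0.6])     # noisy text-based scores

print(np.round(rerank(features, r_init), 3))
# The visually consistent shot that text search under-scored (0.2) is pulled up
# by its similar neighbours and now ranks above the isolated shot, whose score
# stays essentially at its text-based value.
```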
The visual cue is introduced by taking visual consistency as a constraint,e.g.,visually similar video shots should have close ranking scores and vice versa.Reranking is actually a trade-offbetween the two cues.It is worthy emphasizing that this is actually the basic underlying assumption of most of the existing video search reranking approaches,though it may not be clearly presented[7,8,10].In this paper,we model the two cues from the proba-bilistic perspective within a Bayesian framework.The text cue is modeled as a likelihood which reflects the disagree-ment between the reranked list and the initial text-based one;and the visual cue is modeled as a conditional prior which indicates the ranking score consistency between visu-ally similar samples.In the Bayesian framework,reranking is formulated as maximizing the product of the conditional prior and the likelihood.That is the reason that we call the proposed reranking approach Bayesian Reranking.Existing random walk based methods[8,10]can be unified into such a framework.In this paper,we mainly focus on the likelihood term while the conditional prior can be modeled by visual consistency directly.The likelihood is estimated by the ranking dis-tance,i.e.,the disagreement between the reranked list and the initial text-based one.Ranking distance is a crucial fac-tor in video search reranking,which significantly affects the overall reranking performance but has not been well stud-ied before.The point-wise ranking distance,which sums the individual score difference for each sample in the two ranking score lists,is used in existing video search reranking methods[8,10].However,such point-wise approach fails to capture the disagreement between two lists in terms of rank-ing accurately.To tackle this problem,two novel ranking distances are proposed based on the pair-wise order disagree-ment.Specifically,hinge distance penalizes the pairs with reversed order according to the degree to which they are reversed,while preference strength distance further consid-ers the preference degree over pairs.By incorporating the distances into the optimization objective,hinge reranking and preference strength reranking are developed,which are solved by Quadratic Programming(QP)and matrix compu-tation,respectively.The main contributions of this paper are summarized as follows:•We explicitly formulate video search reranking as aglobal optimization problem within a Bayesian frame-work.Many effective reranking methods can be devel-oped under this framework for different applications.•By investigating the effects of ranking distances invideo search reranking,two more reasonable rankingdistance measures,hinge distance and preference strength distance are proposed.•Two reranking methods,hinge reranking and prefer-ence strength reranking,are developed.With deriva-tion,hinge reranking can be solved using quadraticprogramming,while preference strength reranking issolved through matrix computation efficiently.The rest of this paper is organized as follows.Firstly, we briefly review the existing video search reranking meth-ods in Section2.In Section3reranking is formulated in a Bayesian framework and the general reranking model is derived.Two pair-wise ranking distances are developed and the corresponding reranking methods are presented in Sec-tion4.Implementation details for video search reranking are considered in Section5.The connections between our proposed methods and“learning to rank”as well as random walk reranking are presented in Section6.Experimental results and 
analysis are given in Section7,followed by the conclusion and future work in Section8.2.RELATED WORKRecently many methods are proposed for video search reranking[7,8,10,11,18,19],which can be divided into three categories:PRF(Pseudo-Relevance Feedback)based, clustering based and random walk based.Thefirst category is PRF based[11,18,19].PRF is a concept introduced from text retrieval,which assumes that a fraction of the top-ranked documents in the initial search results are pseudo-positive[3].In PRF based video search reranking there are normally three steps:(1)select the pseudo-positive and pseudo-negative samples from the initial text-based search results;(2)train a classifier using the selected samples;(3)rerank the video shots with the rel-evance scores predicted by the trained classifier.Due to the low performance of text-based video search,the top ranked video shots cannot be used as pseudo positives directly.Al-ternatively,[19]uses the query images or example video clips as the pseudo-positive samples.The pseudo-negative sam-ples are selected from either the least relevant samples in the initial ranking list or the database with the assumption that few samples in the database are relevant to the query [11,19].In step(2),different classifiers,such as SVM[19], Boosting[18],and Ranking SVM[11],can be adopted.Al-though the above classifiers are effective,sufficient training data are demanded to achieve a satisfactory performance since a lot of parameters need to be estimated.The second category is clustering based.In[7],each video shot is given a soft pseudo label according to the initial text-based ranking score,and then the Information Bottle-neck principle is adopted tofind optimal clustering whichmaximizes the mutual information between the clusters and the labels.Reranked list is achieved by ordering the clus-ters according to the cluster conditional probabilityfirstly and then ordering the samples within a cluster based on their local feature density estimated via kernel density es-timation.This method achieves good performance on the named-person queries as shown in[7]while it is limited to those queries which have significant duplicate characteristic.In the third category,random walk based methods[8,10], a graph is constructed with the samples(video shots)as the nodes and the edges between them being weighted by multi-modal similarity.Then,reranking is formulated as random walk over the graph and the ranking scores are propagated through the edges.To leverage the text-based search result, a“dongle”node is attached to each sample with the value fixed to be the initial text-based ranking score.The station-ary probability of the random walk process is used as the reranked score directly.In Section6.2we will show that random walk reranking can be unified into the proposed Bayesian reranking framework,while the adopted ranking distance is actually point-wise,which can not capture the “true”difference between the reranked list and the initial text-based one precisely.There are also methods which incorporate auxiliary knowl-edge,including face detection[12],query example[19],and concept detection[9,10,13],into video search reranking. Though the incorporation of auxiliary knowledge leads to the performance improvement it is not a general treatment. 
There are also methods that incorporate auxiliary knowledge, including face detection [12], query examples [19], and concept detection [9, 10, 13], into video search reranking. Although such auxiliary knowledge improves performance, it is not a general treatment: these methods suffer from either limited applicability to specific queries (face detection), the need for specific user interfaces (query example), or limited detection performance and small vocabulary size (concept detection). In this paper, we consider only the general reranking problem, which assumes no auxiliary knowledge beyond the visual information of the samples, so the proposed reranking methods can be applied to many tasks directly.

3. BAYESIAN RERANKING

3.1 Reranking

Before formulating reranking, a few terms are defined below.

Definition 1. A ranking score list (score list in brief), $\mathbf{r} = [r_1, r_2, \cdots, r_N]^T$, is a vector of ranking scores corresponding to a sample set $X = \{x_1, x_2, \cdots, x_N\}$.

Definition 2. A ranking list $l$ is a permutation of $X$ sorted by the ranking scores in descending order.

Generally, reranking can be regarded as a mapping from the initial ranking list to the objective ranking list. However, the ranking scores themselves are also useful in most situations. For this reason, we define reranking on the score list instead of the ranking list.

Definition 3. A reranking function is defined as

$\mathbf{r} = f(X, \bar{\mathbf{r}})$,   (1)

where $\bar{\mathbf{r}} = [\bar{r}_1, \bar{r}_2, \cdots, \bar{r}_N]^T$ is the initial ranking score list. Permuting the samples according to this reranking function is called reranking.

Defining reranking on the score list instead of the ranking list provides more flexibility [7, 8]. For application scenarios where the initial ranking scores are unavailable, such as Google image search reranking [5], the initial score list r̄ can be set according to the initial ranks of the samples, as detailed in Section 5.2.

3.2 Bayesian Reranking

The crucial problem in reranking is how to derive the optimal reranking function (1). In this paper, we investigate the reranking problem from the probabilistic perspective and derive an optimal reranking function based on Bayesian analysis.

Supposing the ranking score list is a random variable, reranking can be regarded as the process of deriving the most probable score list given the initial one as well as the visual content of the samples. From the probabilistic perspective, reranking derives the optimum r* with the maximum a posteriori probability given the samples X and the initial score list r̄,

$\mathbf{r}^* = \arg\max_{\mathbf{r}} p(\mathbf{r} \mid X, \bar{\mathbf{r}})$.   (2)

According to Bayes' formula, the posterior is proportional to the product of the conditional prior probability and the likelihood,

$p(\mathbf{r} \mid X, \bar{\mathbf{r}}) \propto p(\mathbf{r} \mid X) \times p(\bar{\mathbf{r}} \mid X, \mathbf{r})$,   (3)

where p(r | X) is the conditional prior of the score list given the visual content of the samples. For instance, a ranking score list that assigns dissimilar scores to visually similar video shots may receive a small probability. p(r̄ | X, r) is the likelihood, which expresses how probable the initial score list r̄ is given the "true" ranking score list r. It can be estimated from the ranking distance, which represents the disagreement between the reranked score list and the initial one, as discussed later.

In most video search systems, the initial ranking score list is obtained from the textual information alone, regardless of the visual content. Therefore, the visual information X and the initial score list r̄ can be assumed conditionally independent given the objective score list r, i.e., p(r̄, X | r) = p(r̄ | r) × p(X | r), hence

$p(\bar{\mathbf{r}} \mid X, \mathbf{r}) = p(\bar{\mathbf{r}} \mid \mathbf{r})$.   (4)

Substituting (4) into (3), we obtain

$p(\mathbf{r} \mid X, \bar{\mathbf{r}}) \propto p(\mathbf{r} \mid X) \times p(\bar{\mathbf{r}} \mid \mathbf{r})$.   (5)

Replacing the posterior in (2) with (5), we formulate reranking as maximizing the product of a conditional prior and a likelihood, which is defined as Bayesian Reranking.
Definition 4. Bayesian Reranking is reranking using the function

$f(X, \bar{\mathbf{r}}) = \arg\max_{\mathbf{r}} p(\mathbf{r} \mid X) \times p(\bar{\mathbf{r}} \mid \mathbf{r})$,   (6)

where r̄ is the initial ranking score list and X is the corresponding sample set.

In Bayesian Reranking, the likelihood and the conditional prior need to be estimated to complete the reranking function. In the following sections, we show how to model both using energy functions.

[Figure 2: The graphical model representation for the conditional prior of the ranking list. A graph G is built with the ranking scores as the nodes and the edges weighted by the visual similarities.]

3.3 The Conditional Prior

In video search reranking, visually similar video shots are expected to have close ranking scores. This expectation is modeled in the conditional prior of the Bayesian Reranking formulation. Specifically, we formulate the conditional prior as a pair-wise Markov network,

$p(\mathbf{r} \mid X) = \frac{1}{Z} \exp\Big(-\sum_{i,j} \psi_{ij}(\mathbf{r}, X)\Big)$,

where ψ_ij(r, X) is the energy function defined on a pair of samples {r_i, x_i, r_j, x_j}, and Z is a normalizing constant with $Z = \sum_{\mathbf{r}} \exp\big(-\sum_{i,j} \psi_{ij}(\mathbf{r}, X)\big)$.

A graph G, as illustrated in Figure 2, is constructed with the scores in r as nodes and the visual similarities between the corresponding samples as edge weights. Specifically, the weight w_ij on the edge between nodes r_i and r_j is computed with a Gaussian kernel,

$w_{ij} = \exp\Big(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\Big)$,

where σ is a scaling parameter.

Various methods can be used to derive the energy function ψ_ij(r, X). Based on the visual consistency assumption, i.e., if the samples x_i and x_j are visually similar then the corresponding scores r_i and r_j should be close, and vice versa, the energy function is defined as

$\psi_{ij}(\mathbf{r}, X) = \frac{1}{2} w_{ij} (r_i - r_j)^2$.   (7)

Hence the conditional prior is

$p(\mathbf{r} \mid X) = \frac{1}{Z} \exp\Big(-\frac{1}{2}\sum_{i,j} w_{ij}(r_i - r_j)^2\Big)$,   (8)

which is widely used in semi-supervised learning; the exponent is known as Laplacian regularization [22]. An alternative, normalized Laplacian regularization [21], can also be used to derive the prior,

$p(\mathbf{r} \mid X) = \frac{1}{Z} \exp\Big(-\frac{1}{2}\sum_{i,j} w_{ij}\Big(\frac{r_i}{\sqrt{d_i}} - \frac{r_j}{\sqrt{d_j}}\Big)^2\Big)$,   (9)

where $d_i = \sum_j w_{ij}$. In our experimental analysis, Laplacian regularization performs better than normalized Laplacian regularization, so we adopt the former in our methods.

3.4 The Likelihood

The likelihood is modeled as

$p(\bar{\mathbf{r}} \mid \mathbf{r}) = \frac{1}{Z} \exp\big(-c \times \mathrm{Dist}(\mathbf{r}, \bar{\mathbf{r}})\big)$,   (10)

where Z is the normalizing constant, c is a scaling parameter, and Dist(r, r̄) is the ranking distance representing the disagreement between the two ranking score lists, which is discussed in detail in Section 4. The graphical model representation is illustrated in Figure 3.

[Figure 3: The factor graph representation of the ranking distance between r and r̄.]
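To make the two model components concrete, the sketch below evaluates the exponents of the prior (8) and the likelihood (10), i.e., the Laplacian regularization term and the scaled ranking distance, ignoring the normalizing constants. It is a minimal sketch: the Gaussian-kernel affinity follows the definition above, dist_fn stands for any of the ranking distances introduced in Section 4, and the default values of sigma and c are illustrative.

```python
import numpy as np

def affinity_matrix(X, sigma=1.0):
    """Gaussian-kernel weights w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dist / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def prior_energy(r, W):
    """Laplacian regularization 1/2 * sum_ij w_ij (r_i - r_j)^2, the exponent of Eq. (8)."""
    return 0.5 * np.sum(W * (r[:, None] - r[None, :]) ** 2)

def likelihood_energy(r, r_bar, dist_fn, c=1.0):
    """c * Dist(r, r_bar), the exponent of the likelihood in Eq. (10);
    dist_fn is any ranking distance (see Section 4)."""
    return c * dist_fn(r, r_bar)

def neg_log_posterior(r, r_bar, W, dist_fn, c=1.0):
    """Up to additive constants, -log p(r | X, r_bar)."""
    return prior_energy(r, W) + likelihood_energy(r, r_bar, dist_fn, c)
```

Minimizing the sum of these two energy terms is exactly the optimization objective derived in Section 4.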
4. OUR APPROACHES

The Bayesian Reranking formulation in Eq. (6) is equivalent to minimizing the following energy function,

$E(\mathbf{r}) = \frac{1}{2}\sum_{i,j} w_{ij}(r_i - r_j)^2 + c \times \mathrm{Dist}(\mathbf{r}, \bar{\mathbf{r}})$,   (11)

where the first and second terms correspond to the conditional prior in Eq. (8) and the likelihood in Eq. (10), respectively, and c can be viewed as a trade-off parameter between the two terms. The main work of this paper focuses on the second term, i.e., the evaluation of the ranking distance.

4.1 Ranking Distance

Below we analyze the issues with existing ranking distances and propose to measure the ranking distance from the pair-wise perspective. A toy example is used for illustration, comprising five samples {x1, x2, x3, x4, x5} and four ranking score lists {r0, r1, r2, r3}, as shown in Table 1.

Table 1: A toy example for ranking distance

Samples  x1   x2   x3   x4   x5
r0       1.0  0.9  0.8  0.7  0.6
r1       0.6  0.7  0.8  0.9  1.0
r2       1.5  0.7  0.8  0.9  1.0
r3       0.5  0.4  0.3  0.2  0.1

Sorting the samples by their scores, the ranking lists derived from r0, r1, r2, and r3 are

l0 = <x1, x2, x3, x4, x5>,
l1 = <x5, x4, x3, x2, x1>,
l2 = <x1, x5, x4, x3, x2>,
l3 = <x1, x2, x3, x4, x5>.

To measure the ranking distance between score lists, one intuitive idea is to take each score list as an "instance" and use a list-wise approach, as exploited in "learning to rank". However, a list-wise approach such as [2], which defines the distance between two score lists as the cross entropy between the two distributions of permutations conditioned on each score list, is computationally intractable, since the number of permutations is O(N!), where N is the number of samples.

Alternatively, the most direct and simple method is to compute the individual score difference for each sample and aggregate them, the so-called point-wise approach,

$\mathrm{Dist}(\mathbf{r}, \bar{\mathbf{r}}) = \sqrt{\sum_i d(r_i, \bar{r}_i)} = \sqrt{\sum_i (r_i - \bar{r}_i)^2}$.   (12)

[Figure 4: The factor graph representation of point-wise ranking distance, in which the ranking distance is computed by summing each sample's distance.]

The corresponding graphical model representation is illustrated in Figure 4. Such a point-wise approach has been applied in random walk reranking in a slightly different form, as detailed in Section 6.2.

Point-wise ranking distance, however, fails to capture the disagreement between score lists in terms of ranking in some situations. Take the toy example in Table 1 for illustration. The distances between r0 and r1, r2, r3 computed using Eq. (12) are Dist(r1, r0) = 0.63, Dist(r2, r0) = 0.70, and Dist(r3, r0) = 1.12. Dist(r3, r0) is the largest; however, in terms of ranking, the distance between r3 and r0 should be the smallest, since l3 is identical to l0 while l1 and l2 are not.

As the ranking information can be represented entirely by the pair-wise ordinal relations, the ranking distance between two score lists can instead be computed from the pairs, the so-called pair-wise approach. The graphical model representation of the pair-wise distance is illustrated in Figure 5.

[Figure 5: The factor graph representation of pair-wise ranking distance, in which the ranking distance is computed by summing each pair's distance d_ij = d((r_i, r_j), (r̄_i, r̄_j)).]

Before discussing the pair-wise approach further, we first define the relation ≻_r.

Definition 5. x_i ≻_r x_j is a relation on a pair (x_i, x_j) if r_i > r_j, i.e., x_i is ranked before x_j in the ranking list l derived from r. All pairs (x_i, x_j) satisfying x_i ≻_r x_j compose a set S_r = {(i, j) : x_i ≻_r x_j}.

For any two samples x_i and x_j, either (i, j) or (j, i) belongs to S_r. Therefore, all pair-wise ordinal relations are reflected in S_r. The simplest pair-wise ranking distance can be defined as

$\mathrm{Dist}(\mathbf{r}, \bar{\mathbf{r}}) = \sum_{(i,j)\in S_{\bar{\mathbf{r}}}} \delta(x_j \succ_{\mathbf{r}} x_i)$,   (13)

where δ(t) = 1 if t is true and 0 otherwise. The basic idea of (13) is to count the number of pairs whose order relations disagree in the two lists. Using (13), Dist(r1, r0) = 10, Dist(r2, r0) = 6, and Dist(r3, r0) = 0, which indeed captures the differences between the ranking lists.
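The values reported for the toy example can be verified with a few lines of code. The snippet below is purely illustrative; it recomputes the point-wise distances of Eq. (12) and the pair-wise disagreement counts of Eq. (13) with respect to r0.

```python
import numpy as np

# Score lists from Table 1
r0 = np.array([1.0, 0.9, 0.8, 0.7, 0.6])
r1 = np.array([0.6, 0.7, 0.8, 0.9, 1.0])
r2 = np.array([1.5, 0.7, 0.8, 0.9, 1.0])
r3 = np.array([0.5, 0.4, 0.3, 0.2, 0.1])

def pointwise_dist(r, r_ref):
    """Point-wise distance of Eq. (12): root of the summed squared score differences."""
    return np.sqrt(np.sum((r - r_ref) ** 2))

def pairwise_disagreement(r, r_ref):
    """Pair-wise distance of Eq. (13): number of pairs whose order is reversed."""
    count = 0
    n = len(r_ref)
    for i in range(n):
        for j in range(n):
            if r_ref[i] > r_ref[j] and r[j] > r[i]:  # (i, j) in S_ref but reversed in r
                count += 1
    return count

for name, r in [("r1", r1), ("r2", r2), ("r3", r3)]:
    print(name, round(pointwise_dist(r, r0), 2), pairwise_disagreement(r, r0))
# Reproduces the values in the text: 0.63/10, 0.7/6, 1.12/0
```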
However, the optimization problem of (11) with the ranking distance (13) is computationally intractable. Below we define two pair-wise ranking distances with which the optimization problem of (11) becomes solvable.

4.2 Hinge Reranking

Intuitively, if a pair's order relation stays the same before and after reranking, the distance of this pair should be zero, just as in (13). However, if a pair's order is reversed after reranking, instead of an equal penalty (1 in (13)) for every pair, the penalty should depend on the degree to which the pair's order is reversed. Hence, we define the hinge distance as

$\mathrm{Dist}(\mathbf{r}, \bar{\mathbf{r}}) = \sum_{(i,j)\in S_{\bar{\mathbf{r}}}} d((r_i, r_j), (\bar{r}_i, \bar{r}_j)) = \sum_{(i,j)\in S_{\bar{\mathbf{r}}}} [(r_j - r_i)_+]^2$,   (14)

where $(x)_+ = \max(0, x)$ is the hinge function.

Substituting the hinge distance (14) into (11), the following optimization problem is derived,

$\min_{\mathbf{r}} \frac{1}{2}\sum_{i,j} w_{ij}(r_i - r_j)^2 + c \sum_{(i,j)\in S_{\bar{\mathbf{r}}}} [(r_j - r_i)_+]^2$,

which is equivalent to

$\min_{\mathbf{r}} \frac{1}{2}\sum_{i,j} w_{ij}(r_i - r_j)^2 + c \sum_{(i,j)\in S_{\bar{\mathbf{r}}}} \xi_{ij}^2$   (15)
s.t. $r_i - r_j > -\xi_{ij}$, when $x_i \succ_{\bar{\mathbf{r}}} x_j$,

where ξ_ij is a slack variable. By introducing a small positive constant a, the following quadratic optimization problem is obtained,

$\min_{\mathbf{r}} \frac{1}{2}\sum_{i,j} w_{ij}(r_i - r_j)^2 + c \sum_{(i,j)\in S_{\bar{\mathbf{r}}}} \xi_{ij}^2$   (16)
s.t. $r_i - r_j \ge a - \xi_{ij}$, when $x_i \succ_{\bar{\mathbf{r}}} x_j$.

Reranking with the above optimization problem is called hinge reranking, since the hinge distance is adopted. Problem (16) can be solved with an interior-point method [16]. In some situations the computational cost is high, especially when there are many constraints; for instance, with 1000 samples in the initial ranking list there are on the order of half a million pair-wise constraints, one per pair in S_r̄. Below, we develop a more efficient method using a different ranking distance, which can be solved analytically by matrix computation.
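The sketch below shows one way to solve hinge reranking. Instead of the dedicated interior-point QP solver mentioned above, it minimizes the equivalent unconstrained objective obtained by substituting the optimal slack ξ_ij = (a − (r_i − r_j))_+ back into (16), using a general-purpose quasi-Newton solver; the default values of c and a are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def hinge_rerank(W, r_bar, c=1.0, a=0.05):
    """Hinge reranking sketch: minimize the Laplacian-regularized squared-hinge
    objective equivalent to the slack formulation (16)."""
    r_bar = np.asarray(r_bar, dtype=float)
    N = len(r_bar)
    L = np.diag(W.sum(axis=1)) - W                      # graph Laplacian
    # Pairs (i, j) in S_r_bar, i.e. x_i ranked before x_j initially
    pairs = [(i, j) for i in range(N) for j in range(N) if r_bar[i] > r_bar[j]]

    def objective(r):
        prior = r @ L @ r                               # = 1/2 sum_ij w_ij (r_i - r_j)^2
        margins = np.array([max(0.0, a - (r[i] - r[j])) for i, j in pairs])
        return prior + c * np.sum(margins ** 2)

    def gradient(r):
        g = 2.0 * L @ r
        for i, j in pairs:
            m = a - (r[i] - r[j])
            if m > 0:
                g[i] -= 2.0 * c * m
                g[j] += 2.0 * c * m
        return g

    res = minimize(objective, x0=r_bar.copy(), jac=gradient, method="L-BFGS-B")
    return res.x
```

Enumerating S_r̄ costs O(N²) pairs, which is one motivation for the pair selection strategies of Section 5.1.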
4.3 Preference Strength Reranking

In reranking, not only the order relation but also the preference strength, i.e., the score difference r_i − r_j of the samples in a pair (x_i, x_j), is indicative. For example, given two pairs, one comprising two tigers of different typicality and the other comprising a tiger and a stone, the preference strengths of the two pairs are obviously different. Such information can be utilized in video search reranking, so we define an alternative ranking distance, called the preference strength distance, as

$\mathrm{Dist}(\mathbf{r}, \bar{\mathbf{r}}) = \sum_{(i,j)\in S_{\bar{\mathbf{r}}}} d((r_i, r_j), (\bar{r}_i, \bar{r}_j)) = \sum_{(i,j)\in S_{\bar{\mathbf{r}}}} \Big(1 - \frac{r_i - r_j}{\bar{r}_i - \bar{r}_j}\Big)^2$.   (17)

From Eq. (17) we can see that, through the preference strength, the order relations on pairs are also reflected in the preference strength distance. Replacing the distance function in (11) with the preference strength distance (17), the optimization problem of preference strength reranking is

$\min_{\mathbf{r}} \frac{1}{2}\sum_{i,j} w_{ij}(r_i - r_j)^2 + c \sum_{(i,j)\in S_{\bar{\mathbf{r}}}} \Big(1 - \frac{r_i - r_j}{\bar{r}_i - \bar{r}_j}\Big)^2$.   (18)

Suppose one solution of (18) is r*. Then r = r* + μe is also a solution of (18), where e is the all-ones vector and μ is an arbitrary constant; all such solutions give the same ranking list. We therefore simply add the constraint r_N = 0 to (18), where N is the length of r, so that a unique solution can be derived, as given in the following proposition.

Proposition 1. The solution of (18) with the constraint r_N = 0 is

$\mathbf{r} = \frac{1}{2}\breve{L}^{-1}\breve{\mathbf{c}}$,

where $\breve{L}$ and $\breve{\mathbf{c}}$ are obtained by replacing the last row of $\tilde{L}$ with $[0, 0, \cdots, 0, 1]_{1\times N}$ and the last element of $\tilde{\mathbf{c}}$ with zero, respectively. $\tilde{L} = \tilde{D} - \tilde{W}$ and $\tilde{\mathbf{c}} = 2c\,A\mathbf{e}$. $\tilde{W} = [\tilde{w}_{ij}]_{N\times N}$ with $\tilde{w}_{ij} = w_{ij} + c\alpha_{ij}^2$. $\tilde{D} = \mathrm{Diag}(\tilde{\mathbf{d}})$ is the degree matrix with $\tilde{\mathbf{d}} = [\tilde{d}_1, \cdots, \tilde{d}_N]^T$ and $\tilde{d}_i = \sum_j \tilde{w}_{ij}$. $A = [\alpha_{ij}]_{N\times N}$ is an anti-symmetric matrix with $\alpha_{ij} = 1/(\bar{r}_i - \bar{r}_j)$.

Proof.

$\min_{\mathbf{r}} \frac{1}{2}\sum_{i,j} w_{ij}(r_i - r_j)^2 + c\sum_{(i,j)\in S_{\bar{\mathbf{r}}}} \big(1 - \alpha_{ij}(r_i - r_j)\big)^2$
$= \min_{\mathbf{r}} \frac{1}{2}\sum_{i,j} w_{ij}(r_i - r_j)^2 + c\sum_{(i,j)\in S_{\bar{\mathbf{r}}}} \alpha_{ij}^2(r_i - r_j)^2 - 2c\sum_{(i,j)\in S_{\bar{\mathbf{r}}}} \alpha_{ij}(r_i - r_j) + \mathrm{const}$
$= \min_{\mathbf{r}} \frac{1}{2}\sum_{i,j} (w_{ij} + c\alpha_{ij}^2)(r_i - r_j)^2 - 2c\sum_{(i,j)\in S_{\bar{\mathbf{r}}}} \alpha_{ij}(r_i - r_j)$
$= \min_{\mathbf{r}} \frac{1}{2}\sum_{i,j} \tilde{w}_{ij}(r_i - r_j)^2 - 2c\sum_{(i,j)\in S_{\bar{\mathbf{r}}}} \alpha_{ij}(r_i - r_j)$
$= \min_{\mathbf{r}} \mathbf{r}^T\tilde{L}\mathbf{r} - \tilde{\mathbf{c}}^T\mathbf{r}$.

Taking the derivative and setting it to zero gives

$2\tilde{L}\mathbf{r} = \tilde{\mathbf{c}}$.   (19)

The solution of (19) is not unique since the Laplacian matrix L̃ is singular [4]. With the constraint r_N = 0, we replace the last row of L̃ with [0, 0, ⋯, 0, 1]_{1×N} to obtain L̆ and the last element of c̃ with zero to obtain c̆. The solution is then

$\mathbf{r} = \frac{1}{2}\breve{L}^{-1}\breve{\mathbf{c}}$.
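Proposition 1 reduces preference strength reranking to a single linear solve. The sketch below follows the construction in the proposition directly; it is illustrative only and assumes distinct initial scores (so that every α_ij is defined) and that W is the Gaussian-kernel affinity matrix of Section 3.3.

```python
import numpy as np

def preference_strength_rerank(W, r_bar, c=1.0):
    """Preference strength reranking sketch: closed-form solution of Proposition 1.
    Assumes the initial scores in r_bar are all distinct."""
    r_bar = np.asarray(r_bar, dtype=float)
    N = len(r_bar)

    # Anti-symmetric matrix A with alpha_ij = 1 / (r_bar_i - r_bar_j)
    diff = r_bar[:, None] - r_bar[None, :]
    A = np.zeros((N, N))
    off_diag = ~np.eye(N, dtype=bool)
    A[off_diag] = 1.0 / diff[off_diag]

    # Augmented weights w~_ij = w_ij + c * alpha_ij^2 and the Laplacian L~ = D~ - W~
    W_tilde = W + c * A ** 2
    np.fill_diagonal(W_tilde, 0.0)
    L_tilde = np.diag(W_tilde.sum(axis=1)) - W_tilde
    c_tilde = 2.0 * c * (A @ np.ones(N))

    # Fix r_N = 0 to remove the constant-shift ambiguity, then solve 2 L r = c
    L_breve = L_tilde.copy()
    L_breve[-1, :] = 0.0
    L_breve[-1, -1] = 1.0
    c_breve = c_tilde.copy()
    c_breve[-1] = 0.0
    return 0.5 * np.linalg.solve(L_breve, c_breve)
```

Solving one N×N linear system in this way is considerably cheaper than the quadratic program of hinge reranking when the number of samples is large.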
5. IMPLEMENTATION

As described above, we have developed two reranking methods. When applying them to video search reranking, several implementation details should be considered.

5.1 Pair Selection

As can be observed in Eq. (11), the ranking distance is employed to preserve the information of the initial score list to some extent. So far, all pairs in S_r̄ are involved; however, it can be better to preserve only a portion of the pairs, i.e., a subset of S_r̄, in situations where the initial score list is very noisy. Below we introduce several methods for selecting appropriate pairs.

• Clustering (C). We directly use the pair selection method proposed in [11]. In this method, pseudo-positive samples are selected by clustering and the bottom-ranked samples in the initial ranking list are selected as pseudo-negative. Each pseudo-positive sample is then paired with each pseudo-negative sample to form a pseudo-positive pair.

• Top and Bottom (TB). Usually, the samples ranked at the top of the initial ranking list are more likely to be positive than those ranked at the bottom. Based on this assumption, the top-ranked samples in the initial ranking list are selected as pseudo-positive and the bottom-ranked samples as pseudo-negative, and they are then formed into pseudo-positive pairs.

• ρ-adjacent. The ρ-adjacent pair is defined as below.

Definition 6. A pair (x_i, x_j) satisfying 0 < l_{x_j} − l_{x_i} ≤ ρ, where l_{x_i} and l_{x_j} are the ranks of x_i and x_j in the ranking list l derived from the score list r, is a ρ-adjacent pair in l (or a ρ-adjacent pair in r).

For a ranking list l = <x_1, x_2, ⋯, x_N>, the 1-adjacent pairs are {(x_1, x_2), (x_2, x_3), ⋯, (x_{N−1}, x_N)}. All other pairs can be deduced from the 1-adjacent pairs through the transitivity of the relation ≻_r; from this perspective, the 1-adjacent pairs preserve all the necessary ordinal information of the ranking list. However, as can be seen from the optimization objectives, the preservation of pair relations is not a hard constraint, due to the trade-off parameter c, so pairs other than the 1-adjacent ones can also be useful. Note that all pairs in S_r̄ are used when ρ = N − 1, where N is the number of samples in the ranking list.

5.2 Initial Score

In video search, the performance of the text baseline is often poor and the text scores are mostly unreliable because of the inaccuracy and mismatch of the ASR and MT transcripts of the video. Besides, in some situations the text search scores are unavailable for reranking, e.g., in web image search. Below we propose three strategies to assign the initial scores.
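Returning to the pair selection strategies of Section 5.1, the sketch below illustrates the Top and Bottom and ρ-adjacent selections; the names ranked_ids, n_top, and n_bottom are assumed for illustration, and the clustering-based selection of [11] is omitted.

```python
def top_bottom_pairs(ranked_ids, n_top=10, n_bottom=10):
    """'Top and Bottom' selection: pair each pseudo-positive (top-ranked)
    sample with each pseudo-negative (bottom-ranked) sample."""
    top, bottom = ranked_ids[:n_top], ranked_ids[-n_bottom:]
    return [(p, n) for p in top for n in bottom]

def rho_adjacent_pairs(ranked_ids, rho=1):
    """rho-adjacent selection (Definition 6): pairs whose ranks differ by at most rho."""
    N = len(ranked_ids)
    return [(ranked_ids[i], ranked_ids[j])
            for i in range(N) for j in range(i + 1, min(i + rho + 1, N))]

# For a ranking list of five samples, rho=1 yields the four 1-adjacent pairs:
# rho_adjacent_pairs(["x1", "x2", "x3", "x4", "x5"], rho=1)
# -> [("x1","x2"), ("x2","x3"), ("x3","x4"), ("x4","x5")]
```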