Multi-pitch Detection Algorithm Using Constrained Gaussian Mixture Model and Information Cr


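Research on the Inlet Guide Vane Adjustment Strategy of a Multistage Centrifugal Compressor for Compressed Air Energy Storage Systems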


Adjustment Strategy of Inlet Guide Vane of Multistage Centrifugal Compressor Applied in CAES *

Kai-xuan Wang 1,2, Zhi-tao Zuo 1,2,3, Qi Liang 1, Wen-bin Guo 1, Ji-xiang Chen 1,2, Hai-sheng Chen 1,2,3,*
(1. Institute of Engineering Thermophysics, Chinese Academy of Science; 2. University of Chinese Academy of Science; 3. National Energy Large Scale Physical Energy Storage Technology R&D Center (Bijie))

Abstract: While a compressed air energy storage system is operating, the internal pressure of the gas storage device rises continuously, which requires the compressor to work over a wide pressure ratio range. High efficiency under variable operating conditions is the core requirement for compressors in compressed air energy storage systems. To achieve this design goal, appropriate adjustment methods and adjustment strategies are needed. The adjustable inlet guide vane is simple in structure, can be operated while the machine is running, and can be automated by a servo device, which makes it one of the most suitable adjustment methods for the compressor in a compressed air energy storage system. This paper takes a 4-stage centrifugal compressor in a compressed air energy storage system as the research object and establishes a performance prediction method suitable for multistage centrifugal compressors under varying working conditions. The performance curve of each single-stage centrifugal compressor is obtained by numerical simulation, and a stage performance superposition program is written to obtain the performance curve of the whole machine. The performance data of the multistage centrifugal compressor are fitted to polynomial functions by the least squares method, and the adjustable inlet guide vane adjustment strategy program is built with a genetic algorithm, using isentropic efficiency as the optimization objective, inlet guide vane opening as the optimization variable, and the outlet pressure or flow of the whole machine as the constraint.

Keywords: Inlet Guide Vane Adjustment; Multistage Centrifugal Compressor; Adjustment Strategy

Abstract (translated from the Chinese): During the energy storage process of a compressed air energy storage system, the pressure inside the gas storage device rises continuously, which requires the compressor to work over a wide pressure ratio range.

Pitch Detection


The sensation of a frequency is commonly referred to as the pitch of a sound. A high-pitched sound corresponds to a high frequency and a low-pitched sound corresponds to a low frequency.
Many people are capable of detecting a difference in frequency between two separate sounds which is as little as 2 Hz.
When two sounds with a frequency difference of greater than 7 Hz are played simultaneously, most people are […]

Wave interference: the phenomenon which occurs when […] along the same medium.
Constructive interference
Destructive interference
Principle of superposition: when two waves interfere, the […]

Every object has its own natural frequency.
Each natural frequency which an object or instrument produces has its own characteristic vibration pattern.
These patterns are only created within the object or instrument at specific frequencies of vibration; […] frequencies, or merely […]
At any frequency other than a harmonic frequency, the resulting disturbance of the […]
For musical instruments and other objects which vibrate in a regular and periodic fashion, the […] by simple whole-number ratios.
The lowest frequency produced by any particular instrument is known as the fundamental frequency.

Fundamental Frequency and Harmonics
Harmonic frequencies of a guitar string.
The harmonic frequencies are related to each other by simple whole-number ratios. This is part of the reason why such instruments sound musical rather than noisy.

The musical pitch of an audio signal is a perceptual feature, relevant only in the context of a human listening to that signal.
Pitch is loosely related to the log of the frequency, perceived pitch increasing about an octave with every […] an interval slightly more than an octave.
The perception of pitch changes with this harmonic content.
Pitch perception also changes with intensity, duration and other physical features of the waveform.
Because the psychological relationship between […] pitch is well known, pitch detection actually becomes […]
It is difficult to empirically measure the performance of an f0 estimator for several […]
First, performance depends on domain. Second, it is difficult to automatically rate the result of an f0 estimator against […]

For pure-tone pitch detection it should suffice to determine the period of the oscillation.
When harmonic components are added to a sinusoidal waveform, the appearance of pitch […] be considered.
If the waveform has few higher harmonics or the power of the higher harmonics is small, the f0 is […]

Seeking to discover how often the waveform fully repeats itself.
If a waveform is periodic, then there are extractable time-repeating events that can […]
Zero-crossing rate (ZCR)
Peak rate
Slope event rate

Zero-crossing rate: The thought was that the ZCR should be directly related to the number of times the waveform repeated per unit time.
If the waveform contains higher-frequency spectral components, then it might cross the zero line more than twice per cycle.
Detect zero-crossing patterns and hypothesize a value for f0 based on these patterns.
Calculate the mean and variance of the zero-crossing rate to increase the robustness and […] of a […]
Peak rate: This method counts the number of positive […]
Slope event rate: If a waveform is periodic, the slope of the […]
It is difficult to detect time-event rate because spectrally complex waveforms […]
The positive aspect is that these methods are simple to understand and implement and […]

The correlation between two waveforms is a measure of their similarity.
The waveforms are compared at different time intervals, and their "sameness" is calculated at each interval.
The autocorrelation function itself is periodic.
Problems with this method arise when the autocorrelation of a harmonically complex, pseudo-periodic waveform is taken.

The basic phase space representation is to plot […] same point.
A periodic signal will form a closed circle.
The f0 of a signal is related to the speed with […]

1979, Martin Piszczalski, complete automatic music transcription system.
Apply a spectral transform and identify the partials in the signal using peak detection.
For each pair of these partials, the algorithm finds the "smallest harmonic numbers" that would correspond to a harmonic series.
Work through each pair, making a hypothetical f0 for each. A higher-amplitude pair has a higher weight.
This method does not require that the fundamental frequency of the signal be present, and it works well with […]

Frequency-Domain Method
- Filter-Based Methods
Filters are used for f0 estimation by trying different filters with different centre frequencies and comparing their outputs.
When a spectral peak lines up with the passband of a filter, the result is a higher value in the output.

Optimum Comb Filter
It has many equally spaced passbands. If a set of regularly spaced harmonics is present in the signal, then the output of the comb filter will be greatest when the passbands of the comb line up with the harmonics.

Tunable IIR Filter
A more recent filter-based f0 estimator, this method consists of a narrow user-tunable band-pass filter, which is swept across the frequency spectrum.

- Cepstrum Analysis
The name cepstrum comes from reversing […] Fourier transform of the log of the […]
By using the log spectrum make the […]

An improvement that can be applied to any spectral f0 estimation method.
If the accuracy of a certain algorithm at a certain resolution is somewhat suspect, […]
Slow, computationally intensive.

Statistical Frequency Domain Method
1. Neural Network
They consist of a collection of nodes, connected by links with associated weights.
At each node, signals from all incoming links are summed according to the weights of these links, and if the sum satisfies a certain transfer function, an impulse is sent to other nodes.
In the training stage, input is presented to the network along with a suggested output, and the weights of the links are altered to produce the desired output.
In the operation stage, the network is presented with input and provides output based on the weights of the […]
Even if a good model is found, it does not provide any understanding of how the problem is solved.
2. Maximum Likelihood Estimator

Most of the models described can be improved by pre-processing the input, reducing the input domain, or by […] information.

Human auditory modeling
Presently, the most we can do is make the computer […] make the system more accurate and robust.

Frequency estimator tracking
The human system keeps tracking pitch information of an […]

Three Detecting Methods
Time domain method
Time-event rate
ZCR
Peak rate
Slope event rate
Autocorrelation
Frequency domain method
Statistical method
General Improvements
Human Auditory Modeling
Frequency Estimator Tracking
Discussion
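The autocorrelation idea from the slides (compare the waveform with delayed copies of itself and look for the delay at which they are most alike) can be sketched as follows. This is a minimal illustration, not the method of any particular system mentioned above: the function name `autocorr_f0`, the search range `f_min`/`f_max`, and the choice of the unnormalised autocorrelation are all assumptions made for the example.

```python
import math


def autocorr_f0(signal, sample_rate, f_min=50.0, f_max=1000.0):
    """Estimate f0 as sample_rate / lag for the lag with the largest autocorrelation.

    Hypothetical helper: the f_min / f_max search bounds are assumptions
    of this sketch, chosen to skip the trivial peak at lag 0.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]          # remove any DC offset

    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(n - 1, int(sample_rate / f_min))

    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # "Sameness" of the waveform and its copy delayed by `lag` samples.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return sample_rate / best_lag


if __name__ == "__main__":
    sr = 8000
    # 200 Hz tone with a weaker second harmonic, 0.1 s long.
    tone = [math.sin(2 * math.pi * 200 * i / sr)
            + 0.5 * math.sin(2 * math.pi * 400 * i / sr)
            for i in range(800)]
    print(autocorr_f0(tone, sr))
```

Because the autocorrelation of a periodic signal is itself periodic, multiples of the true period also produce strong peaks; here the unnormalised sum favours the shortest one simply because fewer samples overlap at longer lags. That heuristic is fragile for harmonically complex or pseudo-periodic waveforms, which is the problem the slides point out.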

IEEE Signal Processing Society

Juan Liu, for the paper co-authored with Pierre Moulin, entitled, "Information-Theoretic Analysis of Interscale and Intrascale Dependencies Between Image Wavelet Coefficients," published in the IEEE Transactions on Image Processing, Volume 10, Number 11, November 2001. S. Basu
Jerome M. Shapiro, for the paper entitled, "Embedded Image Coding Using Zerotrees of Wavelet Coefficients," published in the IEEE Transactions on Signal Processing, Volume 41, Number 12, December 1993.
Digital Signal Processing
Avideh Zakhor, for the paper co-authored with Søren Hein entitled, "Reconstruction of Oversampled Band-Limited Signals from Sigma-Delta Encoded Binary Sequences," published in the IEEE Transactions on Signal Processing, Volume 42, Number 4, April 1994.

Design of a 600 bps Vocoder Based on the MELP Model


Design of a 600 bps Vocoder Based on the MELP Model
Shi Qiaolin (石乔林); Wei Kai (韦凯); Wu Hui (吴辉)

[Abstract] The paper describes a 600 bps speech coder based on the MELPe (enhanced mixed excitation linear prediction) algorithm. Three consecutive speech frames are grouped into a super-frame and jointly quantized, exploiting the inter-frame redundancy in the coder. The LSF vectors are quantized with multi-mode prediction and multistage matrix quantization, with mode transitions handled through the prediction coefficients according to the mode of each super-frame. Quantization efficiency is further improved by joint quantization of pitch and gain. Together, these measures give good synthetic speech quality even at 600 bps.

Abstract (translated from the Chinese): A 600 bps low-rate speech coder is designed based on the enhanced mixed excitation linear prediction (MELPe) model.

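While retaining the characteristics of the MELPe algorithm, the coder exploits the inter-frame redundancy between adjacent frames: three consecutive frames are grouped into a super-frame, which is jointly quantized using multi-mode prediction and multistage matrix quantization.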

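In addition, for the different super-frame modes, mode transitions between adjacent super-frames are handled through the prediction coefficients, which realizes vector quantization of the line spectrum pair (LSF) parameters.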

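Finally, the pitch period and gain parameters are jointly quantized to further improve quantization efficiency, completing the design of a speech coder that still delivers good synthetic speech quality at 600 bps.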

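Tianjin University Information and Communication Engineering Postgraduate Entrance Exam: Review Materials, Supervisor and Admission Score Information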


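The postgraduate entrance exam subjects for Information and Communication Engineering at Tianjin University are Politics, Foreign Language, Mathematics I, as well as Principles of Communication and Signals and Systems.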

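There are two main research directions: direction 1 is examined on Principles of Communication, and direction 2 on Signals and Systems. The program attracts a large number of applicants, so candidates need to target their preparation carefully.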

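Program code, name and research direction | Exam subjects | Notes
081000 Information and Communication Engineering | (1) 101 Ideological and Political Theory; (2) 201 English I; (3) 301 Mathematics I; (4) 814 Principles of Communication |
081000 Information and Communication Engineering | (1) 101 Ideological and Political Theory; (2) 201 English I; (3) 301 Mathematics I; (4) 815 Signals and Systems |

Tianjin University Information and Communication Engineering postgraduate admission statistics
School (department, institute) | Program | Applicants | Admitted
School of Electronic Information Engineering (2012) | Information and Communication Engineering | 506 | 95
School of Electronic Information Engineering (2013) | Information and Communication Engineering | 463 | 92

In 2012, Information and Communication Engineering at Tianjin University had 506 applicants and admitted 95; in 2013 it had 463 applicants and admitted 92.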

Past exam papers show that the breadth and depth of the topics examined keep increasing.

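A team of current Tianjin University undergraduate, master's and doctoral students, contracted by 天津考研网, has collected and compiled a complete set of review materials for the Information and Communication Engineering postgraduate exam of the School of Electronic Information Engineering, Tianjin University, to help candidates organize the key topics and build a knowledge framework.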

The past-paper analysis section groups past exam questions by topic and presents them to students in a clear, organized way.

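Worked analyses of recent past-paper questions for each topic then make the frequently tested and difficult points clear to students.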

只有做到了对真题规律和趋势的把握,8—10月底的提高复习才能有的放矢、事半功倍!天津大学电子信息工程学院信息与通信工程考研导师信息刘开华纵向课题经费课题名称情境感知服务位置信息获取机理与算法2009-01-01--2011-12-31负责人:刘开华科技计划:国家基金委拨款单位:国家基金委合同经费:32 课题名称智能航空铅封技术研究2010-01-01--2012-12-31 负责人:刘开华科技计划:天津市科技支撑计划重点项目拨款单位:天津市科学技术委员会合同经费:50 横向课题经费课题名称基于相位法的RFID定位技术2013-01-01--2013-12-31 负责人:刘开华科技计划: 拨款单位:中兴通信有限公司合同经费:16课题名称基于ADoc芯片组的产品开发2008-09-01--2009-08-31 负责人:刘开华科技计划: 拨款单位:THOMSON宽带研发(北京)有限公司合同经费:6.3 期刊、会议论文Tan, Lingling; Bai, Yu; Teng, Jianfu; Liu, Kaihua; Meng, WenqingTrans-Impedance Filter Synthesis Based on Nodal Admittance Matrix Expansion CIRCUITS SYSTEMS AND SIGNAL PROCESSINGnullTan, Lingling; Liu, Kaihua; Bai, Yu; Teng, Jianfu Construction of CDBA and CDTA behavioral models and the applications in symbolic circuits analysis ANALOG INTEGRATED CIRCUITS AND SIGNAL PROCESSINGnullMa Yongtao,Zhou Liuji,Liu Kaihua A Subcarrier-Pair Based Resource AllocationScheme Using SensorsnullMa Yongtao,Zhou Liuji,Liu Kaihua, Wang Jinlong Iterative Phase Reconstruction and Weighted IEEE sensorsnull罗蓬,刘开华,闫格基于FrFT能量重心谱校正的LFM信号参数估计信号处理null 潘勇, 刘开华,等 A novel printed microstrip antenna with frequency reconfigurable characteristics for Bluetooth/WLAN/WiMAX applications Microwave and Optical Technology Lettersnull阎格,刘开华,吕西午基于分数阶Fourier变换的新型时频滤波器设计哈尔滨工业大学学报nullLin Zhu, Kaihua Liu, Zhang Qijun, Yongtao Ma and Bo Peng An enhanced analytical Neuro-Space Mapping method for large-signal microwave device modeling null罗蓬,刘开华,于洁潇,马永涛一种相干宽带线性调频信号的波达方向估计新方法通信学报nullLin Zhu, Yongtao Ma, Qijun Zhang and Kaihua Liu An enhanced Neuro-Space Mapping method for nonlinear device modeling nullYue Cui, Kaihua Liu, Junfeng Wang Direction-of-arrival estimation for coherent GPS signals based on oblique projection Signal ProcessingnullLV Xi-wu, LIU Kai-hua, et al. 
Efficient solution of additional base stations in time-of-arrival positioning systems Electronics Lettersnull省部级以上获奖刘开华;等数字电视接收系统、软件技术的研发与应用”天津市科技进步奖三等奖2011-04-29李华;刘开华;等数字视频压缩与码流测试技术的研发及应用天津市科技进步奖二等奖2009-04-29知识产权刘开华, 于洁潇高速公路上车辆的车速和相对位置实时测量系统及方法刘开华;潘勇;于洁潇;陈征一种基于无联网的车载自动实时监控远程终端刘开华,黄翔东,于洁潇,王兆华,闫格基于相位差测距的RFID无线定位方法王安国纵向课题经费课题名称基带处理与天线协同2007-07-16--2011-11-16 负责人:王安国科技计划:国家科技部拨款单位:财政部合同经费:157.41课题名称无线网络多源稀疏协作编码研究2011-01-01--2013-12-31 负责人:韩昌彩科技计划:国家基金委拨款单位:国家基金委合同经费:20横向课题经费课题名称具有波束多选择性的多频段可重构天线研究2013-01-01--2014-12-31 负责人:王安国科技计划: 拨款单位:东南大学毫米波国家重点实验室合同经费:5课题名称双方向图算法在室内定位中的应用2012-01-01--2012-12-31 负责人:冷文科技计划: 拨款单位:中兴通讯股份有限公司合同经费:14.5期刊、会议论文马宁王安国姬雨初石和平Cooperative Space Shift Keying for Multiple-Relay Network IEEE Communications Lettersnull裴静王安国高顺,冷文Miniaturized Triple-Band Antenna With a Defected Ground Plane for WLAN/WiMAX Applications IEEE Antennas and Wireless Propagation Lettersnull赵国煌王安国冷文陈彬陈华Wideband internal antenna with coupled feeding for 4G mobile phone Microwave and Optical Technology Lettersnull陈彬王安国赵国煌Design of a novel ultrawideband antenna with dualband-notched characteristics Microwave and Optical technology lettersnull 蔡晓涛王安国马宁冷文 A Novel Planar Parasitic Array Antenna with Reconfigurable Azimuth pattern IEEE Antennas and Wireless Propagation Lettersnull 马宁王安国聂仲尔曲倩倩姬雨初Adaptive Mapping Generalized Space Shift Keying Modulation China Communicationsnull王安国蔡晓涛冷文带寄生贴片的圆盘形方向图可重构天线设计电波科学学报null 王安国陈彬冷文赵国煌一种小型化五频段可重构蝶形天线的设计电波科学学报null蔡晓涛王安国马宁冷文Novel radiation pattern reconfigurable antenna with six beam choices The Journal of China Universities of Posts and Telecommunicationsnull 曲倩倩王安国聂仲尔郑剑锋Block Mapping Spatial Modulation Scheme forMIMO Systems The Journal of China Universities of Posts and Telecommunicationsnull王安国刘楠兰航方向图可重构宽带准八木天线的设计天津大学学报null李锵纵向课题经费课题名称基于稀疏核支持向量机的音乐自动分类系统关键技术研究2009-06-01--2010-06-01 负责人:李锵科技计划: 拨款单位:天津大学建筑设计研究院合同经费:3课题名称jg预研项目2010-03-01--2010-12-01 负责人:李锵科技计划:拨款单位:渤海石油运输有限责任公司合同经费:3课题名称超声波热治疗中非侵入式温度成像与弹性成像关键技术研究2015-01-01--2018-12-31 
负责人:李锵科技计划:国家自然科学基金项目拨款单位: 国家自然科学基金委员会合同经费:85课题名称高等学校学科创新引智计划综合管理平台的设计与开发2010-04-01--2012-04-01 负责人:李锵科技计划: 拨款单位:苏州国芯科技有限公司合同经费:3横向课题经费课题名称微粒捕集器数据采集系统开发2008-01-01--2008-06-01 负责人:李锵科技计划: 拨款单位:润英联新加坡私人有限公司合同经费:22.5课题名称电子系统可靠性增长建模与仿真2006-12-01--2008-01-01 负责人:李锵科技计划: 拨款单位:中国人民解放军海军航空工程学院合同经费:5期刊、会议论文李锵,滕建辅,赵全明,李士心Wavelet domain Wiener filter and its application in signal denoising null张立毅,李锵,刘婷,滕建辅The research of the adaptive blind equalizer's steady residual error null徐星,李锵,关欣Chinese folk instruments classification via statistical features and sparse-based representation null张立毅,李锵,刘婷,滕建辅Study of improved constant modulus blind equalization algorithm null张立毅,孙云山,李锵,滕建辅Study on the fuzzy neural network classifier blind equalization algorithm null郭继昌,滕建辅,李锵Research of the gyro signal de-noising method based on stationary wavelets transform null肖志涛,于明,李锵,国澄明Symmetry phase congruency: Feature detector consistent with human visual system characteristics nullCai wei,李锵,关欣Automatic singer identification based on auditory features. 
null李锵,滕建辅,王昕,张雅绮,郭继昌Research of gyro signal de-noising with stationary wavelets transform null郭继昌,滕建辅,李锵,张雅绮The de-noising of gyro signals by bi-orthogonalwavelet transform nullLiu Tianlong,李锵,关欣Double boundary periodic extension DNA coding sequence detection algorithm combining base content null关欣,滕建辅,李锵,苏育挺Blind acoustic source separation combiningtime-delayed autocorrelation and 4TH-order cumulants null张立毅,李锵,滕建辅Kurtosis-driven variable step size blind equalization algorithm with constant module nullQin Lu,李锵,关欣Pitch Extraction for Musical Signals with Modified AMDF null Zhang Xueying,李锵,关欣The Improved AMDF Gene Exon Prediction null 李锵,Jian Dong,Ming-Guo Wang,滕建辅Analysis and simulation of antenna protocol optimization for ad hoc networks nullFeng Yanyan,李锵,关欣Entropy of Teager Energy in Wavelet-domain Algorithm Applied in Note Onset Detection nullBao Hu, Li ShangSheng, 李锵,滕建辅Research on the technology of RFSS in large-scale universal missile ATE null张立毅,Haiqing Cheng,李锵,滕建辅 A research of forward neural network blind equalization algorithm based on momentum term null张立毅,李锵,滕建辅 A New Adaptive Variable Step-size Blind Equalization Algorithm Based on Forward Neural Network nullYutao Ma,李锵,Chao Li,Kun Li,滕建辅Design of active transimpedanceband-pass filters with different Q values International Journal of Electronicsnull 夏静静,李锵,刘浩澧,Wen-shiang Chen,Po-Hsiang Tsui An Approach for the Visualization of Temperature Distribution in Tissues According to Changes in Ultrasonic Backscattered Computational and Mathematical Methods in Medicinenull 耿晓楠,李锵,崔博翔,王荞茵,刘浩澧超声温度影像与弹性成像监控组织射频消融南方医科大学学报null谭玲玲, 李锵, 李瑞杰, 滕建辅Design of transimpedance low-pass filters International Journal of Electronicsnull李锵,李秋颖,关欣基于听觉图像的音乐流派自动分类天津大学学报(自然科学与工程技术版)nullChong Zhou, Wei Pang, 李锵, Hongyu Yu, Xiaotang Hu, HaoZhang, Extracting the Electromechanical Coupling Constant of Piezoelectric Thin Film by the High-Tone Bulk Acoustic Resonator IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency 
Controlnull朱琳, 李锵, 刘开华基于ADS的声表面波单端对谐振器建模压电与声光null董丽梦, 李锵, 关欣基于稀疏表示分类器的音乐和弦识别系统研究计算机工程与应用null关欣,李锵,田洪伟基于差分全相位MFCC的音符起点自动检测计算机工程null 关欣,李锵,郭继昌,滕建辅二、四阶组合时延统计量多乐器盲分离计算机工程与应用null杨甲沛, 李锵, 刘郑, 袁晓琳基于自适应学习速率的改进型BP算法研究计算机工程与应用null李锵, 张法朝, 张瑞峰System design of DPF data recorder and data analysisnull李锵, 袁晓琳, 杨甲沛Application of ant colony algorithm in the optimization of the time environmental conversion factor of the reliability models null 张立毅,白煜,李锵,滕建辅复数系统中五二阶归一化积累盲均衡算法的研究通信学报null郭继昌,关欣,李锵,刘志杨红外图像预处理系统中模拟视频输出时序设计电子技术应用null关欣,滕建辅,李锵,苏育挺,Wang Shu-Yan Blind source separation combining time-delayed second and fourth order statistics 天津大学学报(自然科学与工程技术版)null张立毅,李锵,滕建辅复数系统中三、二阶归一化累积量盲均衡算法的研究计算机工程与应用null张立毅,李锵,滕建辅经典盲均衡算法中稳态剩余误差的分析天津大学学报null 滕建辅,董健,李锵,关欣Design of maximally flat FIR filters based on explicit formulas combined with optimization 天津大学学报(英文版)null郭继昌,陈敏俊,李锵,关欣红外焦平面失效元处理方法及软硬件实现光电工程null 马杰,王昕,李锵,滕建辅基于特征值和奇异值分解方法的盲分离天津大学学报(自然科学与工程技术版)null李锵,郭继昌,关欣,滕建辅基于通用DSP的红外焦平面视频图像数字预处理系统天津大学学报(自然科学与工程技术版)null李锵,郭继昌,关欣,刘航,童央群基于DSP的红外焦平面视频图像数字处理系统的设计测控技术null马杰,滕建辅,李锵具有参考噪声源的多路传感器信号盲分离方法测控技术null 周郭飞,李锵,滕建辅微带扇形分支线在低通滤波器设计中应用电子测量技术null 李锵,滕建辅,李士心,肖志涛小波域Wiener滤波器信号的去噪方法天津大学学报(自然科学与工程技术版)null肖志涛,于明,李锵,唐红梅,国澄明Log Gabor小波性能分析及其在相位一致性中应用天津大学学报(自然科学与工程技术版)null罗批,李锵,郭继昌,滕建辅Improved genetic algorithm and its performance analysis 天津大学学报(英文版)null罗批,郭继昌,李锵,滕建辅一种实用的电子线路参数优化算法电路与系统学报null 罗批,李锵,郭继昌,滕建辅基于偏最小二乘回归建模的探讨天津大学学报null 知识产权李锵,闫志勇,关欣一种结合SVM和增强型PCP特征的和弦识别方法中国2014100089231李锵, 冯亚楠, 关欣基于Teager能量熵的音符切分方法学术专著(关欣, 杨爱萍, 白煜, 李锵), 信号检测与估计:理论与应用(译著), 电子工业出版社2012-01-31(白煜, 李锵), 模拟集成电路设计的艺术(译著), 人民邮电出版社2010-11-04(李锵,周进等), 无线通信基础(译著), 人民邮电出版社2007-06-30(李锵,董健,关欣,鲍虎), 数字通信(原书第2版)(译著), 机械工业出版社2006-02-28(张为,关欣,刘艳艳,李锵), 电子电路设计基础(译著), 电子工业出版社2005-10-01(张雅绮,李锵等), Verilog HDL高级数字设计(译著), 电子工业出版社2005-01-31(李锵,侯春萍,赵宇), 网络(原书第2版)(译著), 机械工业出版社2004-11-30(李锵,郭继昌), 无线通信与网络, 电子工业出版社2004-06-30本文内容摘自《天津大学814通信原理考研红宝书》,更多考研资料可登陆网站下载!。

Speech Enhancement based on Wavelet Thresholding the Multitaper Spectrum Combined with Noise Estimation Algorithm (IJIGSP-V11-N9-5)


I.J. Image, Graphics and Signal Processing, 2019, 9, 44-55
Published Online September 2019 in MECS (/)
DOI: 10.5815/ijigsp.2019.09.05

Speech Enhancement based on Wavelet Thresholding the Multitaper Spectrum Combined with Noise Estimation Algorithm

P. Sunitha
Research Scholar, Dept. of ECE, JNTUK, India
Email: Sunitha4949@

Dr. K. Satya Prasad
Retd. Professor, Dept. of ECE, JNTUK, India
Email: sprasad.kodati@

Received: 26 May 2019; Accepted: 26 June 2019; Published: 08 September 2019

Abstract: This paper presents a method to reduce the musical noise encountered with most frequency-domain speech enhancement algorithms. Musical noise is a phenomenon that occurs due to random spectral peaks in each speech frame, caused by the large variance and inaccurate estimates of the spectra of the noisy speech and noise signals. To obtain a low-variance spectral estimate, this paper uses a method based on wavelet thresholding of the multitaper spectrum combined with a noise estimation algorithm, which estimates the noise spectrum from the spectral average of past and present frames according to a predetermined weighting factor, to reduce the musical noise. To evaluate the performance of this method, sine multitapers were used and the spectral coefficients were thresholded using wavelet thresholding to get a low-variance spectrum. Both scale-dependent and scale-independent thresholding, with soft and hard thresholding using Daubechies wavelets, were used to evaluate the proposed method in terms of objective quality measures under eight different types of real-world noise at three input SNR levels.
To predict the speech quality in the presence of noise, objective quality measures such as Segmental SNR, Weighted Spectral Slope Distance, Log Likelihood Ratio, Perceptual Evaluation of Speech Quality (PESQ) and composite measures are compared against wavelet de-noising techniques, Spectral Subtraction and Multiband Spectral Subtraction; the proposed method provides consistent performance across all eight noise types in most of the cases considered.

Index Terms: Speech Enhancement, Wavelet thresholding, Multitaper Power Spectrum, Noise power estimation, smoothing parameter, SNR, threshold.

I. INTRODUCTION

Speech is a basic way of communicating ideas from one person to another. Speech is degraded by background noise. Numerous speech enhancement algorithms are available to reduce this background noise; among them, spectral subtractive algorithms are the most popular because of their simple implementation and their effectiveness. In these algorithms the noise power spectrum is subtracted from the noisy power spectrum, assuming the noise spectrum is available. These methods introduce musical noise because the noise estimate is inaccurate. Spectral subtractive algorithms work well in stationary noise but fail in non-stationary noise. This led to the use of low-variance spectral estimation methods, because spectral estimation plays a key role in speech enhancement algorithms. To reduce the variance, an average of the estimate can be calculated across all frequencies. To improve speech quality and intelligibility in the presence of highly non-stationary noise, a speech enhancement algorithm requires a noise estimation algorithm that updates the noise spectrum continuously. Most speech enhancement applications in non-stationary scenarios use noise estimation algorithms that track the noise spectrum continuously. Researchers now focus on improving speech quality and intelligibility using efficient noise estimation algorithms.
The estimate of the noise signal depends strongly on the smoothing parameter. If its value is too large, i.e. close to one, the noise level is overestimated. Generally, the smoothing parameter is set to be small during speech activity to track the non-stationarity of the speech. This makes the smoothing parameter time and frequency dependent, taking into account the speech presence or absence probability. Numerous noise estimation algorithms are available in the literature. One of them is the minimum statistics algorithm proposed in [1], which estimates the noise by considering the instantaneous SNR of speech using a smoothing parameter and a bias correction factor. It tracks the minimum over a fixed window and updates the noise PSD. When tested under non-stationary noise, this method produces a large error and is unable to respond to fast increases in noise power. Martin, R. implemented spectral subtraction with minimum statistics and evaluated its performance in terms of both objective and subjective measures. It was compared against a spectral subtraction method that uses voice activity detection, and gave improved speech intelligibility measures [2]. Another variant of minimum statistics, suggested in [3], estimates noise by continuous spectral minimum tracking in sub-bands. This method takes a different approach to obtaining the spectral minimum: it smooths the noisy speech power spectra continuously using a non-linear smoothing rule. This non-linear tracking provides continuous smoothing of the PSD without making any distinction between speech-presence and speech-absence segments. Its shortcoming is that when the noisy speech power increases, the noise estimate increases irrespective of changes in the noise power level. Similarly, when the noisy speech power decreases, the noise estimate decreases. This results in overestimation of the noise during speech-presence regions, i.e. clipping of speech.
This method was evaluated in terms of objective and subjective quality measures and showed superior performance over the minimum statistics algorithm. The non-uniform effect of noise on the speech spectrum affects some frequency components more severely than others. This led to the use of a time-recursive noise estimation algorithm, which updates the noise spectrum when the effective SNR in a particular band is too small [4]. In this method the noise spectrum is estimated as a weighted average of the past and present estimates of the noisy power spectrum, depending on the effective SNR in each frequency bin. This algorithm works well in tracking non-stationary noise in the case of multitalker babble noise. Another type of recursive algorithm uses a fixed smoothing factor, but updates the noise spectrum based on a comparison of the estimated a-posteriori SNR against a threshold [5]. If the a-posteriori SNR is larger than the threshold, speech is taken to be present and the noise spectrum is not updated. Otherwise the segment is treated as speech-absent and the noise estimate is updated. This method is known as weighted spectral averaging. The threshold value has a significant effect on the noise spectrum estimate: if the threshold is too small the noise spectrum is underestimated; conversely, if it is too high the spectrum is overestimated. Improvements to minimum statistics were suggested in [6] by using optimal smoothing for noise power spectral density estimation. Cohen proposed a noise estimation algorithm that uses a time-frequency dependent smoothing factor, which is updated continuously depending on the speech presence probability in each frequency bin. The speech presence probability was calculated as the ratio of the noisy power spectrum to its local minimum [6]. This local minimum is computed from the smoothed noisy PSD over a fixed window by sample-wise comparison of the noisy PSD.
This has a shortcoming: the estimate may lag behind the true noise PSD when the noise power is rising. To address this shortcoming, a different approach was suggested in [8,9] that uses continuous spectral minimum tracking, with a frequency-dependent threshold to identify the speech-presence segments. This method was evaluated in subjective preference tests against other noise estimation algorithms such as MS and MCRA, and showed better performance. A further refinement of this algorithm was reported in [10]: noise power spectrum estimation in adverse environments by Improved Minima Controlled Recursive Averaging (IMCRA). This method involves two steps, smoothing and minimum tracking. Minimum tracking provides voice activity detection in each frame, whereas smoothing excludes strong speech components. The speech presence probability is calculated using the a-posteriori and a-priori SNRs. This method yields lower error values for the different types of noise considered.

The structure of the paper is as follows. Section II provides the literature review; multitaper spectral estimation and spectral refinement are given in Section III; noise estimation by the weighted spectral averaging technique is presented in Section IV; Section V presents the proposed speech enhancement method; results and discussion are in Section VI; and finally Section VII gives the conclusion.

II. LITERATURE REVIEW

This section presents a literature review of spectral subtractive type algorithms for single-channel enhancement. In the past, a number of researchers have proposed different speech enhancement methods. Most of them are based on Spectral Subtraction (SS), statistical-model-based, subspace and transform-based methods.
One of the popular noise reduction methods for single-channel speech enhancement, computationally efficient and of low complexity, is spectral subtraction, proposed by Boll S.F. for both magnitude and power spectral subtraction; it creates a by-product referred to as synthetic noise [17]. A significant improvement to spectral subtraction, using an over-subtraction factor and a spectral floor parameter to reduce the musical noise, is the non-linear spectral subtraction given by Berouti [19]. Multi Band Spectral Subtraction (MBSS) was proposed by S.D. Kamath, with multiple subtraction factors in non-overlapping frequency bands [18]. Ephraim and Malah proposed spectral subtraction with MMSE, using a gain function based on the a-priori and a-posteriori SNRs [20]. Spectral subtraction based on perceptual properties, using the masking properties of the human auditory system, was proposed by Virag [21]. Another method, which combines spectral subtraction with a Wiener filter to estimate the noise spectrum, is the extended spectral subtraction of Sovka [22]. A spectral subtraction algorithm based on two bands is the selective spectral subtraction described by He, C. and Zweig, G. [23]. Spectral subtraction with adaptive gain averaging, which reduces the overall processing delay, was given by Gustafsson et al. [24]. A frequency-dependent spectral subtraction is the non-linear spectral subtraction (NSS) method presented by Lockwood and Boudy [25]. Spectral subtractive type algorithms work well in the case of additive noise but fail in colored noise. To overcome this problem, Hu and Loizou proposed a speech enhancement technique based on wavelet thresholding of the multitaper spectrum [11], whose performance is evaluated in terms of objective quality measures.

III. MULTITAPER SPECTRAL ESTIMATION AND SPECTRAL REFINEMENT

Because of its sudden changes and sporadic behavior, a speech signal can be modeled as a non-stationary signal. The statistics of a non-stationary signal, such as its mean, variance, covariance and higher-order moments, change over time.
Spectral analysis plays a major role in speech enhancement techniques in obtaining an accurate noise estimate. The FFT is widely used to estimate the power spectrum in most speech enhancement algorithms, especially in spectral subtractive type methods. The power spectrum estimated by FFT is degraded by the variance of the estimate and by energy leakage across frequencies, which creates bias. To avoid leakage, the signal is multiplied in the time domain by a suitable window with little energy in its side lobes. The type of window affects the noise estimate, so selecting a window that gives an accurate noise estimate plays a significant role in the speech enhancement process. A Hamming window is generally preferred because it has little energy in the side lobes, but it reduces only the leakage, not the variance. In most speech enhancement algorithms the noise estimate is obtained using windows that reduce the bias but not the variance. The variance can be reduced by taking multiple estimates from the same sample, which can be achieved by using tapers. Hu and Loizou [11] used multitapers to get a low-variance spectral estimate, refined the spectrum further using wavelet thresholding, and used the result to improve the quality of the speech signal in the case of highly non-stationary noise. The results show that this method has superior performance in terms of quality measures, with a high correlation between subjective listening tests and objective quality measures. Speech enhancement techniques find a wide range of applications, from hearing aids to personal communication, teleconferencing, Automatic Speech Recognition (ASR), speaker authentication and voice-operated systems.
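To make the idea of averaging several tapered estimates concrete, here is a minimal Python sketch. It assumes sine tapers of the form sqrt(2/(N+1)) sin(pi p (m+1)/(N+1)) and indexes them p = 1, ..., L, because p = 0 would give an all-zero taper; the function names are hypothetical and the direct DFT sum is used only to keep the example dependency-free, where a practical implementation would use an FFT. It is not the implementation of [11].

```python
import cmath
import math


def sine_taper(p, n_samples):
    """p-th sine taper of length n_samples (p >= 1 is an assumption of this sketch)."""
    scale = math.sqrt(2.0 / (n_samples + 1))
    return [scale * math.sin(math.pi * p * (m + 1) / (n_samples + 1))
            for m in range(n_samples)]


def eigenspectrum(x, taper, omega):
    """Squared magnitude of the DTFT of the tapered signal at frequency omega (rad/sample)."""
    acc = sum(b * s * cmath.exp(-1j * omega * m)
              for m, (b, s) in enumerate(zip(taper, x)))
    return abs(acc) ** 2


def multitaper_spectrum(x, n_tapers, omegas):
    """Average of the n_tapers eigenspectra at each frequency in omegas."""
    n = len(x)
    tapers = [sine_taper(p, n) for p in range(1, n_tapers + 1)]
    return [sum(eigenspectrum(x, t, w) for t in tapers) / n_tapers
            for w in omegas]


if __name__ == "__main__":
    n = 64
    # Test tone centred on DFT bin 8 (illustrative signal, not from the paper).
    x = [math.cos(2 * math.pi * 8 * m / n) for m in range(n)]
    omegas = [2 * math.pi * k / n for k in range(n // 2 + 1)]
    spectrum = multitaper_spectrum(x, 5, omegas)
    print(max(range(len(spectrum)), key=spectrum.__getitem__))
```

The tapers are mutually orthogonal with unit energy, so each eigenspectrum is an approximately independent estimate of the same spectrum; averaging L of them lowers the variance at the cost of a wider main lobe.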
The multitaper spectrum estimator is given by

$$\hat{S}^{mt}(\omega) = \frac{1}{L}\sum_{p=0}^{L-1}\hat{S}_p^{mt}(\omega) \tag{1}$$

with

$$\hat{S}_p^{mt}(\omega) = \left|\sum_{m=0}^{N-1} b_p(m)\,x(m)\,e^{-j\omega m}\right|^2 \tag{2}$$

Here N is the data length and $b_p$ is the p-th sine taper used for the spectral estimate [12], given by

$$b_p(m) = \sqrt{\frac{2}{N+1}}\,\sin\frac{\pi p\,(m+1)}{N+1},\qquad m = 0,\dots,N-1 \tag{3}$$

The spectrum is refined further by wavelet thresholding. Let

$$v(\omega) = \frac{\hat{S}^{mt}(\omega)}{S(\omega)} \sim \frac{\chi^2_{2L}}{2L},\qquad 0<\omega<\pi \tag{4}$$

where $v(\omega)$ is the ratio of the estimated multitaper spectrum to the true power spectrum. Taking the logarithm of both sides gives

$$\log\hat{S}^{mt}(\omega) = \log S(\omega) + \log v(\omega) \tag{5}$$

Equation (5) shows that the log of the multitaper spectrum can be treated as the sum of the true log spectrum and a noise term. If L is at least 5, $\log v(\omega)$ is close to normally distributed, and the random variable $n(\omega)$ is given by

$$n(\omega) = \log v(\omega) - \phi(L) + \log(L) \tag{6}$$

$Z(\omega)$ is defined as

$$Z(\omega) = \log\hat{S}^{mt}(\omega) - \phi(L) + \log(L) \tag{7}$$

The idea behind multitaper spectral refinement [11] can be summarized as:
1. Obtain the multitaper spectrum of the noisy speech using orthogonal sine tapers, as in equation (1).
2. Apply the Daubechies discrete wavelet transform (DWT) to get the DWT coefficients.
3. Perform the thresholding procedure on the DWT coefficients.
4. Apply the inverse discrete wavelet transform to get the refined log spectrum.

Fig.1. Multiple window method for spectrum estimation by individual windows.
Fig.2. Speech signal (mtlb.wav).
Fig.3. (a), (b) and (c) Spectrum obtained with N = 1, 2, 3, where N is the number of tapers.
Fig.4. Final spectrum obtained by averaging.

IV. NOISE ESTIMATION BY WEIGHTED SPECTRAL AVERAGING

Noise estimation algorithms work on the assumption that the analysis segment is long enough to contain both low-energy segments and speech pauses. The noise present in the analysis segment is more stationary than the speech.
This paper uses the noise estimation based on the variance of the spectrum suggested in [12]. The noise spectrum is updated when the magnitude spectrum of the noisy speech falls within a variance of the noise estimate, that is, when the following condition holds:

$$|\hat{S}^{mt}(\lambda,K)| - \sigma_d(\lambda,K) < \epsilon\sqrt{\mathrm{Var}_d(\lambda,K)} \tag{8}$$

where $\mathrm{Var}_d(\lambda,K)$ is the instantaneous variance of the noise spectrum, $\epsilon$ is an adjustable parameter, $|\hat{S}^{mt}(\lambda,K)|$ is the multitaper magnitude spectrum and $\sigma_d(\lambda,K)$ is the estimate of the noise PSD. The variance of the noise spectrum is evaluated with the recursion

$$\mathrm{Var}_d(\lambda,K) = \delta\,\mathrm{Var}_d(\lambda-1,K) + (1-\delta)\left[|\hat{S}^{mt}(\lambda,K)| - \sigma_d(\lambda,K)\right]^2 \tag{9}$$

where $\delta$ is a smoothing parameter, $\lambda$ is the frame index and K is the frequency bin. The noise estimation algorithm can be summarized as follows.

If

$$|\hat{S}^{mt}(\lambda,K)| - \sigma_d(\lambda-1,K) < \epsilon\sqrt{\mathrm{Var}_d(\lambda-1,K)}$$

then

$$\sigma_d(\lambda,K) = \alpha\,\sigma_d(\lambda-1,K) + (1-\alpha)\,|\hat{S}^{mt}(\lambda,K)|$$

$$\mathrm{Var}_d(\lambda,K) = \delta\,\mathrm{Var}_d(\lambda-1,K) + (1-\delta)\left[|\hat{S}^{mt}(\lambda,K)| - \sigma_d(\lambda,K)\right]^2 \tag{10}$$

else

$$\sigma_d(\lambda,K) = \sigma_d(\lambda-1,K) \tag{11}$$

This paper uses this weighted spectral averaging method to estimate the noise from the noisy power spectrum, with the parameters $\delta = \alpha = 0.9$ and $\epsilon = 2.5$.

V. PROPOSED METHOD

The speech enhancement method is implemented as follows:
1. Obtain the multitaper estimate of the noisy speech using sine tapers, as in equation (1).
2. Perform spectral refinement by wavelet thresholding, which involves the Forward Discrete Wavelet Transform (FDWT), thresholding and the Inverse Discrete Wavelet Transform (IDWT). In this paper Daubechies wavelets were used at decomposition level 5, with both soft and hard thresholding.
3. Compute $Z(\omega)$ from equation (7), apply the discrete wavelet transform to $Z(\omega)$ and threshold the coefficients to obtain the refined log spectrum.
4. Estimate the noise using the weighted spectral recursive averaging algorithm discussed in Section IV.
5. Perform multitaper spectral subtraction between the refined log spectrum of the noisy speech and the noise spectrum to get an estimate of the clean speech spectrum:

$$\hat{S}_x^{mt}(\omega) = \hat{S}_y^{mt}(\omega) - \hat{S}_n^{mt}(\omega) \tag{12}$$

The subtraction can give negative values, which are floored as

$$\hat{S}_x^{mt} = \begin{cases}\hat{S}_y^{mt} - \hat{S}_n^{mt}, & \text{if } \hat{S}_y^{mt} > \hat{S}_n^{mt}\\ \beta\,\hat{S}_n^{mt}, & \text{if } \hat{S}_y^{mt} \le \hat{S}_n^{mt}\end{cases} \tag{13}$$

where $\beta$ is the spectral floor parameter.
6. Finally, reconstruct the enhanced speech signal using the inverse discrete Fourier transform and the overlap-add method.

Fig.5. Block diagram of proposed method

VI. RESULTS AND DISCUSSION

Speech enhancement techniques can be assessed either by objective quality measures or by subjective listening tests. A subjective listening test is a comparison of the original and processed speech signals by a group of listeners, relying on the human auditory system; it is a complex process, and listeners with good listening skills are difficult to find. Objective evaluation, in contrast, is a mathematical comparison of the clean and enhanced signals. To calculate the objective measures, the speech signal is first divided into frames of 10-30 ms; a distortion measure is computed for each processed frame and the results are averaged into a single measure. This section analyses the performance of the proposed method using four bands. Simulations were performed in the MATLAB environment. The speech corpus is NOIZEUS, which is available at [15] and used by most researchers. It contains 30 sentences from six speakers, three male and three female, originally sampled at 25 kHz and downsampled to 8 kHz with 16-bit quantization. The clean speech is distorted by eight different real-world noises (babble, airport, station, street, exhibition, restaurant, car and train) at three input SNR levels (0 dB, 5 dB, 10 dB).
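Steps 4 and 5 of the proposed method, the per-bin noise update of Section IV and the floored subtraction of equation (13), can be sketched as below. This is a sketch under stated assumptions rather than the authors' MATLAB code: the function names are hypothetical, the variance is assumed to be held unchanged when the update condition fails (the text only specifies the noise estimate in that branch), and the value of beta is an arbitrary placeholder because the text does not give one.

```python
import numpy as np


def update_noise(mag, noise_prev, var_prev, alpha=0.9, delta=0.9, eps=2.5):
    """One frame of weighted spectral averaging, applied to every frequency bin.

    mag        : |S_mt(lambda, K)|, multitaper magnitude spectrum of the frame
    noise_prev : sigma_d(lambda - 1, K), previous noise estimate
    var_prev   : Var_d(lambda - 1, K), previous noise variance
    """
    update = (mag - noise_prev) < eps * np.sqrt(var_prev)
    noise = np.where(update, alpha * noise_prev + (1 - alpha) * mag, noise_prev)
    # Assumption: the variance is only refreshed in bins where the noise is updated.
    var = np.where(update, delta * var_prev + (1 - delta) * (mag - noise) ** 2, var_prev)
    return noise, var


def subtract_with_floor(noisy, noise, beta=0.01):
    """Equation (13): subtract the noise, flooring non-positive results at beta * noise."""
    return np.where(noisy > noise, noisy - noise, beta * noise)


# Usage on two bins: the first looks like noise, the second like speech.
noise, var = update_noise(np.array([1.2, 10.0]), np.array([1.0, 1.0]), np.array([0.04, 0.04]))
clean = subtract_with_floor(np.array([3.0, 0.5]), np.array([1.0, 1.0]))
```

In the usage lines, only the first bin passes the update test, so its noise estimate moves towards the new magnitude while the second bin keeps its previous estimate.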
In this work the speech sample is taken from a male speaker; the English sentence is "We can find joy in the simplest things". The performance evaluation is based on several quality measures: segmental SNR, Weighted Slope Spectral Distance (WSSD) [13], Log Likelihood Ratio (LLR), Perceptual Evaluation of Speech Quality (PESQ) [14] and three composite measures [13].

A. Segmental SNR (seg-SNR)

To improve the correlation between the clean and processed speech signals, the summation is performed over each frame of the signal [13], which gives the segmental SNR. The segmental signal-to-noise ratio (seg-SNR) in the time domain can be expressed as

$$\mathrm{SNR}_{seg} = \frac{10}{M}\sum_{m=0}^{M-1}\log_{10}\frac{\sum_{n=Nm}^{Nm+N-1} x^2(n)}{\sum_{n=Nm}^{Nm+N-1}\left(x(n)-\hat{x}(n)\right)^2} \tag{14}$$

Here $x(n)$ is the original speech signal, $\hat{x}(n)$ is the processed speech signal, N is the frame length and M is the number of frames. The seg-SNR is the geometric mean of the SNRs of all frames of the speech signal [10], with values limited to the range [-10, 35] dB.

B. Log Likelihood Ratio (LLR)

This measure is based on LPC analysis of the speech signal:

$$\mathrm{LLR}(\vec{a}_x,\vec{a}_{\hat{x}}) = \log\left(\frac{\vec{a}_{\hat{x}}\,R_x\,\vec{a}_{\hat{x}}^{T}}{\vec{a}_x\,R_x\,\vec{a}_x^{T}}\right) \tag{15}$$

where $\vec{a}_x$ and $\vec{a}_{\hat{x}}$ are the LPC coefficients of the original and processed signals and $R_x$ is the autocorrelation matrix of the original signal. The denominator is always lower than the numerator, so LLR is always positive [13]; LLR values lie in the range 0 to 2.

Fig.5 block labels: Noisy speech; signal framing using Hamming window; multitaper spectral estimation using sine tapers; wavelet thresholding the multitaper spectrum; noise estimation by weighted spectral averaging; spectral subtraction; inverse DFT and OLA; enhanced speech.

C. Weighted Slope Spectral Distance (WSSD)

This measure is the weighted difference between the spectral slopes in each band, computed using a first-order difference operation [13]. With $X_x(j,m)$ and $X_{\hat{x}}(j,m)$ the spectral slopes of the original and processed signals in band j of frame m, and $W(j,m)$ the weights,

$$\mathrm{WSSD} = \frac{1}{M}\sum_{m=0}^{M-1}\frac{\sum_{j=1}^{K} W(j,m)\left(X_x(j,m)-X_{\hat{x}}(j,m)\right)^2}{\sum_{j=1}^{K} W(j,m)} \tag{16}$$

D. Perceptual Evaluation of Speech Quality (PESQ)

PESQ, recommended by ITU-T [14], is one of the objective quality measures that predicts speech quality accurately, at the cost of higher computational complexity. PESQ is a linear combination of the average disturbance $D_{ind}$ and the average asymmetrical disturbance $A_{ind}$:

$$\mathrm{PESQ} = 4.754 - 0.186\,D_{ind} - 0.008\,A_{ind} \tag{17}$$

E. Composite Measures

A linear combination of existing objective quality measures gives a new measure [10], whose coefficients can be obtained by linear regression analysis. This paper uses multiple linear regression analysis to obtain the following composite measures [13], which are measured on a five-point scale.

(i) Signal Distortion ($C_{sig}$): a linear combination of the PESQ, LLR and WSSD measures [13]:

$$C_{sig} = 3.093 - 1.029\,\mathrm{LLR} + 0.603\,\mathrm{PESQ} - 0.009\,\mathrm{WSSD} \tag{18}$$

(ii) Background intrusiveness ($C_{bak}$): a linear combination of the PESQ, seg-SNR and WSSD measures [13]:

$$C_{bak} = 1.634 + 0.478\,\mathrm{PESQ} - 0.007\,\mathrm{WSSD} + 0.063\,\mathrm{segSNR} \tag{19}$$

(iii) Overall Quality ($C_{ovl}$): a linear combination of the LLR, PESQ and WSSD measures:

$$C_{ovl} = 1.594 + 0.805\,\mathrm{PESQ} - 0.512\,\mathrm{LLR} - 0.007\,\mathrm{WSSD} \tag{20}$$

The scales of signal distortion, background intrusiveness and overall quality are shown in Tables 1, 2 and 3.

Table 1. Scale of Signal Distortion
Table 2. Scale of Background Intrusiveness
Table 3. Scale of Overall quality

To obtain the objective quality measures for the proposed method, the multitaper spectrum was first obtained using sine tapers. The spectrum was then refined by wavelet thresholding the multitaper spectrum, and the noise spectrum was estimated using weighted spectral averaging. The results were compared against wavelet de-noising using hard thresholding (WDH) and soft thresholding (WDS) as suggested in [16], Spectral Subtraction (SS) [17] and Multi Band Spectral Subtraction (MBSS) [18].

Table 4. Objective quality measures: Segmental SNR (seg-SNR), Log Likelihood Ratio (LLR), Weighted Slope Spectral Distance (WSSD), Perceptual Evaluation of Speech Quality (PESQ)
Table 5. Composite measures ($C_{sig}$, $C_{bak}$, $C_{ovl}$) for eight different types of noise
Fig.6. (a) Segmental SNR, (b) Log Likelihood Ratio, (c) Weighted Slope Spectral Distance, (d) PESQ, (e) Signal Distortion ($C_{sig}$), (f) Background intrusiveness ($C_{bak}$), (g) Overall quality ($C_{ovl}$) measures against input SNR.
Fig.7. Time domain and spectrogram representation of clean speech, noisy speech and enhanced speech signals by SS [17], MBSS [18], WDH, WDS [16], wavelet thresholding the multitaper spectrum [11] and the proposed method.

VII. CONCLUSION

The results in Table 4 show that the wavelet de-noising techniques perform very poorly on all objective quality measures, with lower segmental SNR and PESQ and higher LLR and WSSD than the other techniques in all cases considered. The proposed method gives higher segmental SNR and PESQ than all the other methods considered, for all noise types at the three input SNR levels. In terms of LLR and WSSD, the proposed method performs worse than the Multi Band Spectral Subtraction method.
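The segmental SNR of equation (14) in Section VI, one of the measures behind the comparison above, can be sketched as follows. This is an illustrative sketch only: the function name, the 160-sample frame (20 ms at 8 kHz, inside the 10-30 ms range quoted in Section VI) and the small constant that guards against division by zero are assumptions, while the [-10, 35] dB clipping range is taken from the text.

```python
import numpy as np


def segmental_snr(clean, enhanced, frame_len=160, lo=-10.0, hi=35.0, tiny=1e-12):
    """Mean of the per-frame SNRs in dB, each clipped to [lo, hi] (equation (14))."""
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = len(clean) // frame_len
    frame_snrs = []
    for m in range(n_frames):
        seg = slice(m * frame_len, (m + 1) * frame_len)
        signal_energy = np.sum(clean[seg] ** 2)
        error_energy = np.sum((clean[seg] - enhanced[seg]) ** 2)
        snr_db = 10.0 * np.log10((signal_energy + tiny) / (error_energy + tiny))
        frame_snrs.append(np.clip(snr_db, lo, hi))
    return float(np.mean(frame_snrs))


# Usage: a constant signal with a 10 % amplitude error in every sample.
reference = np.ones(320)
seg_snr_db = segmental_snr(reference, 0.9 * reference)
```

Clipping each frame before averaging keeps silent frames and perfectly reconstructed frames from dominating the mean.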
The composite measures in Table 5 indicate that the proposed method improves on all four of the other methods in all three composite measures. The same results are shown as graphs in Fig. 6 (a) to (g), averaged over all eight noise types at the three SNR levels. From the results it can be concluded that the proposed method yields higher values of segmental SNR, PESQ and the composite measures.

Fig. 7 shows the time domain and frequency domain representations of noisy speech, noise and enhanced speech signals for Spectral Subtraction [17], Multi Band Spectral Subtraction [18], wavelet de-noising with both soft and hard thresholding [16], wavelet thresholding the multitaper spectrum [11] and the proposed method. Spectrograms are widely used in speech processing to plot the frequency spectrum as it varies with time. A spectrogram can be computed as a sequence of FFTs over windowed segments of the signal, each 20 ms long. In the time domain, the enhanced speech from the spectral subtractive algorithm contains musical noise; Multi Band Spectral Subtraction eliminates it, as can also be observed in the spectrograms. The wavelet de-noising techniques suppress the noise. The proposed method gives an enhanced signal closer to the original clean speech signal, and its spectrogram is also closer to the spectrogram of the clean speech signal.

ACKNOWLEDGEMENT

I would like to take this opportunity to express my profound gratitude and deep regard to my Research Guide Dr. K. Satya Prasad for his exemplary guidance, valuable feedback and constant encouragement throughout the duration of the research. His valuable suggestions were of immense help throughout the research. Working under him was an extremely knowledgeable experience for me.
I would also like to give my sincere gratitude to the authors Hu and Loizou, for inspiring me with their research papers in the field of speech enhancement along with objective quality measures.

REFERENCES

[1] R. Martin, "An efficient algorithm to estimate the instantaneous SNR of speech signals", Proceedings of Eurospeech, Berlin, pp. 1093-1096, 1993.
[2] R. Martin, "Spectral subtraction based on minimum statistics", Proceedings of European Signal Processing, U.K., pp. 1182-1185, 1994.
[3] G. Doblinger, "Computationally efficient speech enhancement by spectral minima tracking in sub bands", Proceedings of Eurospeech, Spain, pp. 1513-1516, 1995.
[4] H. Hirsch and C. Ehrlicher, "Noise estimation techniques for robust speech recognition", Proceedings of IEEE International Conference on Acoustic Speech Signal Processing, MI, pp. 153-156, 1995.
[5] R. Martin, "Noise Power Spectral Density Estimation based on Optimal Smoothing and Minimum statistics", IEEE Transactions on Audio, Speech Processing, pp. 504-512, 2001.
[6] I. Cohen, "Noise Estimation by Minima controlled recursive averaging for robust speech enhancement", IEEE Signal Processing Letter, pp. 12-15, 2002.
[7] I. Cohen, "Noise spectrum Estimation in adverse environments: Improved Minima controlled recursive averaging", IEEE Transactions on Audio, Speech Processing, pp. 466-475, 2003.
[8] L. Lin, W. Holmes and E. Ambikairajah, "Adaptive noise estimation algorithm for speech enhancement", Electron. Lett., pp. 754-755, 2003.
[9] Loizou, R. Sundarajan, Y. Hu, "Noise estimation Algorithm with rapid Adaption for highly non-stationary environments", Proceedings on IEEE International Conference on Acoustic Speech Signal Processing, 2004.
[10] Loizou, R. Sundarajan, "A Noise estimation Algorithm for highly non-stationary Environments", Speech Communication, 48, Science Direct, pp. 220-231, 2006.
[11] Yi Hu, P. C. Loizou, "Speech enhancement based on wavelet thresholding the multitaper spectrum", IEEE Transactions on Speech and Audio Processing, pp. 59-67, 2004.
[12] C. Ris and S. Dupont, "Assessing local noise level estimation methods: Applications to noise robust ASR", Speech Communication, pp. 141-158, 2001.
[13] Yi Hu, P. C. Loizou, "Evaluation of objective Quality Measures for Speech Enhancement", IEEE Transactions on Audio, Speech and Language Processing, pp. 229-238, Jan. 2008.
[14] ITU-T Rec., "Perceptual evaluation of speech quality (PESQ), An objective method for end to end speech quality assessment of narrowband telephone networks and speech codecs", International Telecommunications Union, Geneva, Switzerland, February 2001.
[15] A Noisy Speech Corpus for Assessment of Speech Enhancement Algorithms. https: // / Loizou /speech/noizeous.
[16] D. L. Donoho, "De-noising by soft thresholding", IEEE Trans. Inform. Theory, 41(3), 613-627, 1995.
[17] S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction", IEEE Transactions on Acoustics, Speech and Signal Processing, 27(2), 113-120, 1979.

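A Composite Pitch Extraction Method

Received: 2003-01-06.

This project was supported by the Basic Research Project Fund of the Shanghai Municipal Science and Technology Commission (01JC14033).

Zhang Wenyi, master's student; main research areas: speech recognition and language signal processing.

Zhang Wenyi, Zhu Jie (Shanghai Jiao Tong University, Shanghai Jiao Tong University and Bell Labs Communications and Network Joint Laboratory, Shanghai 200030)

Abstract: This paper proposes a composite pitch extraction algorithm. It combines several algorithms, including the average magnitude difference function (AMDF) method, the autocorrelation function method and the simplified inverse filter tracking (SIFT) method, to score the candidate pitch frequency points, and then uses a dynamic search to find a globally optimal path. This avoids the limitations of any single method and achieves good performance.

Keywords: pitch; average magnitude difference function; autocorrelation function; simplified inverse filter tracking

A NEW COMPOSITE PITCH EXTRACTION ALGORITHM
Zhang Wenyi, Zhu Jie (Shanghai Jiaotong University and Bell Labs Communications and Network Joint Laboratory, Shanghai 200030)

Abstract: This article proposes a composite pitch extraction algorithm which integrates AMDF, the autocorrelation function and SIFT, scores the candidate pitch frequencies, and then searches for a globally optimal path using dynamic programming. The composite algorithm avoids the limitations of any single algorithm and shows good performance under multiple conditions.

Keywords: Pitch; AMDF; Autocorrelation function; SIFT

1 Introduction

Pitch extraction is an important topic in speech signal processing. It is of great significance in speech compression coding and in speech recognition, especially Chinese speech recognition.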

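Formula for Calculating the F0 Value (Part 2)

1. Definition of the F0 value

The F0 value (fundamental frequency) is the lowest-frequency component of a sound signal. It is very important in fields such as speech processing and music analysis, where it can be used for tasks such as voice recognition, gender analysis and emotion recognition.

2. Basic concepts

Before calculating F0, some basic concepts are needed:
• Sound signal: the wave formed as the vibration of a sound propagates through the air; it can be represented by a waveform.
• Period: one complete cycle of the waveform, that is, the part of the waveform that repeats.
• Fundamental frequency: the number of times the period of the waveform repeats per unit time.

3. Formula for the F0 value

Autocorrelation method

The autocorrelation method is a commonly used way to calculate F0. Its formula is:

F0 = 1 / T

where T is the period of the sound waveform. In the autocorrelation method, T is estimated as the time shift at which the waveform correlates most strongly with a shifted copy of itself.

Example of the autocorrelation method

Suppose the waveform of a sound signal is as shown in the figure below:

[waveform example]

F0 can be calculated by observing the period of the waveform. Select one repeating cycle in the waveform, measure its duration T, and then calculate F0:

T = [measured period length]
F0 = 1 / T

Other methods

Besides the autocorrelation method, other methods can be used to calculate F0, for example:
- Fast Fourier Transform (FFT)
- Pitch extraction algorithms
- and so on

The appropriate method is chosen according to the specific field and application.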
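The relation F0 = 1 / T, with T taken from the autocorrelation peak, can be sketched in NumPy as follows. This is a minimal sketch and not a production pitch tracker: the function name, the 50-500 Hz search range and the 200 Hz test tone are illustrative assumptions.

```python
import numpy as np


def estimate_f0_autocorr(signal, sample_rate, f0_min=50.0, f0_max=500.0):
    """Estimate F0 = 1 / T, where T is the lag of the strongest autocorrelation peak."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    # Autocorrelation for non-negative lags only.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Restrict the search to lags that correspond to plausible pitch values.
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), len(acf) - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    period = best_lag / sample_rate  # T in seconds
    return 1.0 / period


# Usage: 0.1 s of a 200 Hz sine sampled at 8 kHz (period = 40 samples).
sample_rate = 8000
tone = np.sin(2 * np.pi * 200.0 * np.arange(800) / sample_rate)
f0 = estimate_f0_autocorr(tone, sample_rate)
```

Limiting the lag range matters: without it the search would return lag 0, where every signal correlates perfectly with itself.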

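4. Conclusion

Calculating F0 is one of the important tasks in speech processing and music analysis. This article has introduced the definition of F0 and its formula, together with an example of the autocorrelation method, and has mentioned some other methods for calculating F0. In practice, choosing a method suited to the situation at hand allows sound signals to be analysed and processed more effectively.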

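Applications of the Principles of Multi-Object Detection Algorithms

1. Introduction

Multi-object detection is an important technique in computer vision. It detects several objects in an image or video at the same time and labels their positions and classes. It is widely used in many fields, including intelligent surveillance, autonomous driving and object recognition.

2. Common multi-object detection algorithms

Many different multi-object detection algorithms are now widely used in practice. Some common ones are:

• Faster R-CNN: an advanced multi-object detection algorithm with a two-stage detection pipeline. A Region Proposal Network (RPN) first generates candidate boxes, and the candidate boxes are then used for object classification and position regression.
• YOLO (You Only Look Once): a single-stage multi-object detection algorithm. It divides the image into a grid and predicts the object classes and bounding boxes within each grid cell, which makes detection fast.
• SSD (Single Shot MultiBox Detector): another common single-stage multi-object detection algorithm. It extracts features at different scales of the image and predicts object classes and bounding boxes at each scale, which gives multi-scale detection.

3. Principles of multi-object detection algorithms

Multi-object detection is implemented with image processing and machine learning techniques. The general steps are:

1. Feature extraction: features are first extracted from the input image. They can be local, global or hybrid features; a convolutional neural network (CNN) is usually used to extract high-level features.
2. Candidate box generation: candidate boxes are generated on the image from the feature map. These are regions that may contain an object, and they can be produced by a sliding window or by a Region Proposal Network.
3. Candidate box classification: the candidate boxes are classified to decide whether they contain an object of interest. This is usually done with a classifier or a convolutional neural network.
4. Candidate box filtering: the classification results are filtered according to certain rules, removing candidate boxes that do not meet the conditions and keeping those likely to contain an object.
5. Object localization: the objects are located precisely from the filtered candidate boxes, and their position information is output.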
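The text does not say which rules are used to filter candidate boxes in step 4. As one possible illustration, the sketch below combines a score threshold with greedy non-maximum suppression, a rule commonly used with detectors of this kind. The function names and both threshold values are assumptions made for the example, not part of the description above.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_candidates(boxes, scores, min_score=0.5, max_iou=0.5):
    """Return indices of kept boxes: drop low scores, then suppress heavy overlaps."""
    # Candidates above the score threshold, best score first.
    order = sorted((i for i, s in enumerate(scores) if s >= min_score),
                   key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better box.
        if all(iou(boxes[i], boxes[j]) <= max_iou for j in kept):
            kept.append(i)
    return kept


# Usage: box 1 overlaps box 0 heavily, box 3 has a low score.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.3]
kept = filter_candidates(boxes, scores)
```

Here boxes 0 and 2 survive: box 1 is suppressed because its overlap with the higher-scoring box 0 exceeds the threshold, and box 3 is removed by the score threshold.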

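Distorted Target Recognition Using a Wavelet-Improved Maximum Average Correlation Height Method (Shang Jiyang)

Vol. 32

Received Date: 2010-10

Abstract: To solve the problem that distorted targets against complex backgrounds cannot be recognized accurately in image matching recognition, the Mexican Hat mother wavelet function is used to redesign ...

Pattern recognition is the process of processing and analysing the various forms of information that characterize things or phenomena, in order to describe, identify, classify and explain them. It is an important part of information science and artificial intelligence.

An image pattern recognition system can be roughly divided into three main parts: 1) acquisition of image information; 2) processing of image information; 3) judgement and classification of image information. Its recognition efficiency depends on the method used to process the image information [35].

The maximum average correlation height (MACH) filter proposed by Mahalanobis et al. in 1994 is an improvement on the synthetic discriminant function (SDF). Reference [9] used the MACH filter to achieve recognition under rotation distortion from -16° to 16° and a scale distortion tolerance of 60%, making it the most successful distorted target recognition method at present.

2 Principle of image matching recognition

Matching recognition techniques represent each ... by a prototype pattern vector ...

where * denotes the complex conjugate and $\star$ denotes the correlation operation. Let $F(u,v)$ and $H(u,v)$ denote the Fourier transforms of $f(x,y)$ and $h(x,y)$ respectively. The correlation of the two functions in the spatial domain, $f(x,y)\star h(x,y)$, and the corresponding product in the frequency domain, $F^{*}(u,v)H(u,v)$, then form a Fourier transform pair [78]:

$$f(x,y)\star h(x,y) \;\Leftrightarrow\; F^{*}(u,v)\,H(u,v)$$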
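The Fourier transform pair between spatial correlation and the frequency-domain product F*(u,v)H(u,v) stated in this excerpt can be checked numerically. The sketch below is not the paper's MACH filter: it assumes real-valued images and the circular (wrap-around) correlation that the discrete Fourier transform implies, and the function names and test data are illustrative.

```python
import numpy as np


def correlate_fft(f, h):
    """Circular cross-correlation of real images via the pair f * h <=> conj(F) H."""
    F = np.fft.fft2(f)
    H = np.fft.fft2(h)
    return np.real(np.fft.ifft2(np.conj(F) * H))


def correlate_direct(f, h):
    """The same correlation from its definition: c(dy, dx) = sum f(y, x) h(y+dy, x+dx)."""
    rows, cols = f.shape
    out = np.zeros((rows, cols))
    for dy in range(rows):
        for dx in range(cols):
            shifted = np.roll(h, shift=(-dy, -dx), axis=(0, 1))  # shifted[y, x] = h[y+dy, x+dx]
            out[dy, dx] = np.sum(f * shifted)
    return out


# Usage: h is f moved down 2 rows and right 3 columns, so the peak sits at (2, 3).
rng = np.random.default_rng(0)
f = rng.random((8, 8))
h = np.roll(f, shift=(2, 3), axis=(0, 1))
c = correlate_fft(f, h)
peak = tuple(int(v) for v in np.unravel_index(np.argmax(c), c.shape))
```

The position of the correlation peak gives the displacement of the target, which is what makes correlation in the frequency domain attractive for matching recognition.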

A User Guide for the Turbostream CFD Solver

To produce an order of magnitude reduction in the run-time of CFD solvers for the same hardware cost

• Turbomachinery research laboratory
• Does both experimental and computational work

• Typical routine simulation
• Structured grid, steady state
• 3 million grid nodes
• 8 hours on four CPU cores
• 20 minutes on GPU

• The processor landscape is rapidly changing, but CFD codes can have a life-span of 30 years

• Our work focuses on structured grids
• Structured grid solvers: a series of stencil operations
• Stencil operations: discrete approximations of the equations

• Single implementation?
• Multiple implementations?
• Alternative:
• High level language for stencil operations
• Source-to-source compilation

• Structured grid indexing: the node (i, j, k) and its neighbours (i-1, j, k), (i+1, j, k), (i, j-1, k), (i, j+1, k), (i, j, k-1), (i, j, k+1)
• in Fortran
• Stencil definition:

• The stencil definition is transformed at compile-time into code that can run on the chosen processor
• The transformation is performed by filling in a pre-defined template using the stencil definition
• Stencil definition + CPU template → CPU source (.c)
• Stencil definition + GPU template → GPU source (.cu)
• Stencil definition + X template → X source

• There are many optimisation strategies for stencil operations (see paper from Supercomputing 2008 by Datta et al.)
• CPUs:
• Parallelise with pthreads
• Cache by-pass using SSE (streaming stores)
• GPUs:
• Cyclic queues
• Split between register storage and shared memory storage

• Intel Nehalem 2.66 GHz
• AMD Phenom II 3.0 GHz
• NVIDIA GT200 (Quadro FX5800)
• Heat conduction benchmark kernel

• We have implemented a new solver that can run on both CPUs and GPUs
• The starting point was an existing solver called TBLOCK
• The new solver is called Turbostream

• Developed by John Denton
• Blocks with arbitrary patch interfaces
• Simple and fast algorithm
• 15,000 lines of Fortran 77
• Main solver routines are only 5000 lines

• Explicit scheme
• Variable time steps and multi-grid for convergence acceleration
• Time-accurate solutions using Jameson's Dual Time Stepping procedure
• Multi-stage calculations using mixing planes or sliding planes

• 3000 lines of stencil definitions (~15 different stencil kernels)
• Code generated from stencil definitions is 15,000 lines
• Additional 5000 lines of C for boundary conditions, file I/O etc.
• Source code is very similar to TBLOCK: every subroutine has an equivalent stencil definition

• TBLOCK uses all four cores on the CPU through MPI
• Turbostream is ~20 times faster

• Benchmark case is an unsteady simulation of a turbine stage
• 16 NVIDIA GT200 GPUs, 1 Gb/s Ethernet
• Weak scaling: 6 million grid nodes per GPU
• Strong scaling: 6 million grid nodes in total

• Three-stage low-speed turbine case from Rosic et al. (2006)
• Used to demonstrate the importance of hub and shroud leakages in multi-stage turbines
• Steady calculation with mixing planes
• 4 million grid nodes
• Pitch-wise averaged entropy function; entropy function contours, stator 3: Experiment, TBLOCK, Turbostream

• The switch to many-core processors enables a step change in performance, but existing codes have to be rewritten
• The differences between processors make it difficult to hand-code a solver that will run on all of them
• We suggest a high level abstraction coupled with source-to-source compilation
• A new solver called Turbostream, which is based on Denton's TBLOCK, has been implemented
• Turbostream is ~20 times faster than TBLOCK when running on an NVIDIA GPU as compared to a quad-core Intel CPU
• Single blade-row calculations almost interactive on a desktop (10 to 30 seconds)
• Multi-stage calculations in a few minutes on a small cluster ($10,000)
• Full annulus URANS completes overnight on a modest cluster ($100,000)
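The slides describe structured grid solvers as a series of stencil operations on an (i, j, k) grid and name a heat conduction benchmark kernel, but the stencil definitions themselves are not reproduced. As a rough illustration of what such a kernel computes, here is one explicit heat-conduction update written with NumPy slicing. It is not Turbostream's stencil language or generated code: the function name, the coefficient alpha and the 5 × 5 × 5 grid are assumptions made for the example.

```python
import numpy as np


def heat_step(t, alpha=0.1):
    """One explicit update of interior nodes using the six (i, j, k) neighbours."""
    new = t.copy()
    centre = t[1:-1, 1:-1, 1:-1]
    neighbours = (t[2:, 1:-1, 1:-1] + t[:-2, 1:-1, 1:-1]     # i+1 and i-1
                  + t[1:-1, 2:, 1:-1] + t[1:-1, :-2, 1:-1]   # j+1 and j-1
                  + t[1:-1, 1:-1, 2:] + t[1:-1, 1:-1, :-2])  # k+1 and k-1
    # Discrete Laplacian times alpha; boundary nodes are left unchanged.
    new[1:-1, 1:-1, 1:-1] = centre + alpha * (neighbours - 6.0 * centre)
    return new


# Usage: a single hot node in the middle of a cold 5 x 5 x 5 block.
grid = np.zeros((5, 5, 5))
grid[2, 2, 2] = 1.0
after = heat_step(grid)
```

After one step the hot node has passed one tenth of its value to each of its six neighbours. For this explicit scheme alpha needs to stay at or below 1/6 to remain stable.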

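Application of an Improved Adaptive Genetic Algorithm to Noise Reduction by Pitch Arrangement

Xu Shengli (1), Xia Menglei (2)
(1. Nanjing University of Aeronautics and Astronautics, Nanjing 210016, Jiangsu; 2. Hohai University, Nanjing 211100, Jiangsu)

Abstract: In tire tread pattern design, the pitch is taken as the unit, and the pitch ratios and the pitch arrangement are designed appropriately in order to reduce noise. Based on the adaptive genetic algorithm, this paper encodes the pitches as genes and optimizes their ordering. An adaptive cosine function is used for the crossover and mutation operations, and features of the multi-parent genetic algorithm are combined into the crossover operation to increase the diversity of the samples. The samples within a generation are ranked by fitness, and those with high fitness are selected for the next generation. The experimental results show that the improved adaptive genetic algorithm converges well to the global optimum and reduces noise markedly, offering a feasible approach to the noise-reduction design of tire tread patterns.

Keywords: pitch arrangement; adaptive; genetic algorithm; crossover probability; mutation probability; fitness; noise reduction

CLC number: TP302  Document code: A  Article ID: 1674-6236(2013)01-0037-04

Application of pitches arrangement noise-reduction based on the improved self-adaptive genetic algorithm

XU Sheng-li (1), XIA Meng-lei (2)
(1. Nanjing University of Aeronautics & Astronautics, Nanjing 210016, China; 2. Hohai University, Nanjing 211100, China)

Abstract: In order to reduce noise, the pitch is treated as the unit for designing an optimized pitch proportion and sequence. In this paper, the pitches are coded as genes and sequenced with the self-adaptive genetic algorithm, using a self-adaptive cosine function for the crossover and mutation operations; the characteristics of the multi-parent genetic algorithm are used in the crossover operation, which increases the diversity of the samples. The samples in the same generation are ordered by fitness and the high-fitness samples are selected into the next generation. The experimental results show that the improved self-adaptive genetic algorithm converges well to the global optimum solution, that the noise-reduction effect is obvious, and that the algorithm provides a feasible method for the noise-reduction design of tread patterns.

Key words: pitches arrangement; self-adaptive; genetic algorithm; crossover probability; mutation probability; fitness; noise-reduction

Received: 2012-09-11  Manuscript No.: 201209071

About the author: Xu Shengli (born 1988), male, from Suqian, Jiangsu; master's student.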

Renishaw RESOLUTE

L-9517-9530-03-B

RESOLUTE™ UHV absolute optical encoder

Renishaw's true-absolute optical encoder, RESOLUTE™, offers Ultra-High Vacuum compatibility in both linear and rotary (angle) encoder formats.

The RESOLUTE encoder determines position immediately upon switch-on, without the need for any movement or battery back-up. This means complete control of axes can be achieved immediately, thus eliminating risks of unchecked movements or collisions, a critical advantage in applications such as wafer handling where safe extraction of high-value products is essential after loss of power.

RESOLUTE encoders have inherently very low sub-divisional error (SDE), so the fidelity of feedback is improved. This has several benefits, including minimising velocity ripple, reducing vibration, increasing scanning performance and cutting the amount of heat generated in motors. The RESOLUTE system also has low positional noise (jitter) of less than 10 nm RMS, so positional stability is significantly improved. Resolutions are available to 1 nm (linear) or 32 bit (rotary), with a maximum speed up to 100 m/s.
RESOLUTE UHV encoders are available with a range of serial protocols for excellent noise immunity, including BiSS® C and Panasonic.

• Clean residual gas analysis (RGA)
• Low outgassing rate
• Bake-out temperature of 120 °C
• True-absolute non-contact optical encoder system: no batteries required
• Wide set-up tolerances for quick and easy installation
• Resolutions to 1 nm linear or 32 bit rotary
• Up to 100 m/s maximum speed (36 000 rev/min)
• ±40 nm sub-divisional error for smooth velocity control
• Less than 10 nm RMS jitter for improved positional stability
• Built-in separate position-checking algorithm provides inherent safety
• Integral set-up LED enables easy installation and provides diagnostics at a glance
• Operates up to 75 °C
• Integral over-temperature alarm
• Compatible with a wide range of linear and rotary scales
• Optional Advanced Diagnostic Tool ADTa-100*

* ADTa-100 compatible readheads are marked with the symbol

System features

Unique single-track absolute optical scale
• Absolute position is determined immediately upon switch-on
• No battery back-up
• No yaw de-phasing, unlike multiple-track systems
• Fine pitch (30 µm nominal period) optical scale for superior motion control compared to inductive, magnetic or other non-contact optical absolute encoders
• High-accuracy graduations marked directly onto tough engineering materials for outstanding metrology and reliability

Unique detection method
• Readhead acts like an ultra-fast miniature digital camera, taking photos of a coded scale
• Photos are analysed by a high-speed digital signal processor (DSP) to determine absolute position
• Built-in position-check algorithm constantly monitors calculations for ultimate safety and reliability
• Advanced optics and position determination algorithms are designed to provide low noise (jitter < 10 nm RMS) and low sub-divisional error (SDE ±40 nm)

Optional Advanced Diagnostic Tool

The RESOLUTE encoder system is compatible with the Advanced Diagnostic Tool ADTa-100* and ADT View software, which acquire detailed real-time data from the readhead to allow easy set-up, optimisation and in-field fault finding. The intuitive software interface provides:
• Digital readout of encoder position and signal strength
• Graph of signal strength over the entire axis travel
• Ability to set a new zero position for the encoder system
• System configuration information

Linear scales

RELA30 and RSLA30
• Cross-section: 1.5 mm × 14.9 mm; 1.6 mm × 14.9 mm
• Accuracy, lengths to 1.5 m: up to 1 m: ±1 µm; 1 m to 1.5 m: ±1 µm/m
• Accuracy, lengths to 5 m: up to 1 m: ±1.5 µm; 1 m to 2 m: ±2.25 µm; 2 m to 3 m: ±3 µm; 3 m to 5 m: ±4 µm
• Maximum length: 1.5 m (RELA30); 5 m (RSLA30)

RTLA30-S* and RTLA30 with FASTRACK carrier
• Cross-section: 0.4 mm × 8 mm including adhesive (RTLA30-S); RTLA30 scale 0.2 mm × 8 mm, FASTRACK carrier 0.4 mm × 18 mm including adhesive
• Accuracy: ±5 µm/m; ±5 µm/m
• Length: RTLA30 lengths up to 21 m; carrier lengths up to 25 m

* For RTLA30-S axis lengths > 2 m, FASTRACK carrier with RTLA30 is recommended.

Rotary scales

• RESA30: ±1.9 arc second (typical installed accuracy for 550 mm diameter RESA30 ring); ring diameters 52 mm to 550 mm; coefficient of thermal expansion 15.5 ±0.5 µm/m/°C
• REXA30: ±1 arc second* (total installed accuracy for 417 mm diameter REXA30 ring); ring diameters 52 mm to 417 mm; coefficient of thermal expansion 15.5 ±0.5 µm/m/°C

* When using two RESOLUTE readheads.

Resolutions, speed and scale lengths

RESOLUTE encoder system with BiSS C (uni-directional)

RESOLUTE readheads using BiSS C (uni-directional) protocol are available with three options for the position word length: 36 bit, 32 bit and 26 bit. The maximum scale length is determined by the readhead resolution and the number of position bits in the serial word. Shorter word lengths combined with fine resolution limit maximum scale length.
Conversely, coarser resolutions or longer word lengths enable the use of longer scale lengths. The 36 bit and 32 bit position words facilitate longer lengths, which can be a significant benefit, especially at fine resolutions.

RESOLUTE encoder system with Panasonic

RESOLUTE readheads using Panasonic serial comms are available with 1 nm, 50 nm and 100 nm resolution options. For the Panasonic protocol, maximum scale length is available at all resolutions.

Contact your local Renishaw representative for details of other serial protocols.

Resolution

RESOLUTE encoders are available with a variety of resolutions, to meet the needs of a wide range of applications. The choice of resolutions depends on the serial protocol being used, but there are no limitations due to ring size; for example BiSS 26 bit resolution is available on all ring sizes.

RESOLUTE encoders with BiSS serial comms are available with the following resolution options:

RESOLUTE encoders with Panasonic serial comms are available with the following resolution options:

For resolution options on other protocols, contact your local Renishaw representative.

* 32 bit resolution is below the noise floor of the RESOLUTE encoder.
† The maximum speed depends on the driver, motor and mechanical components. Contact Renishaw or Panasonic regarding the maximum speed.
‡ 'Typical' installations are a result of graduation and installation errors combining and, to some magnitude, cancelling.

Speed and accuracy

CAUTION: Very high speed motion axes require additional design consideration. For applications that will exceed 50% of the rated maximum reading speed of the ring, contact your local Renishaw representative.

For REXA30 speed and accuracy figures, refer to the REXA30 ultra-high accuracy absolute angle encoder datasheet (Renishaw part no. L-9517-9405).

General specifications (angle and linear)

• Power supply: 5 V ±10%, 1.25 W maximum (250 mA @ 5 V). NOTE: Current consumption figures refer to terminated RESOLUTE systems. Renishaw encoder systems must be powered from a 5 Vdc supply complying with the requirements for SELV of standard IEC 60950-1.
• Ripple: 200 mVpp maximum @ frequency up to 500 kHz maximum
• Temperature: storage 0 °C to +80 °C; operating 0 °C to +75 °C; bake-out (non-operating) 120 °C
• Humidity: 95% relative humidity (non-condensing) to IEC 60068-2-78
• Sealing: IP30
• Acceleration (readhead): operating, 500 m/s², 3 axes
• Shock (readhead): non-operating, 1000 m/s², 6 ms, ½ sine, 3 axes
• Maximum acceleration of scale with respect to readhead: 2000 m/s². NOTE: This is the worst-case figure that is correct for the slowest communications request rates. For faster request rates, the maximum acceleration of scale with respect to the readhead can be higher. For more details, contact your local Renishaw representative.
• Vibration: operating, 100 m/s² max @ 55 Hz to 2000 Hz, 3 axes
• Random vibration: 0.15 g²/Hz ASD 20-1000 Hz, −6 dB roll off 1-2 kHz
• Mass: readhead 19 g; cable 19 g/m
• Cable, mechanical option 'U': silver-coated copper braided single screen. FEP core insulation, over tin-plated copper wire.
• Cable, mechanical option 'F': stainless steel cable braid.
• Communication format (BiSS): RS485 / RS422 differential line-driven signal
• Compatible Panasonic drivers: A5 family drivers (only compatible with RESOLUTE linear): A5, A5II, A5L, A5N, A5NL, A5BL. A6 family drivers (RESOLUTE rotary will be available for all A6 family drivers): A6SM, A6SL, A6NM, A6NL.

Test schedule

A quadrupole mass spectrometer (AccuQuad 200 RGA) was used to collect RGA data. Chamber pressure was measured with an Ion Gauge (G8130). After initial conditioning of the system, a background spectrum was recorded together with the total pressure in the test chamber.

The component was placed in the vacuum system (0.0035 m³), which was then pumped using a KJL Lion 802 (800/s) diode ion pump and a Divac diaphragm pump at ambient temperature for 24 hours, after which a background scan and the total pressure in the test chamber were recorded again.
If the system pressure was better than 5 × 10⁻⁹ mbar, the test specimen was baked at 120 °C for 48 hours. The system was then allowed to cool to ambient temperature before a final mass spectrum and total pressure measurement were taken. The final RGA scans are shown below.
NOTE: Exact reproduction of these results should not be expected, as RGA data depends on the condition, specification and performance of the vacuum system. However, the RGA results show no significant contamination attributable to RESOLUTE UHV encoders and that UHV conditions can be achieved in the presence of this product.

[RGA scans, partial pressure (mbar) versus mass (AMU):]
RESOLUTE readhead with 1.0 m cable after bake-out (total pressure = 8 × 10⁻¹⁰ mbar)
RESA30 (Ø115 mm) after bake-out (total pressure = 7.76 × 10⁻¹⁰ mbar)
RTLA30-S linear scale (300 mm length) after bake-out (total pressure = 1.69 × 10⁻¹⁰ mbar)
RSLA30 linear scale (180 mm length) with 2 clips and 1 clamp after bake-out (total pressure = 3.0 × 10⁻¹⁰ mbar)

RESOLUTE UHV readhead installation drawing (on RSLA30/RELA30 scale)
* Extent of mounting faces.
† Thread depth from mounting face. Recommended thread engagement 5 mm (8 including counterbore). Recommended tightening torque 0.5 to 0.7 Nm.
Dimensions and tolerances in mm

RESOLUTE readhead side exit cable installation drawing (on RSLA30 / RELA30 scale)
* Extent of mounting faces.
† Thread depth from mounting face. Recommended thread engagement 5 mm (8 including counterbore). Recommended tightening torque 0.5 to 0.7 Nm.
Dimensions and tolerances in mm

RESOLUTE linear readhead nomenclature
RL 32B US 001C 30 V
Series: R = RESOLUTE
Scale form: L = Linear
Protocol: 26B = BiSS 26 bit; 32B = BiSS 32 bit; 36B = BiSS 36 bit; 48P = Panasonic 48 bit
Mechanical option: F = Ultra High Vacuum (stainless steel cable braid); U = Ultra High Vacuum (silver coated copper braid cable)
Gain option: T = RTLA30 / RTLA30-S; S = RSLA30; E = RELA30
Resolution: 001 = 1 nm; 005 = 5 nm (BiSS only); 050 = 50 nm; 100 = 100 nm (Panasonic only)
Scale code option: B = RTLA30 / RTLA30-S (20 mm to 10 m); C = RSLA30 (20 mm to 5 m) / RELA30 (> 1.13 m to 1.5 m); D = RELA30 (20 mm to 1.13 m); E = RTLA30 / RTLA30-S (> 10 m to 21 m)
Cable length: 02 = 0.2 m; 05 = 0.5 m; 10 = 1 m; 15 = 1.5 m; 30 = 3 m; 50 = 5 m; 90 = 9 m; 99 = 10 m
Termination: V = Vacuum flying lead

RESOLUTE angle readhead nomenclature
RA 26B U A 052B 30 V
Series: R = RESOLUTE
Scale form: A = Angular
Protocol: 18B = BiSS 18 bit; 26B = BiSS 26 bit; 32B = BiSS 32 bit; 23P = Panasonic 23 bit; 32P = Panasonic 32 bit
Mechanical option: F = Ultra High Vacuum (stainless steel cable braid); U = Ultra High Vacuum (silver coated copper braid cable)
Gain option: A = Standard
Ring diameter: 052 = 52 mm ring; 057 = 57 mm ring; 075 = 75 mm ring; 100 = 100 mm ring; 103 = 103 mm ring; 104 = 104 mm ring; 115 = 115 mm ring; 150 = 150 mm ring; 183 = 183 mm ring (REXA30 only); 200 = 200 mm ring; 206 = 206 mm ring; 209 = 209 mm ring; 229 = 229 mm ring; 255 = 255 mm ring; 300 = 300 mm ring; 350 = 350 mm ring; 413 = 413 mm ring (RESA30 only); 417 = 417 mm ring; 489 = 489 mm ring (RESA30 only); 550 = 550 mm ring (RESA30 only)
Scale code option: B = Standard scale code
Cable length: 02 = 0.2 m; 05 = 0.5 m; 10 = 1 m; 15 = 1.5 m; 30 = 3 m; 50 = 5 m; 90 = 9 m; 99 = 10 m
Termination: V = Vacuum flying lead

NOTE: Not all combinations are valid.
Check valid options online at /epc

Renishaw plc
New Mills, Wotton-under-Edge, Gloucestershire GL12 8JR, United Kingdom
T +44 (0)1453 524524
F +44 (0)1453 524901

RESOLUTE UHV series compatible products:
RESA30 stainless steel ring
REXA30 high-accuracy stainless steel ring
RELA30 self-adhesive or clip / clamp mounted ZeroMet spar scale
RTLA30 tape scale and FASTRACK carrier
RSLA30 self-adhesive or clip / clamp mounted stainless steel spar scale
RTLA30-S self-adhesive tape scale
RKLA30-S self-adhesive tape scale

RENISHAW HAS MADE CONSIDERABLE EFFORTS TO ENSURE THE CONTENT OF THIS DOCUMENT IS CORRECT AT THE DATE OF PUBLICATION BUT MAKES NO WARRANTIES OR REPRESENTATIONS REGARDING THE CONTENT. RENISHAW EXCLUDES LIABILITY, HOWSOEVER ARISING, FOR ANY INACCURACIES IN THIS DOCUMENT.
© 2010-2022 Renishaw plc. All rights reserved.
Renishaw reserves the right to change specifications without notice.
RENISHAW and the probe symbol used in the RENISHAW logo are registered trade marks of Renishaw plc in the United Kingdom and other countries. apply innovation and names and designations of other Renishaw products and technologies are trade marks of Renishaw plc or its subsidiaries.
BiSS is a registered trademark of iC-Haus GmbH.
All other brand names and product names used in this document are trade names, trade marks or registered trade marks of their respective owners.
For worldwide contact details, visit /contact
Part no.: L-9517-9530-03-B
Issued: 12.2022

For more information about the ADTa-100 and the scale, refer to the relevant data sheets and installation guides, which can be downloaded from /opticalencoders.
Advanced Diagnostic Tool ADTa-100 (A-6525-0100): compatible with RESOLUTE readheads showing the mark.

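Object Detection Algorithm for Unmanned Vehicles Based on Multi-scale Feature Fusion

namely the backbone network VGG16, the additional network layers and the prediction layer, as shown in Figure 1. Several feature maps are generated during feature extraction; the effective feature maps that are used are Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2 and Conv11_2.
Figure 1 SSD network structure
1.2 Prior box selection
On the effective feature maps, taking each pixel as the centre, a number of

Li Weiwen, Li Qing, Gao Chao
(Beijing Key Laboratory of High Dynamic Navigation Technology, Beijing Information Science and Technology University, Beijing 100192)
Abstract: To address the difficulty unmanned vehicles have in accurately recognising targets while driving, an SSD object detection algorithm based on multi-scale feature fusion and a self-attention mechanism is proposed. An attention mechanism is introduced: the attention module ULASM is added to the feature extraction stage of the SSD network to process the effective feature maps. A multi-scale feature fusion module is designed that applies deconvolution to the high-level feature maps to enlarge them, and then fuses the low-level feature maps with the deconvolved high-level feature maps to enrich their semantic information. Tests on the KITTI dataset show that the improved algorithm raises detection accuracy by 3.8% and can effectively detect targets that the original algorithm misses.
number of channels, and h and w are the dimensions of the feature map. The input feature map is divided into g mutually exclusive subspaces [F1, F2, …, Fñ, …, Fg], each subspace Fñ containing G feature maps. A depthwise operation with kernel size 1 is applied to the feature maps of each separated subspace, followed by max pooling with a 3×3 kernel and padding 1 and a pointwise convolution; finally, softmax is used as the activation function to obtain the attention corresponding to the subspace
In this paper, the deep feature maps are deconvolved and then fused pixel-wise with the shallow feature maps, giving feature maps that contain both rich semantic information and detailed features and thereby improving the algorithm's recognition accuracy. Deconvolution, also called transposed convolution, is an upsampling operation that enlarges the resolution of a feature map according to the configured parameters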

Improved LVAMDF and Comprehensive Multi-factor Pitch Detection Algorithm

Xue Shuaiqiang, Chen Bo, Chen Fei
(School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang 621010, China)
Abstract: Building on the methodology that categorizes speech signals into three types (silence, voiceless sound and voiced sound), and in view of the random distribution of speech with obvious periodicity, an improved pitch frequency detection algorithm based on the length-varied average magnitude difference function (LVAMDF) and comprehensive multi-factor analysis is put forward. It categorizes voiced sound into two types: speech with obvious periodicity and speech without obvious periodicity. At the same time, the start and end points of every accurate pitch period in the obviously periodic speech are obtained. For the few pitch periods that are detected at double or half the true frequency, a recognition and correction method with a high recognition and correction rate is proposed. Finally, in a large number of experiments on real speech, the method detects the pitch period in obviously periodic speech accurately, with hardly any frequency doubling or halving. The results show that the proposed algorithm performs much better at pitch detection than AMDF and ACF.
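The abstract above uses the plain AMDF as a baseline. As a rough illustration only, the sketch below estimates the pitch of one voiced frame with the ordinary fixed-length AMDF; it is not the paper's LVAMDF, and the function name `amdf_pitch` and its parameters are hypothetical choices made for this example.

```python
import math

def amdf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one voiced frame with the plain AMDF.

    frame: list of samples, fs: sampling rate in Hz.
    fmin/fmax bound the searched pitch range (assumed values).
    """
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(frame) - 1)
    best_lag, best_val = None, float("inf")
    for lag in range(lag_min, lag_max + 1):
        n = len(frame) - lag
        # AMDF(lag) = mean of |x[i] - x[i + lag]| over the overlapping samples
        val = sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n
        if val < best_val:
            best_lag, best_val = lag, val
    # The lag with the deepest valley is taken as the pitch period
    return fs / best_lag

if __name__ == "__main__":
    fs = 8000
    frame = [math.sin(2 * math.pi * 200 * n / fs) for n in range(320)]
    print(amdf_pitch(frame, fs, fmin=120.0, fmax=400.0))
```

Because the AMDF has equally deep valleys at every multiple of the pitch period, a wide search range can return half the true frequency, which is the frequency doubling and halving problem the abstract refers to.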

Performance Analysis of Low-altitude Early Warning Radar Based on Phased Array Technology

Wu Jun
Anhui Suncreate Electronics Co., Ltd., Hefei, 230000

Abstract: With the rapid development of low-altitude penetration weapons in modern warfare and the gradual opening of low-altitude airspace in China, civil air defense urgently needs low-altitude and ultra-low-altitude early warning capabilities.

This low-altitude early warning radar uses advanced phased array technology to steer the beam of distributed T/R components and so scan in elevation. Height measurement accuracy is improved by vertical scanning over multiple beam positions together with sum and difference beam angle measurement within each beam position.

The system allocates beam positions across 8 elevation beams. To ensure a sufficient number of accumulated pulses, the low beams use MTD processing with three groups of 6-pulse filters, and the high beams use 3-pulse MTI processing.

The processing uses a design and algorithm with a high clutter improvement factor, and applies a clutter map and multiple threshold detection to improve the detection of low, slow and small targets. Finally, a three-coordinate surveillance radar focused on observing small low-altitude targets was developed.

Keywords: Phased Array; Beamforming; Sum & Difference Beamforming; Three-coordinate Radar

1 Introduction
With the development of science and technology, low-altitude penetration weapons have appeared in modern warfare. In addition, with China's economic and social development and the gradual opening of low-altitude airspace, both civil air defense and large event venues have a market demand for perimeter protection and three-dimensional security surveillance.

Image Forgery Detection Algorithm Coupling Multi-scale Features with a Double Classifier

Wen Kai

Abstract: Current image forgery detection technologies detect only a single form of forgery and adapt poorly to the identification of complex combined tampering, so their recognition accuracy and generality are poor. To solve these problems, an image forgery detection algorithm that couples multi-scale feature extraction with a double classifier is proposed. Four feature extraction methods, the Curvelet transform, the Gabor transform, LBP (local binary pattern) and the DCT (discrete cosine transform), are each used to extract feature information from the input image, and the extracted features are fused to form the multi-scale features of the image. A hidden Markov model (HMM) and a support vector machine (SVM) are introduced to design a double-classifier decision model that takes the multi-scale features as the basis for judgement and separates real images from tampered images. Experiments show that, compared with existing forgery detection algorithms, the proposed algorithm achieves higher detection accuracy and better robustness, and can accurately detect copy-paste and splicing forgeries whose copied regions have undergone rotation, JPEG compression, blurring and noise.

Journal: Computer Engineering and Design
Year (volume), issue: 2017 (038) 010
Total pages: 7 (P2788-2793, 2819)
Keywords: image forgery detection; multi-scale features; double classifier; hidden Markov model; support vector machine
Author: Wen Kai
Affiliation: College of Automation, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211156; Jincheng College, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 211156
Language of text: Chinese
CLC number: TP391

Image tampering [1,2] takes many forms. According to the forensic perspective taken, detection methods mainly divide into copy-paste forgery detection, splicing forgery detection, image resampling forensics, JPEG compression forensics, noise inconsistency forensics and others [3].

Demonstration of the Artery of Adamkiewicz by Multi-detector-row CT Angiography

ZENG Yan (1), ZHAO Jian-nong (1), SONG Bin (2)
(1. Department of Radiology, Second College of Clinical Medicine, Chongqing Medical University, Chongqing 400010, China; 2. Department of Radiology, West China Hospital, Sichuan University)

[Abstract] Objective: To visualize the artery of Adamkiewicz by multi-detector-row spiral CT angiography (MDCTA), and to investigate the optimized scanning parameters, scan delay time and image post-processing methods. Methods: A total of 125 consecutive normal subjects without any history of thoracic or abdominal lesions who underwent contrast-enhanced CT examination were enrolled to prospectively investigate the optimization of MDCTA techniques, in terms of scan-triggering CT threshold, scan delay time and various image post-processing methods, for the depiction of the detailed anatomy of the artery of Adamkiewicz. Results: The best scanning parameters were as follows: 0.5 s per rotation, 0.75 mm slice thickness, 1.5 pitch. A CT value of 170 HU was selected as the threshold to trigger data acquisition. When this CT threshold was reached, scanning was delayed for another 18 s, and then helical scanning began automatically. Axial images were reconstructed with the B20f algorithm, 1 mm slice thickness and 0.75 mm interval. Multiplanar reformation (MPR), curved planar reformation (CPR) and maximum intensity projection (MIP) were advantageous in depicting the artery of Adamkiewicz. Conclusion: Using the optimized scanning and reconstruction techniques, 16-slice MDCTA can clearly depict the detailed anatomy of the artery of Adamkiewicz (including the origin, the branching patterns and the entire course). As a noninvasive imaging technique, optimized MDCTA should be the method of choice for the depiction of the artery of Adamkiewicz.
[Key words] Artery of Adamkiewicz; Tomography, X-ray computed; Angiography

A High-speed Cut Shot Boundary Detection Algorithm

Wang Weiqiang; Gao Wen; Ma Jiyong
Journal: Computer Science
Year (volume), issue: 2001 (028) 007

Abstract: Shot boundary detection is a key technique in constructing video information management systems. The paper proposes a fast and effective cut detection algorithm in the compressed domain. Unlike other compressed-domain algorithms, which commonly compare consecutive frames, the algorithm applies a multi-resolution detection mode. This mechanism greatly reduces the volume of data processed over the whole detection process. The algorithm also uses as features the different raw information that can be extracted directly from frames with different coding types, which reduces computational complexity. To verify the validity of the algorithm, experiments were run on a data set containing 145,000 frames. The results demonstrate that the algorithm not only detects very quickly, 2.5 to 5.2 times as fast as other compressed-domain algorithms, but also achieves an average of 98% accuracy and recall.

Total pages: 5 (P56-59, 45)
Authors: Wang Weiqiang; Gao Wen; Ma Jiyong
Affiliation: Digital Technology Laboratory, Institute of Computing Technology, Chinese Academy of Sciences
Language of text: Chinese
CLC number: TP3


Multi-pitch Detection Algorithm Using Constrained Gaussian Mixture Model and Information Criterion for Simultaneous Speech

Hirokazu Kameoka, Takuya Nishimoto & Shigeki Sagayama
Graduate School of Information Science and Technology
The University of Tokyo, Japan
{kameoka,nishi,sagayama}@hil.t.u-tokyo.ac.jp

Abstract
In this paper, a co-channel multi-pitch detection algorithm is described. Such an algorithm is important when prosodic information needs to be extracted separately from the respective F0 patterns of concurrent utterances. Although the temporal continuity of speech prosody should be considered, as a first step we discuss a process that is performed independently on each single frame. A model of multiple harmonic structures is constructed as a mixture of tied Gaussian mixtures, each of which models a single harmonic structure. Our algorithm detects both the number of concurrent speakers and the spectral envelope of each underlying harmonic structure, based on maximum likelihood estimation of the model parameters using the EM algorithm and an information criterion. It operates without a priori information about F0 contours and without restricting the number of speakers, and it also extracts accurate F0s as continuous values with simple procedures in the spectral domain. Experiments showed that our algorithm outperformed the well-known cepstrum method for speech signals of both a single speaker and two simultaneous speakers.

1. Introduction
It is known that prosodic information offers many useful clues for speech recognition, such as the location of important words and phrases, topic segment boundaries, the location of disfluencies, identification of languages and others. The process of extracting prosodic information is generally conducted on the assumption that the F0 pattern has already been (roughly) extracted. Yet F0 patterns cannot always be extracted simply in spontaneous dialogue speech, in which simultaneous utterances by two or more speakers often occur. Thus, in order to incorporate proper prosodic information into spontaneous dialogue speech recognition, the number of simultaneous speakers and their respective F0 patterns should be extracted precisely. However, the multi-pitch detection problem is far from simple and is difficult to solve analytically.
Numerous multi-pitch detection methods have been reported, not only in speech signal processing [1, 2] but also in musical signal processing [3, 4, 5] and auditory scene analysis [6, 7]. Chazan et al. addressed a speech separation method by introducing a time-warped signal model that allows continuous pitch variations within a long analysis frame [1]. Wu et al. described a multi-pitch tracking method for noisy environments using a filter bank process and pitch tracking with an HMM [2]. Although these methods achieve accurate detection of F0s, neither of them includes a specific process for determining the number of speakers.
Our objective is to develop a multi-pitch detection algorithm that detects the number of simultaneous speakers, accurate F0s as continuous values and, moreover, the respective spectral envelopes with a spectral domain procedure. The basic approach is stated in Section 2, and the detection algorithm is described in Section 3. The results of operation experiments are reported in Section 4.

2. A Maximum Likelihood Formulation

2.1. Model of Harmonic Structures
The influence of the window function and of pitch variation within a single short-time analysis frame inevitably widens the spectral harmonics, which makes it difficult to extract precise values of F0s and to separate close partials. First we assume that each widened partial is a probability distribution of frequencies, approximated by a Gaussian distribution model. A single harmonic structure can then be modeled by a tied Gaussian mixture model (tied-GMM), in which the means have only 1 degree of freedom. On the log-frequency scale, the means of a tied-GMM are denoted here as
$$\boldsymbol{\mu}^k = \{\mu_k, \cdots, \mu_k + \log n, \cdots, \mu_k + \log N_k\},$$
where $\mu_k$ ideally corresponds to the log F0 of the $k$th sound and $n$ denotes the index of partials. We then introduce a model of multiple harmonic structures $P_\theta(x)$, which is a mixture of $K$ tied-GMMs whose model parameter is denoted as
$$\theta = \{\boldsymbol{\mu}^k, \mathbf{w}^k, \sigma \mid k = 1, \cdots, K\}, \tag{1}$$
where $\mathbf{w}^k = \{w^k_1, \cdots, w^k_n, \cdots, w^k_{N_k}\}$ and $\sigma$ indicate the weights and variances (which are simply assumed here to be constant) of the respective Gaussian distributions.

2.2. Model Parameter Estimation using EM Algorithm
Since the observed spectral density function $f(x)$, where $x$ denotes log-frequency, is considered to be generated from the model of multiple harmonic structures, the log-likelihood difference resulting from an update of the model parameter from $\theta$ to $\bar\theta$ is
$$f(x)\log P_{\bar\theta}(x) - f(x)\log P_\theta(x) = f(x)\log\frac{P_{\bar\theta}(x)}{P_\theta(x)}. \tag{2}$$
Although Dempster formulated the EM algorithm [8] in order to maximize the mean log-likelihood considering $f(x)$ as a probability density function, it can be formulated in the same way even if $f(x)$ is replaced with a spectral density function. By taking the expectation of both sides with respect to $P_\theta(n,k|x)$, which represents the probability that $x$ was generated from the $\{n,k\}$-labeled Gaussian distribution, the Q-function is derived on the right-hand side. Given the Q-function as
$$Q(\theta,\bar\theta) = \sum_{k=1}^{K}\sum_{n=1}^{N_k}\int_{-\infty}^{\infty} P_\theta(n,k|x)\, f(x)\log P_{\bar\theta}(x,n,k)\, dx, \tag{3}$$
it yields
$$\int_{-\infty}^{\infty} \left[ f(x)\log P_{\bar\theta}(x) - f(x)\log P_\theta(x) \right] dx \ge Q(\theta,\bar\theta) - Q(\theta,\theta). \tag{4}$$
By obtaining the $\bar\theta$ that maximizes the Q-function, the log-likelihood of the model of multiple harmonic structures with respect to every $x$ increases monotonically. The a posteriori probability $P_\theta(n,k|x)$ in equation (3) is given as
$$P_\theta(n,k|x) = \frac{P_\theta(x,n,k)}{P_\theta(x)} \tag{5}$$
$$= \frac{w^k_n \cdot g(x \mid \mu_k + \log n, \sigma^2)}{\sum_n \sum_k w^k_n \cdot g(x \mid \mu_k + \log n, \sigma^2)}, \tag{6}$$
$$g(x \mid x_0, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{(x - x_0)^2}{2\sigma^2} \right\}, \tag{7}$$
where $g(x \mid x_0, \sigma^2)$ is a Gaussian distribution. By iterating the following steps, the model parameter locally converges to the ML estimates.
Initial-step: Initialize the model parameter $\theta$.
Expectation-step: Calculate $Q(\theta,\bar\theta)$ with equation (3).
Maximization-step: Maximize $Q(\theta,\bar\theta)$ to obtain the next estimate
$$\bar\theta = \underset{\bar\theta}{\operatorname{argmax}}\; Q(\theta,\bar\theta). \tag{8}$$
Replace $\theta$ with $\bar\theta$ and repeat from the Expectation-step.

2.3. Another Interpretation as Clustering
From another viewpoint, this ML procedure can be understood as a clustering method under a harmonic constraint between Gaussian mixture components, where the spectral density function is considered as a statistical distribution of micro-energies along the frequency axis. If we regard $\mu_k$ as cluster centroids, the a posteriori probability in equation (6) as the membership degree of each micro-energy, and the log-likelihood $\log P_{\bar\theta}(x,n,k)$ as a distance function between centroid $\mu_k$ and a micro-energy, the Q-function in equation (3) turns out to be the objective function for fuzzy clustering. We call this concept "Harmonic Clustering."

3. Multi-pitch Detection Algorithm
The detection algorithm as a whole consists of two processes. In 3.1, we adopt one of the most widely used information criteria, on which both processes, described in 3.2 and 3.3, are based.

3.1. Criterion of Model Selection
Provided that multiple different model candidates exist, the optimal model must somehow be judged. Here we introduce the Akaike Information Criterion (AIC), which was proposed by Akaike in 1973 [9]. AIC is given by
$$\mathrm{AIC} = -2 \times (\text{maximum log-likelihood of model}) + 2 \times (\text{number of free parameters of model}), \tag{9}$$
whose minimum offers a proper estimate of the number of free parameters.

[Figure 1: An example of convergence to the true values (frequency versus number of iterations)]

$$\sum_{n=1}^{N_k} \int_{-\infty}^{\infty} P_\theta(n,k|x)\, f(x)\, dx, \tag{11}$$
$$\bar{w}^k = \frac{1}{F N_k} \sum_{n=1}^{N_k} \int_{-\infty}^{\infty} P_\theta(n,k|x)\, dx, \tag{12}$$
where $F$ is the integral of $f(x)$ with respect to $x$.
3. Calculate AIC with equation (9). Since there are two free parameters for each tied-GMM, the model has $2 \times K$ free parameters altogether. If the AIC increases, the number of tied-GMMs just before they were reduced in step 4 is the estimate of the number of harmonic structures.
4. Remove the tied-GMM(s) that meet either of the two conditions below and repeat from step 2.
• The one whose $w^k$ is the minimum among all, since its contribution to the maximum log-likelihood must be the least.
• The one whose $w^k$ is smaller, if two adjacent representative means become closer than a certain distance (threshold), since the two representative means are presumed to converge to the same optimal solution.
An example of how this process actually works is shown in Fig. 1, where the observed spectrum used is depicted in Fig. 2. The broken line represents the point where the model parameters were judged to have converged, and the circled value indicates the value of AIC at each point. Since AIC takes its minimum when 3 tied-GMMs remain, the detected number here is 3.

3.3. Detection of F0s and Spectral Envelopes
In the previous process, the ML procedure yields locally optimal solutions for $\mu_k$ without distinguishing between the true F0s and multiples of the true F0s. Therefore, the true F0s must be discovered by replacing each $\mu_k$ in turn with its multiples. Consider now that a degree of freedom is given to every $w^k_n$, which consequently allows the spectral envelope, i.e., the relative amplitudes of the partials, to be extracted. If $\mu_k$ is lower than the true F0, the model must be over-fitted. From this point of view, the problem of obtaining the true F0s and the spectral envelope can also be handled with the information criterion. The process shown below is applied to all tied-GMMs remaining after the previous process.
1. Replace the representative means with $\mu_k + \log t$, where $t$ is an integer whose initial value is 1. The number of Gaussians limited below the Nyquist log-frequency is denoted as $N^t_k$.
2. Estimate the ML model parameters by the EM algorithm. Here we only update $w^k_n$, which should be updated to
$$\bar{w}^k_n = \frac{1}{F} \int_{-\infty}^{\infty} P_\theta(n,k|x)\, dx. \tag{13}$$
3. Calculate AIC with equation (9). The number of free parameters here is $N^t_k$. If the AIC increases, the process should be interrupted and $\mu_k + \log(n-1)$ is considered as the detected F0; if not, add 1 to $t$ and return to step 1.

4. Experiments
Experiments were carried out to validate our algorithm by evaluating the accuracy of F0 detection in comparison with the well-known cepstrum method. The speech files and reference F0 contours were taken from the ATR Speech Database. All signals were digitized at a 12 kHz sampling rate and analyzed with a Hamming window, where the frame length and shift were 64 ms and 10 ms, respectively. The initial number of tied-GMMs was set to 4, the frequency range was from 70 Hz to 140 Hz, and $\sigma$ was assigned to 0.45. Speech files beginning with 'myi-' and 'fym-' are speech signals of a male and a female speaker, respectively. Deviations of over 5% from the references were deemed gross errors. Every accuracy shown in Tables 1, 2 and 3 is the percentage of frames at which F0s are correctly detected.

[Figure 3: Detected F0 contour of a single speaker (log-frequency versus time)]
[Figure 4: Reference F0 contour corresponding to Figure 3 (log-frequency versus time)]

Table 1: Results for a single speaker
Speech file   Accuracy (%)
              Cepstrum   Proposed
'myisda01'    88.2       98.0
'myisda02'    88.4       99.0
'myisda03'    84.8       98.1
'myisda04'    85.1       92.4
'myisda05'    76.8       93.7
'fymsda01'    86.3       98.5
'fymsda02'    87.1       97.5
'fymsda03'    83.3       95.8
'fymsda04'    86.7       96.8
'fymsda05'    85.2       96.0

4.1. Results for Speech Signals of a Single Speaker
The algorithm was first tested on single-channel speech signals of a single speaker. A comparison of the accuracies of the cepstrum method and the proposed method for each speaker is shown in Table 1. Our algorithm significantly outperforms the cepstrum method. An example of a detected F0 contour is depicted in Figure 3, and the reference is shown in Figure 4.

4.2. Results for Simultaneous Speech Signals
The algorithm was next tested on co-channel simultaneous speech signals spoken by two speakers. Each speech signal file was artificially created by mixing two independent speech signals at a 0 dB signal-to-signal ratio. To evaluate our algorithm objectively, we also applied the cepstrum method to the simultaneous speech signals, although it is not generally designed as a multi-pitch detector. Results with the cepstrum method are shown in Table 2 and results with our algorithm are shown in Table 3. An example of detected F0 contours is depicted in Figure 5, and the reference is shown in Figure 6. The pairs of speech files from which the concurrent speech signals were created are shown in the first and second columns of Tables 2 and 3. Our algorithm again significantly outperformed the cepstrum method and showed high performance.

[Figure 5: Detected F0 contours of two concurrent speakers]
[Figure 6: Reference F0 contours corresponding to Figure 5]

Table 2: Results for two speakers (Cepstrum)
Speech files              Accuracy (%)
File 1       File 2       Speaker 1   Speaker 2
'myisda01'   'myisda03'   63.7        63.1
'myisda01'   'myisda04'   45.7        51.6
'myisda02'   'myisda03'   63.3        50.1
'myisda02'   'myisda04'   59.4        42.1
'fymsda01'   'fymsda02'   57.7        54.0
'fymsda01'   'fymsda04'   53.1        41.0
'fymsda02'   'fymsda03'   52.9        59.6
'fymsda02'   'fymsda04'   64.9        64.7
'myisda01'   'fymsda03'   45.7        43.0
'myisda02'   'fymsda05'   55.0        44.5
'myisda03'   'fymsda04'   41.4        59.9
'myisda04'   'fymsda02'   64.9        50.6
'myisda05'   'fymsda03'   59.4        62.8
'myisda04'   'fymsda01'   62.0        71.7

Table 3: Results for two speakers (Proposed)
Speech files              Accuracy (%)
File 1       File 2       Speaker 1   Speaker 2
'myisda01'   'myisda03'   90.1        83.0
'myisda01'   'myisda04'   92.8        81.3
'myisda02'   'myisda03'   88.2        85.7
'myisda02'   'myisda04'   84.4        87.6
'fymsda01'   'fymsda02'   90.7        84.3
'fymsda01'   'fymsda04'   85.3        82.6
'fymsda02'   'fymsda03'   79.2        90.3
'fymsda02'   'fymsda04'   86.2        92.6
'myisda01'   'fymsda03'   76.1        84.9
'myisda02'   'fymsda05'   74.8        92.8
'myisda03'   'fymsda04'   72.6        88.4
'myisda04'   'fymsda02'   86.3        85.5
'myisda05'   'fymsda03'   78.0        86.6
'myisda04'   'fymsda01'   79.0        86.6

Some of the gross errors were found in the first process, mainly because of unvoiced consonants. Since we focused only on harmonic structure, the gross errors caused by them were difficult to avoid. Meanwhile, when the two simultaneous speakers were male and female, the male speaker tended to give worse results. In the second process stated in Section 3, AIC prefers $\mu_k$ to be positioned at as high a frequency as possible, because the number of free parameters can then be reduced. Accordingly, if both the pitch and the amplitude of one utterance were markedly lower than those of the other, it tended to be ignored.

5. Conclusions
We proposed an algorithm that detects the number of speakers, accurate F0s and spectral envelopes from co-channel simultaneous speech signals with a spectral domain procedure. It showed high performance for speech signals of both a single speaker and two speakers. Still, several improvements are possible: considering the temporal continuity of the F0 contour (e.g., introducing the Fujisaki model), incorporating the variance into the model parameters as a variable, or introducing an a priori probability distribution of the model parameters.

6. References
[1] Chazan, D.; Stettiner, Y.; Malah, D., 1993. Optimal Multi-pitch Estimation Using the EM Algorithm for Co-channel Speech Separation. Proc. ICASSP93, Vol. 2, 728–731.
[2] Wu, M.; Wang, D.; Brown, G. J., 2002. A Multi-pitch Tracking Algorithm for Noisy Speech. ICASSP2002, Vol. 1, 369–372.
[3] Godsill, S.; Davy, M., 2002. Bayesian Harmonic Models for Musical Pitch Estimation and Analysis. Proc. ICASSP2002, Vol. 2, 1769–1772.
[4] Klapuri, A.; Virtanen, T.; Holm, J., 2000. Robust Multipitch Estimation for the Analysis and Manipulation of Polyphonic Musical Signals. In Proc. COST-G6 Conference on Digital Audio Effects, 233–236.
[5] Virtanen, T.; Klapuri, A., 2002. Separation of Harmonic Sounds Using Linear Models for the Overtone Series. Proc. ICASSP2002, Vol. 2, 1757–1760.
[6] Abe, M.; Ando, S., 2000. Auditory Scene Analysis Based on Time-Frequency Integration of Shared FM and AM (II): Optimum Time-Domain Integration and Stream Sound Reconstruction. Trans. IEICE, Vol. J83-D-II, No. 2, 468–477 (in Japanese).
[7] Karjalainen, M.; Tolonen, T., 2001. Multi-pitch and Periodicity Analysis Model for Sound Separation and Auditory Scene Analysis. IEEE Trans. on Speech and Audio Processing, Vol. 9, No. 2, 127–140.
[8] Dempster, A. P.; Laird, N. M.; Rubin, D. B., 1977. Maximum Likelihood from Incomplete Data via the EM Algorithm. J. of Royal Statistical Society Series B, Vol. 39, 1–38.
[9] Akaike, H., 1973. Information Theory and an Extension of the Maximum Likelihood Principle. 2nd Inter. Symp. on Information Theory, 267–281.
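The EM iteration of Section 2.2 and the AIC of Section 3.1 can be sketched on a discretized spectrum as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`gauss`, `model_density`, `em_harmonic`, `aic`) are hypothetical, each tied-GMM shares one weight spread evenly over its partials, integrals are replaced by sums over a log-frequency grid, and the updates are the standard $f(x)$-weighted EM updates, since the paper's own update for $\mu_k$ is not legible in this copy.

```python
import math

def gauss(x, mean, sigma):
    """Gaussian density g(x | mean, sigma^2), as in Eq. (7)."""
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def model_density(x, mus, ws, n_partials, sigma):
    """P_theta(x): mixture of tied-GMMs; partial n of source k sits at mus[k] + log n."""
    return sum(ws[k] / n_partials * gauss(x, mus[k] + math.log(n), sigma)
               for k in range(len(mus)) for n in range(1, n_partials + 1))

def em_harmonic(xs, fx, mus, n_partials=4, sigma=0.03, n_iter=30):
    """Fit log-F0s (mus) and tied weights (ws) to a sampled spectrum.

    xs: log-frequency grid, fx: non-negative spectral density samples,
    mus: initial log-F0 guesses, one per harmonic structure (assumed inputs).
    """
    K = len(mus)
    mus = list(mus)
    ws = [1.0 / K] * K                      # one tied weight per harmonic structure
    logn = [math.log(n) for n in range(1, n_partials + 1)]
    F = sum(fx)                             # discrete stand-in for the integral of f(x)
    for _ in range(n_iter):
        num = [0.0] * K                     # sum of r * (x - log n) per source
        den = [0.0] * K                     # sum of r per source
        for x, f in zip(xs, fx):
            # E-step: joint P(x, n, k), normalized into the posterior of Eq. (6)
            joint = [[ws[k] / n_partials * gauss(x, mus[k] + ln, sigma) for ln in logn]
                     for k in range(K)]
            total = sum(sum(row) for row in joint)
            if f <= 0.0 or total <= 0.0:
                continue
            for k in range(K):
                for j, ln in enumerate(logn):
                    r = f * joint[k][j] / total   # posterior weighted by the spectrum
                    num[k] += r * (x - ln)
                    den[k] += r
        # M-step: weighted mean for each log-F0, relative mass for each weight
        for k in range(K):
            if den[k] > 0.0:
                mus[k] = num[k] / den[k]
            ws[k] = den[k] / F
    return mus, ws

def aic(xs, fx, mus, ws, n_partials=4, sigma=0.03):
    """AIC of Eq. (9) with two free parameters (mu_k, w_k) per tied-GMM."""
    loglik = 0.0
    for x, f in zip(xs, fx):
        p = model_density(x, mus, ws, n_partials, sigma)
        if f > 0.0 and p > 0.0:
            loglik += f * math.log(p)
    return -2.0 * loglik + 2.0 * (2 * len(mus))
```

In the spirit of Section 3.2, one would start from several tied-GMMs, call `em_harmonic`, evaluate `aic`, remove the weakest tied-GMM and repeat until the AIC stops decreasing. The magnitude of the log-likelihood term here depends on how `fx` is scaled, so the balance against the parameter penalty is an assumption of this sketch.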
