

PCM Coding of Audio Signals with Non-Uniform Quantization


Introduction

In digital audio processing, converting an analog sound signal into a digital signal is a crucial step.

Among the available methods, PCM (Pulse Code Modulation) coding is a common approach that quantizes a continuous analog sound signal into a discrete digital signal.

However, traditional PCM coding uses uniform quantization: the sound signal is discretized against a fixed set of quantization levels.

The problem with this approach is that, for signals with a high dynamic range, the fixed quantization levels cannot preserve the signal's detail and dynamic range well.

Non-uniform quantization PCM coding was proposed to overcome this problem.

This article introduces the principle of PCM coding with non-uniform quantization, discusses its advantages and disadvantages, and explores its significance in practical applications.

Principle of Non-Uniform Quantization PCM Coding

Non-uniform quantization PCM coding is designed around the perceptual characteristics of the human ear.

For strong sounds, the ear's sensitivity is relatively low: at the same signal-to-noise ratio, larger quantization errors can be tolerated. For weak sounds, the ear's sensitivity is relatively high, so at the same signal-to-noise ratio the tolerance for quantization error is lower.

Based on these characteristics, non-uniform quantization PCM coding adjusts the quantization levels dynamically, so that relatively strong signals are covered by fewer (coarser) quantization levels while relatively weak signals receive more (finer) levels.

In this way, the quantization precision for weak signals is improved while a high signal-to-noise ratio is maintained, better preserving the signal's detail and dynamic range.
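The shortcoming of uniform quantization can be illustrated numerically. The sketch below (plain NumPy; the 8-bit quantizer, the 440 Hz test tone and the 40 dB attenuation are illustrative choices, not taken from the article) uniformly quantizes a near-full-scale tone and a much quieter copy, then compares the resulting signal-to-quantization-noise ratios:

```python
import numpy as np

def uniform_quantize(x, bits=8):
    """Round x (in [-1, 1]) to the nearest of 2**bits uniformly spaced levels."""
    step = 2.0 / (2 ** bits)
    return np.round(x / step) * step

def snr_db(x, x_quantized):
    """Signal-to-quantization-noise ratio in dB."""
    noise = x - x_quantized
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2))

t = np.arange(8000) / 8000.0
loud = 0.9 * np.sin(2 * np.pi * 440 * t)   # near full scale
quiet = 0.01 * loud                         # the same tone, 40 dB lower

loud_snr = snr_db(loud, uniform_quantize(loud))    # high: step is tiny relative to the signal
quiet_snr = snr_db(quiet, uniform_quantize(quiet)) # much lower: the fixed step swamps the detail
```

The quiet tone's amplitude is comparable to a single quantization step, so most of its waveform detail is lost, which is exactly the situation non-uniform quantization addresses.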

Non-Uniform Quantization PCM Coding Algorithms

The key to implementing non-uniform quantization PCM coding is choosing suitable quantization levels and a corresponding compression function.

The following briefly introduces one commonly used non-uniform quantization PCM coding algorithm: μ-law compression.

μ-law compression is a compression algorithm based on a non-linear transformation.

Its principle is to map the input sound signal through a suitable non-linear transformation function into a smaller dynamic range, and then quantize uniformly.

The non-linear transformation function is:

$$ f(x) = \text{sign}(x) \cdot \frac{\ln(1 + \mu \cdot |x|)}{\ln(1 + \mu)} $$

where $f(x)$ is the transformed signal value, $x$ is the input signal value, $\mu$ is the compression parameter, and $\text{sign}(x)$ is the sign function of $x$.
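A minimal NumPy sketch of this transform and its inverse (the 8-bit quantization grid and the test values are illustrative assumptions; μ = 255 is the value used in G.711 μ-law telephony):

```python
import numpy as np

MU = 255.0  # compression parameter; 255 is the value used in G.711 mu-law telephony

def mu_law_compress(x, mu=MU):
    """f(x) = sign(x) * ln(1 + mu*|x|) / ln(1 + mu), for x in [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mu_law_expand(y, mu=MU):
    """Inverse of the compression function."""
    return np.sign(y) * ((1.0 + mu) ** np.abs(y) - 1.0) / mu

# Quantizing uniformly *after* compression yields a finer effective step
# for small inputs: a low-level sample is stretched well into the grid
# before rounding, then mapped back.
x = np.array([-0.5, -0.02, 0.0, 0.001, 0.02, 0.5])       # illustrative test values
y_quantized = np.round(mu_law_compress(x) * 127) / 127   # 8-bit uniform grid
x_reconstructed = mu_law_expand(y_quantized)
```

Without the intermediate quantization, expand(compress(x)) recovers x exactly, so all distortion comes from the (now perceptually shaped) uniform quantizer.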

Low-Frequency Active Towed Sonar (LFATS) Datasheet


LOW-FREQUENCY ACTIVE TOWED SONAR (LFATS)

LFATS is a full-feature, long-range, low-frequency variable depth sonar. Developed for active sonar operation against modern diesel-electric submarines, LFATS has demonstrated consistent detection performance in shallow and deep water. LFATS also provides a passive mode and includes a full set of passive tools and features.

COMPACT SIZE
LFATS is a small, lightweight, air-transportable, ruggedized system designed specifically for easy installation on small vessels.

CONFIGURABLE
LFATS can operate in a stand-alone configuration or be easily integrated into the ship's combat system.

TACTICAL BISTATIC AND MULTISTATIC CAPABILITY
A robust infrastructure permits interoperability with the HELRAS helicopter dipping sonar and all key sonobuoys.

HIGHLY MANEUVERABLE
Own-ship noise reduction processing algorithms, coupled with compact twin-line receivers, enable short-scope towing for efficient maneuvering, fast deployment and unencumbered operation in shallow water.

COMPACT WINCH AND HANDLING SYSTEM
An ultrastable structure assures safe, reliable operation in heavy seas and permits manual or console-controlled deployment, retrieval and depth-keeping.

FULL 360° COVERAGE
A dual parallel array configuration and advanced signal processing achieve instantaneous, unambiguous left/right target discrimination.

SPACE-SAVING TRANSMITTER TOW-BODY CONFIGURATION
Innovative technology achieves omnidirectional, large-aperture acoustic performance in a compact, sleek tow-body assembly.

REVERBERATION SUPPRESSION
The unique transmitter design enables forward, aft, port and starboard directional transmission. This capability diverts energy concentration away from shorelines and landmasses, minimizing reverberation and optimizing target detection.

SONAR PERFORMANCE PREDICTION
A key ingredient of mission planning, LFATS computes and displays system detection capability based on modeled or measured environmental data.

Key Features
> Wide-area search
> Target detection, localization and classification
> Tracking and attack
> Embedded training

Sonar Processing
> Active processing: State-of-the-art signal processing offers a comprehensive range of single- and multi-pulse, FM and CW processing for detection and tracking.
> Passive processing: LFATS features full 100-to-2,000 Hz continuous wideband coverage. Broadband, DEMON and narrowband analyzers, torpedo alert and extended tracking functions constitute a suite of passive tools to track and analyze targets.
> Playback mode: Playback is seamlessly integrated into passive and active operation, enabling post-analysis of pre-recorded mission data, and is a key component of operator training.
> Built-in test: Power-up, continuous background and operator-initiated test modes combine to boost system availability and accelerate operational readiness.

A UNIQUE EXTENSION/RETRACTION MECHANISM TRANSFORMS THE COMPACT TOW-BODY CONFIGURATION INTO A LARGE-APERTURE MULTIDIRECTIONAL TRANSMITTER

DISPLAYS AND OPERATOR INTERFACES
> State-of-the-art workstation-based operator-machine interface: Trackball, point-and-click control, and pull-down menu function and parameter selection allow easy access to key information.
> Displays: A strategic balance of multifunction displays, built on a modern OpenGL framework, offers flexible search, classification and geographic formats. Ground-stabilized, high-resolution color monitors capture details in the real-time processed sonar data.
> Built-in operator aids: To simplify operation, LFATS provides recommended mode/parameter settings, automated range-of-day estimation and data history recall.
> COTS hardware: LFATS incorporates a modular, expandable open architecture to accommodate future technology.

[System diagram: Winch and Handling System, Ship Electronics, Towed Subsystem, Sonar Operator Console, Transmit Power Amplifier]

SPECIFICATIONS
Operating Modes: Active, passive, test, playback, multi-static
Source Level: 219 dB omnidirectional, 222 dB sector steered
Projector Elements: 16 in 4 staves
Transmission: Omnidirectional or by sector
Operating Depth: 15-to-300 m
Survival Speed: 30 knots
Size:
  Winch & Handling Subsystem: 180 in. x 138 in. x 84 in. (4.5 m x 3.5 m x 2.2 m)
  Sonar Operator Console: 60 in. x 26 in. x 68 in. (1.52 m x 0.66 m x 1.73 m)
  Transmit Power Amplifier: 42 in. x 28 in. x 68 in. (1.07 m x 0.71 m x 1.73 m)
Weight:
  Winch & Handling: 3,954 kg (8,717 lb.)
  Towed Subsystem: 678 kg (1,495 lb.)
  Ship Electronics: 928 kg (2,045 lb.)
Platforms: Frigates, corvettes, small patrol boats
Receive Array:
  Configuration: Twin-line
  Number of channels: 48 per line
  Length: 26.5 m (86.9 ft.)
  Array directivity: >18 dB @ 1,380 Hz

LFATS PROCESSING
Active
  Active Band: 1,200-to-1,00 Hz
  Processing: CW, FM, wavetrain, multi-pulse matched filtering
  Pulse Lengths: Range-dependent, 0.039 to 10 sec. max.
  FM Bandwidth: 50, 100 and 300 Hz
  Tracking: 20 auto and operator-initiated
  Displays: PPI, bearing range, Doppler range, FM A-scan, geographic overlay
  Range Scale: 5, 10, 20, 40, and 80 kyd
Passive
  Passive Band: Continuous 100-to-2,000 Hz
  Processing: Broadband, narrowband, ALI, DEMON and tracking
  Displays: BTR, BFI, NALI, DEMON and LOFAR
  Tracking: 20 auto and operator-initiated
Common
  Own-ship noise reduction, Doppler nullification, directional audio

© 2022 L3Harris Technologies, Inc. | 09/2022

Dialog Wireless Audio Solutions


Dialog wireless audio modules and ICs deliver low-latency HiFi audio for applications such as wireless headsets and semi-professional microphones, benefiting from an interference-free wireless solution that provides low-latency audio and longer battery life.

Powerful, Low-Power Wireless Audio

Dialog's volume-production solutions for the worldwide 1.9 GHz and 2.4 GHz RF bands break down traditional barriers, bringing the advantages of CMOS to RF applications.

Our products are reliable, powerful and easy to use, delivering outstanding performance and flexibility to help you build systems that genuinely outperform competing products.

Dialog's wireless audio portfolio includes wireless audio modules with embedded firmware, and ICs supported by the SmartBeat™ software environment.

The SmartBeat™ software platform provides a highly integrated solution for high-quality, fixed low-latency wireless audio applications, supporting sampling frequencies up to 48 kHz.

It supports point-to-point, point-to-multipoint and multipoint-to-point audio and data channels, targeting applications such as wireless headsets (Lync-compliant), headphones, speakers, subwoofers and microphones.

The Dialog wireless audio product portfolio.

P.563


INTERNATIONAL TELECOMMUNICATION UNION

ITU-T P.563 (05/2004)
TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU

SERIES P: TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS
Objective measuring apparatus

Single-ended method for objective speech quality assessment in narrow-band telephony applications

ITU-T Recommendation P.563

ITU-T P-SERIES RECOMMENDATIONS
TELEPHONE TRANSMISSION QUALITY, TELEPHONE INSTALLATIONS, LOCAL LINE NETWORKS

Vocabulary and effects of transmission parameters on customer opinion of transmission quality: Series P.10
Subscribers' lines and sets: Series P.30, P.300
Transmission standards: Series P.40
Objective measuring apparatus: Series P.50, P.500
Objective electro-acoustical measurements: Series P.60
Measurements related to speech loudness: Series P.70
Methods for objective and subjective assessment of quality: Series P.80, P.800
Audiovisual quality in multimedia services: Series P.900

For further details, please refer to the list of ITU-T Recommendations.

ITU-T Recommendation P.563
Single-ended method for objective speech quality assessment in narrow-band telephony applications

Summary

This Recommendation describes an objective single-ended method for predicting the subjective quality of 3.1 kHz (narrow-band) telephony applications. This Recommendation presents a high-level description of the method and advice on how to use it. An ANSI-C reference implementation, described in Annex A, is provided in separate files and forms an integral part of this Recommendation. A conformance testing procedure is also specified in Annex A to allow a user to validate that an alternative implementation of the model is correct.
This ANSI-C reference implementation shall take precedence in case of conflicts between the high-level description given in this Recommendation and the ANSI-C reference implementation. This Recommendation includes an electronic attachment containing an ANSI-C reference implementation and conformance testing data.

Source

ITU-T Recommendation P.563 was approved on 14 May 2004 by ITU-T Study Group 12 (2001-2004) under the ITU-T Recommendation A.8 procedure.

FOREWORD

The International Telecommunication Union (ITU) is the United Nations specialized agency in the field of telecommunications. The ITU Telecommunication Standardization Sector (ITU-T) is a permanent organ of ITU. ITU-T is responsible for studying technical, operating and tariff questions and issuing Recommendations on them with a view to standardizing telecommunications on a worldwide basis.

The World Telecommunication Standardization Assembly (WTSA), which meets every four years, establishes the topics for study by the ITU-T study groups which, in turn, produce Recommendations on these topics.

The approval of ITU-T Recommendations is covered by the procedure laid down in WTSA Resolution 1.

In some areas of information technology which fall within ITU-T's purview, the necessary standards are prepared on a collaborative basis with ISO and IEC.

NOTE

In this Recommendation, the expression "Administration" is used for conciseness to indicate both a telecommunication administration and a recognized operating agency.

Compliance with this Recommendation is voluntary. However, the Recommendation may contain certain mandatory provisions (to ensure e.g. interoperability or applicability) and compliance with the Recommendation is achieved when all of these mandatory provisions are met. The words "shall" or some other obligatory language such as "must" and the negative equivalents are used to express requirements.
The use of such words does not suggest that compliance with the Recommendation is required of any party.

INTELLECTUAL PROPERTY RIGHTS

ITU draws attention to the possibility that the practice or implementation of this Recommendation may involve the use of a claimed Intellectual Property Right. ITU takes no position concerning the evidence, validity or applicability of claimed Intellectual Property Rights, whether asserted by ITU members or others outside of the Recommendation development process.

As of the date of approval of this Recommendation, ITU had received notice of intellectual property, protected by patents, which may be required to implement this Recommendation. However, implementors are cautioned that this may not represent the latest information and are therefore strongly urged to consult the TSB patent database.

ITU 2005

All rights reserved. No part of this publication may be reproduced, by any means whatsoever, without the prior written permission of ITU.

CONTENTS

1 Introduction
2 Normative references
3 Abbreviations
4 Scope
5 Convention
6 Requirements on speech signals to be assessed
7 Overview of P.563
  7.1 Vocal tract analysis and unnaturalness of speech
  7.2 Analysis of strong additional noise
  7.3 Interruptions, mutes and time clipping
  7.4 Distortion classification
8 Comparison between objective and subjective scores
  8.1 Correlation coefficient
9 High level description of the functional blocks used in P.563
  9.1 Description of basic speech descriptors and the signal pre-processing
  9.2 Description of the functional block 'Vocal tract analysis and Unnatural Voice'
  9.3 Description of the functional block 'Additive Noise'
  9.4 Description of the 'Mutes/Interruptions' functional block components
  9.5 Description of the Speech Quality Model
Annex A – Source code for reference implementation and conformance tests
  A.1 List of files provided for the ANSI-C reference implementation
  A.2 List of files provided for conformance validation
  A.3 Speech files provided for validation with variable delay
  A.4 Conformance data sets
  A.5 Conformance requirements
  A.6 Conformance test on unknown data

Electronic attachment: ANSI-C reference implementation and conformance testing data.

1 Introduction

The P.563 algorithm is applicable for speech quality predictions without a separate reference signal. For this reason, this method is recommended for non-intrusive speech quality assessment, live network monitoring and assessment by using unknown speech sources at the far-end side of a telephone connection.

Real systems may include background noise, filtering and variable delay, as well as distortions due to channel errors and speech codecs. Up to now, methods for speech quality assessment of such systems, such as ITU-T Rec. P.862, require either a reference signal or they calculate only quality indexes based on a restricted set of parameters like level, noise in speech pauses and echoes.

The P.563 approach is the first recommended method for single-ended non-intrusive measurement applications that takes into account the full range of distortions occurring in public switched telephone networks and that is able to predict speech quality on a perception-based scale (MOS-LQO) according to ITU-T Rec. P.800.1. This Recommendation is not restricted to end-to-end measurements; it can be used at any arbitrary location in the transmission chain.
The calculated score is then comparable to the quality perceived by a human listener who is listening with a conventional shaped handset at this point.

The validation of P.563 included all available experiments from the former P.862 validation process, as well as a number of experiments that specifically tested its performance by using an acoustical interface in a real terminal at the sending side. Furthermore, the P.563 algorithm was tested independently with unknown speech material by third-party laboratories under strictly defined requirements.

It is recommended that P.563 be used for speech quality assessment in 3.1 kHz (narrow-band) telephony applications only.

2 Normative references

The following ITU-T Recommendations and other references contain provisions which, through reference in this text, constitute provisions of this Recommendation. At the time of publication, the editions indicated were valid. All Recommendations and other references are subject to revision; users of this Recommendation are therefore encouraged to investigate the possibility of applying the most recent edition of the Recommendations and other references listed below. A list of the currently valid ITU-T Recommendations is regularly published. The reference to a document within this Recommendation does not give it, as a stand-alone document, the status of a Recommendation.

– ITU-T Recommendation P.48 (1988), Specification for an intermediate reference system.
– ITU-T Recommendation P.800 (1996), Methods for subjective determination of transmission quality.
– ITU-T Recommendation P.810 (1996), Modulated noise reference unit (MNRU).
– ITU-T Recommendation P.830 (1996), Subjective performance assessment of telephone-band and wideband digital codecs.
– ITU-T Recommendation P.862 (2001), Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs.
– ITU-T P-series Recommendations – Supplement 23 (1998), ITU-T coded-speech database.

____________________
1 This Recommendation includes an electronic attachment containing an ANSI-C reference implementation and conformance testing data.

3 Abbreviations

This Recommendation uses the following abbreviations:

ACR      Absolute Category Rating
CELP     Code-Excited Linear Prediction
dBov     dB to overload point
DCME     Digital Circuit Multiplication Equipment
ERP      Ear Reference Point
HATS     Head and Torso Simulator
IRS      Intermediate Reference System
LPC      Linear Prediction Coefficient
MOS      Mean Opinion Score
MOS-LQO  Mean Opinion Score – Listening Quality Objective
MOS-LQS  Mean Opinion Score – Listening Quality Subjective
PCM      Pulse Code Modulation
SNR      Signal-to-Noise Ratio
SPL      Sound Pressure Level

4 Scope

Based on the benchmark results presented within Study Group 12 in 2003, an overview of the test factors, coding technologies and applications to which this Recommendation applies is given in Tables 1 to 3. Table 1 presents the relationships of test factors, coding technologies and applications for which this Recommendation has been found to show acceptable accuracy. Table 2 presents a list of conditions for which the Recommendation is known to provide inaccurate predictions or is otherwise not intended to be used. Finally, Table 3 lists factors, technologies and applications for which P.563 has not currently been validated.
Although correlations between objective and subjective scores in the benchmark were around 0.89 for both known and unknown data, the P.563 algorithm cannot be used to replace subjective testing, but it can be applied for measurements where auditory tests would be too expensive or not applicable at all.

It should also be noted that the P.563 algorithm does not provide a comprehensive evaluation of transmission quality. It only measures the effects of one-way speech distortion and noise on speech quality, in the same way as they can be investigated by an auditory test assessing listening quality on an ACR scale. The P.563 algorithm scores the speech signal as it is presented to a human listener using a conventional shaped handset, listening at an SPL of 79 dB at the ERP.

Because P.563 models human quality perception in combination with a common receiving terminal, the degradation produced by a receiving terminal and other equipment in a real monitored connection, connected behind the measurement point, cannot be taken into account. Because P.563 predicts listening quality scores, effects that degrade only talking quality or conversational quality cannot be taken into account. That means the effects of loudness loss, delay, sidetone, talker echo, and other impairments related only to talking quality or two-way interaction are not reflected in the P.563 scores. It is therefore possible to have high P.563 scores and yet non-optimal overall connection quality.

It should be highlighted that P.563 is designed for the prediction of speech quality in public switched narrow-band telephone networks. The types and amount of distortions, technologies and applications in the validation procedure cover the range of common occurrences in such networks.
Extreme situations, even if they fulfil the terms of Table 1, may be predicted inaccurately.

Table 1/P.563 – Factors for which P.563 has demonstrated acceptable accuracy and recommended application scenarios

Test factors:
– Characteristics of the acoustical environment (reflections, different reverberation times) as used in the validation phase. Mobile and conventional shaped handsets as well as handsfree terminals according to the P.340 test setup in office environments were used (see Note).
– Environmental noise at the sending side
– Characteristics of the acoustical interface of the sending terminal
– Remaining electrical and encoding characteristics of the sending terminal
– Speech input levels to a codec
– Transmission channel errors
– Packet loss and packet loss concealment with CELP codecs
– Bit rates, if a codec has more than one bit-rate mode
– Transcodings
– Effect of varying delay on listening quality in ACR tests
– Short-term time warping of speech signal
– Long-term time warping of speech signal
– Transmission systems including echo cancellers and noise reduction systems under single-talk conditions, as they will be scored on an ACR scale

Coding technologies:
– Waveform codecs, e.g., G.711, G.726, G.727
– CELP and hybrid codecs ≥ 4 kbit/s, e.g., G.728, G.729, G.723.1
– Other codecs: GSM-FR, GSM-HR, GSM-EFR, GSM-AMR, CDMA-EVRC, TDMA-ACELP, TDMA-VSELP, TETRA

Recommended application scenarios for P.563:
– Live network monitoring using digital or analogue connection to the network
– Live network end-to-end testing using digital or analogue connection to the network
– Live network end-to-end testing with unknown speech sources at the far-end side

NOTE – For more detailed information, please refer to the published test plans ("Joint Test Plan for Single-Ended Assessment Models", COM 12-D 121, January 2003).

Table 2/P.563 – P.563 is known to provide inaccurate predictions when used in conjunction with these variables, or is otherwise not intended to be used with these variables

Test factors:
– Listening levels, loudness loss (see Note)
– Sidetone
– Effect of delay in conversational tests
– Talker echo
– Music or network tones as input signal

Coding technologies:
– LPC vocoder technologies at bit rates < 4.0 kbit/s, e.g., IMBE, AMBE, LPC10e

Applications:
– Predicting talking quality
– Two-way communication performance

NOTE – P.563 assumes a standard listening level of 79 dB SPL and compensates for non-optimum signal levels in the input files. The subjective effect of deviation from the optimum listening level is therefore not taken into account.

Table 3/P.563 – Factors, technologies and applications for which P.563 has not, or not fully, been validated at the time of standardization

Test factors:
– Amplitude clipping of speech (was not included in the evaluation data)
– Talker dependencies and multiple simultaneous talkers
– Singing voice and child's voice as input to a codec
– Bit-rate mismatching between an encoder and a decoder, if a codec has more than one bit-rate mode
– Artificial speech signals as input to a codec
– Listener echo
– Effects/artifacts from isolated echo cancellers
– Effects/artifacts from isolated noise reduction algorithms
– Evaluation of synthetic speech and/or using it as input to a speech codec

Coding technologies:
– CELP and hybrid codecs < 4 kbit/s
– MPEG-4 HVXC

Applications:
– Measurements at the acoustic interface of the receiving terminal/handset, e.g., using HATS

5 Convention

Subjective evaluation of telephone networks and speech codecs may be conducted using listening-only or conversational methods of subjective testing. For practical reasons, listening-only tests are the only feasible method of subjective testing during the development of speech codecs, when a real-time implementation of the codec is not available. Listening-only tests are also often not practicable for live network monitoring. This Recommendation discusses an objective measurement technique for estimating the subjective quality obtained in listening-only tests, using listening equipment conforming to the IRS or modified IRS receive characteristics.

The P.563 approach predicts the results of ACR listening quality (LQS) subjective experiments by calculating a listening quality value (LQO) using the common MOS scale from 1 to 5. This Recommendation should, therefore, be considered to relate primarily to the ACR LQ opinion scale.

6 Requirements on speech signals to be assessed

The described algorithm is designed for evaluating human speech only. It cannot be used for the evaluation of music, noise or other non-speech audio signals. Its applicability when singing voice is transmitted over telephone connections has not yet been validated.

The speech signal to be assessed has to be recorded at an 'electrical' interface. That means recordings made by an artificial ear in the acoustical domain cannot be used.
Furthermore, outcomes of simulations of speech transmissions or other speech processing can be used if they are covered by the scope given in Table 1 and do not include a terminal simulation.

The digitized speech signal has to fulfil the following requirements:
• Sampling frequency: 8000 Hz. If higher frequencies are used for recording, a separate down-sampling using a high-quality flat low-pass filter has to be applied. Lower sampling frequencies are not allowed.
• Amplitude resolution: 16-bit linear PCM
• Minimum active speech in file: 3.0 s
• Maximum signal length: 20.0 s
• Minimum speech activity ratio: 25%
• Maximum speech activity ratio: 75%
• Range of active speech level: –36.0 to –16.0 dBov

A level adjustment to –26 dBov is part of P.563. The recommended level limitation should avoid additional artefacts caused by low SNR or amplitude clipping respectively.

7 Overview of P.563

In comparison to P.862 (a so-called 'double-ended' or 'intrusive' method), which compares a high-quality reference signal to the degraded signal on the basis of a perceptual model, P.563 predicts the speech quality of a degraded signal without a given reference speech signal. Figure 1 illustrates the differences between these approaches.

Figure 1/P.563 – Non-intrusive versus intrusive models

The P.563 approach can be visualized as an expert who is listening to a real call with a test device, like a conventional handset connected to the line in parallel. This visualization also explains the main application and allows the user to interpret the scores produced by P.563. The quality score predicted by P.563 is related to the quality perceived through a conventional handset at the measuring point. Consequently, the listening device has to be part of the P.563 approach. Therefore, each signal will first be pre-processed. This pre-processing begins with the model of the receiving handset. Following this, a voice activity detector (VAD) is used to identify portions of the signal that contain speech, and the speech level is calculated.
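The VAD-then-level step of the pre-processing can be sketched as follows. This is an illustrative stand-in, not the P.563 algorithm: the energy-threshold VAD, frame length and threshold are arbitrary assumptions, and the plain RMS over active frames is a simplification of the active speech level measurement used by the Recommendation.

```python
import numpy as np

def simple_vad(x, fs=8000, frame_ms=16, threshold_db=-50.0):
    """Energy-threshold voice activity detector (illustrative stand-in
    for the P.563 VAD; frame length and threshold are arbitrary here).

    x: float samples scaled to [-1, 1].  Returns one boolean per frame.
    """
    frame = int(fs * frame_ms / 1000)
    n = len(x) // frame
    frames = x[:n * frame].reshape(n, frame)
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return energy_db > threshold_db

def adjust_to_minus26_dbov(x, active):
    """Scale x so its active-speech RMS sits at -26 dBov.

    Simplified: a plain RMS over active frames, whereas P.563 measures
    the active speech level with a more involved procedure.
    """
    frame = len(x) // len(active)
    mask = np.repeat(active, frame)
    rms = np.sqrt(np.mean(x[:len(mask)][mask] ** 2))
    target = 10.0 ** (-26.0 / 20.0)  # -26 dBov as a linear RMS
    return x * (target / rms)
```

A signal that fails the clause 6 activity or level bounds would be rejected before this adjustment; here the two helpers only illustrate the order of operations described above.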
Finally, a speech level adjustment to –26 dBov is applied.

The pre-processed speech signal to be assessed is investigated by several separate analyses which, like a sensor layer, detect a set of characterizing signal parameters. This analysis is first applied to all signals. Based on a restricted set of key parameters, an assignment to a main distortion class is made.

The key parameters and the assigned distortion class are used for the adjustment of the speech quality model. This provides a perceptually based weighting for cases where several distortions occur in one signal but one distortion class is more prominent than the others. The basic block scheme of P.563 is shown in Figure 2.

Figure 2/P.563 – Block scheme of P.563

Basically, the P.563 algorithm's signal parameterization can be divided into three independent functional blocks that correspond to the main classes of distortion:

– Vocal tract analysis and unnaturalness of speech:
  1) male voices;
  2) female voices;
  3) strong 'robotization'.
– Analysis of strong additional noise:
  1) low static SNR (background noise floor);
  2) low segmental SNR (noise that is related to the signal envelope).
– Interruptions, mutes and time clipping.

In addition, a set of basic speech descriptors, such as active speech level, speech activity and level variations, is used, mainly for adjusting the pre-processing and the VAD. Some of the signal parameters calculated within the pre-processing stage are used in these three functional blocks.

7.1 Vocal tract analysis and unnaturalness of speech

The main block looks for unnaturalness in the speech signal. This functional block contains a speech production model for extracting signal parts that could be interpreted as voice and separates them from the non-speech parts. Furthermore, higher-order statistical analysis gives additional information about how human-like the speech is.

The unnaturalness of speech is rated separately for male and female voices.
Furthermore, in the case of strong robotization, another separate rating is made, which is gender-independent.

In this clause, the signal is also investigated for the occurrence of tones, like DTMF tones or similar highly periodic signals that are not speech.

Other very annoying disturbances are repeated speech frames. In packet-based transmission systems, a typical error is the loss of packets. Some speech codecs employ error concealment methods in order to increase the received speech quality. In fact, some error concealment methods use packet (frame) repetitions that simply replace a lost packet by, for example, a previously successfully transmitted packet, and these tend to decrease the quality of the signal rather than increase it.

A more general description of the received speech quality is given by comparing the input signal with a pseudo reference signal generated by a speech enhancer.

7.2 Analysis of strong additional noise

The noise analysis calculates different characteristics of noise. Based on two key parameters, a decision is made whether additional noise is the main degradation. If additional noise is detected as the main degradation class, a decision is made on the type of noise: either it is static and present over the whole signal (at least during speech activity), such that the noise power is not correlated with the speech signal, or the noise power shows dependencies on the signal power envelope.

If noise is found that is likely to be static, several detectors try to quantify the amount of noise 'locally' and 'globally'. The expression 'local' noise, as used here, describes the signal parts found especially between phonemes, whereas 'global' noise is defined as the signal between utterances such as sentences.
Distinguishing between these noise types is important because, for example, in mobile communications different settings are often applied to speech-active and non-active parts, e.g., the introduction of comfort noise.

7.3 Interruptions, mutes and time clipping

Mutes and interruptions also form a separate distortion class. Such distortions can only partly be described by the outcomes of the vocal tract investigation. Hence, a separate analysis is made to detect and rate time clippings and unnatural mutes in the signal.

Signal interruption can occur in two variants, i.e., as temporal speech clipping or speech interruption. Both lead to a loss of signal information.

Temporal clipping may occur when voice activity detection is used, when DCME is used, or when the signal becomes interrupted. This clipping is an annoying phenomenon that cuts off a bit of speech in the instant it takes for the transmitter to detect the presence of speech. It is possible to detect interruptions of the speech signal which occur during active speech intervals. The algorithms used in P.563 are able to distinguish between normal word ends and abnormal signal interruptions, as well as unnatural silence intervals in a speech utterance.

7.4 Distortion classification

Table 4 gives an overview of all calculated signal parameters. The key parameters that are used for classification of the main distortions are highlighted in grey. The table also reflects the structure of the rest of this Recommendation.
The main columns are in line with the sections and the corresponding main distortion classes, respectively.

[2] Robotization is caused by a voice signal that contains too much periodicity.

All the names used for the parameters can be found, and are explained, in later sections of this Recommendation as well as in the example source code.

Table 4/P.563 – Overview of all signal parameters used in P.563 (the key parameters used for classification of the main distortions are highlighted in grey in the original table):

- Unnatural speech: Robotization, ConsistentArtTracker, VTPMaxTubeSection, FinalVtpAverage, VTPPeakTracker, ArtAverage, VtpVadOverlap, PitchCrossCorrlOffset, PitchCrossPower, BasicVoiceQuality, BasicVoiceQualityAsym, BasicVoiceQualitySym, FrameRepeats, FrameRepeatsTotEnergy, UnnaturalBeeps, UnnaturalBeepsMean, UnnaturalBeepsAffectedSamples
- Noise analysis (static SNR): SNR, EstBGNoise, NoiseLevel, RelNoiseFloor, SpectralClarity, GlobalBGNoise, GlobalBGNoiseTotEnergy, GlobalBGNoiseRelEnergy, GlobalBGNoiseAffectedSamples, LocalBGNoiseLog, LocalBGNoiseMean, LocalBGNoiseStddev, LocalBGNoise, LocalBGNoiseAffectedSamples
- Noise analysis (segmental SNR): EstSegSNR, SpecLevelDev, SpecLevelRange, HiFreqVar
- Interruptions/mutes: SpeechInterruptions, SharpDeclines, MuteLength, UnnaturalSilence, UnnaturalSilenceMean, UnnaturalSilenceTotEnergy
- Basic speech descriptors: PitchAverage, SpeechSectionLevelVar, SpeechLevel, LocalLevelVar
- Vocal tract analysis and speech statistics: LPCcurt, LPCskew, LPCskewAbs, CepCurt, CepSkew, CepADev

8 Comparison between objective and subjective scores

Subjective votes are influenced by many factors, such as the preferences of individual subjects and the context (the other conditions) of the experiment. Thus, a regression process is necessary before a direct comparison can be made. The regression must be monotonic so that information is preserved, and it is normally used to map the objective P.563 score onto the subjective score.
A good objective quality measure should have a high correlation with many different subjective experiments if this regression is performed separately for each one. In practice, with P.563, the regression mapping is often almost linear, using a MOS-like scale.

A preferred regression method for calculating the correlation between the P.563 score and subjective MOS, which was used in the validation of P.563, uses a 3rd-order polynomial constrained to be monotonic. This calculation is performed on a per-study basis. In most cases, condition MOS is the chosen performance metric, so the regression should be performed between condition MOS and condition-averaged P.563 scores. A condition should use at least four different speech samples. The result of the regression is a set of objective MOS scores for that test. In order to be able to compare objective and subjective scores, the subjective MOS scores should be derived from a listening test that is carried out according to ITU-T Rec. P.830.

8.1 Correlation coefficient

The closeness of the fit between P.563 and the subjective scores may be measured by calculating the correlation coefficient. Normally, this is performed on condition-averaged scores, after mapping the objective to the subjective scores. The correlation coefficient is calculated with Pearson's formula:

    r = Σ_i (x_i − x̄)(y_i − ȳ) / √( Σ_i (x_i − x̄)² · Σ_i (y_i − ȳ)² )

In this formula, x_i is the subjective condition MOS for condition i, and x̄ is the average over the subjective condition MOS values x_i; y_i is the mapped condition-averaged P.563 score for condition i, and ȳ is the average over the condition-averaged P.563 MOS values y_i.

For 24 known ITU benchmark experiments, the average correlation was 0.88.
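A minimal version of this evaluation step can be written with NumPy: fit a 3rd-order polynomial mapping from objective to subjective scores, check its monotonicity over the data range, and compute Pearson's r between the mapped and subjective scores. Note one simplification: the official validation constrained the polynomial to be monotonic during fitting, whereas this sketch fits unconstrained and only verifies monotonicity afterwards.

```python
import numpy as np

def map_and_correlate(objective, subjective):
    """Fit a 3rd-order polynomial mapping objective -> subjective scores,
    check that it is monotonic over the data range, and return the
    Pearson correlation between mapped and subjective scores."""
    x = np.asarray(objective, dtype=float)
    y = np.asarray(subjective, dtype=float)
    coeffs = np.polyfit(x, y, 3)
    # derivative of the cubic, sampled densely over the data range
    deriv = np.polyval(np.polyder(coeffs), np.linspace(x.min(), x.max(), 200))
    monotonic = bool(np.all(deriv >= 0) or np.all(deriv <= 0))
    mapped = np.polyval(coeffs, x)
    r = np.corrcoef(mapped, y)[0, 1]
    return mapped, r, monotonic
```

On condition-averaged data the same routine applies unchanged; only the averaging of per-sample scores into condition scores happens beforehand.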
For an agreed set of six experiments used in the independent validation (experiments that were unknown during the development of P.563), the average correlation was 0.90.

9 High-level description of the functional blocks used in P.563

This clause explains the functional blocks used in P.563 and shown in Figure 2.

9.1 Description of basic speech descriptors and the signal pre-processing

9.1.1 Voice activity detection

The Voice Activity Detection (VAD) algorithm is based on an adaptive power threshold, using an iterative approach. Envelope frames above this threshold are classified as speech, and those below as noise.
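A toy version of such an iterative, adaptive-threshold VAD is sketched below: the threshold starts from the overall mean frame power and is then re-estimated a few times from the frames currently classified as noise. The update rule and constants are illustrative only, not those standardized in P.563.

```python
import numpy as np

def vad_adaptive(x, frame=256, iters=5, k=0.1):
    """Toy iterative VAD with an adaptive power threshold.
    Returns a boolean array, True for frames classified as speech."""
    n = len(x) // frame
    power = np.array([np.mean(x[i*frame:(i+1)*frame]**2) for i in range(n)])
    thr = power.mean()
    for _ in range(iters):
        noise = power[power < thr]
        if len(noise) == 0:
            break
        # raise the threshold a fixed margin above the current noise estimate
        thr = noise.mean() + k * (power.max() - noise.mean())
    return power >= thr
```

Starting from the global mean keeps loud speech out of the initial noise estimate; the subsequent iterations pull the threshold down toward the noise floor plus a margin.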

J-STD-035 Acoustic Scanning


JOINT INDUSTRY STANDARD

Acoustic Microscopy for Non-Hermetic Encapsulated Electronic Components

IPC/JEDEC J-STD-035, APRIL 1999
Supersedes IPC-SM-786
Supersedes IPC-TM-650, 2.6.22

Notice: EIA/JEDEC and IPC Standards and Publications are designed to serve the public interest through eliminating misunderstandings between manufacturers and purchasers, facilitating interchangeability and improvement of products, and assisting the purchaser in selecting and obtaining with minimum delay the proper product for his particular need. Existence of such Standards and Publications shall not in any respect preclude any member or nonmember of EIA/JEDEC or IPC from manufacturing or selling products not conforming to such Standards and Publications, nor shall the existence of such Standards and Publications preclude their voluntary use by those other than EIA/JEDEC and IPC members, whether the standard is to be used either domestically or internationally. Recommended Standards and Publications are adopted by EIA/JEDEC and IPC without regard to whether their adoption may involve patents on articles, materials, or processes. By such action, EIA/JEDEC and IPC do not assume any liability to any patent owner, nor do they assume any obligation whatever to parties adopting the Recommended Standard or Publication. Users are also wholly responsible for protecting themselves against all claims of liabilities for patent infringement.

The material in this joint standard was developed by the EIA/JEDEC JC-14.1 Committee on Reliability Test Methods for Packaged Devices and the IPC Plastic Chip Carrier Cracking Task Group (B-10a). The J-STD-035 supersedes IPC-TM-650, Test Method 2.6.22.

For Technical Information Contact:
Electronic Industries Alliance / JEDEC (Joint Electron Device Engineering Council), 2500 Wilson Boulevard, Arlington, VA 22201, Phone (703) 907-7560, Fax (703) 907-7501
IPC, 2215 Sanders Road, Northbrook, IL 60062-6135, Phone (847) 509-9700, Fax (847) 509-9798

Please use the Standard Improvement Form shown at the end of this document.

© Copyright 1999. The Electronic
Industries Alliance,Arlington,Virginia,and IPC,Northbrook,Illinois.All rights reserved under both international and Pan-American copyright conventions.Any copying,scanning or other reproduction of these materials without the prior written consent of the copyright holder is strictly prohibited and constitutes infringement under the Copyright Law of the United States.IPC/JEDEC J-STD-035Acoustic Microscopyfor Non-Hermetic EncapsulatedElectronicComponentsA joint standard developed by the EIA/JEDEC JC-14.1Committee on Reliability Test Methods for Packaged Devices and the B-10a Plastic Chip Carrier Cracking Task Group of IPCUsers of this standard are encouraged to participate in the development of future revisions.Contact:EIA/JEDEC Engineering Department 2500Wilson Boulevard Arlington,V A22201 Phone(703)907-7500 Fax(703)907-7501IPC2215Sanders Road Northbrook,IL60062-6135 Phone(847)509-9700Fax(847)509-9798ASSOCIATION CONNECTINGELECTRONICS INDUSTRIESAcknowledgmentMembers of the Joint IPC-EIA/JEDEC Moisture Classification Task Group have worked to develop this document.We would like to thank them for their dedication to this effort.Any Standard involving a complex technology draws material from a vast number of sources.While the principal members of the Joint Moisture Classification Working Group are shown below,it is not possible to include all of those who assisted in the evolution of this Standard.To each of them,the mem-bers of the EIA/JEDEC and IPC extend their gratitude.IPC Packaged Electronic Components Committee ChairmanMartin FreedmanAMP,Inc.IPC Plastic Chip Carrier Cracking Task Group,B-10a ChairmanSteven MartellSonoscan,Inc.EIA/JEDEC JC14.1CommitteeChairmanJack McCullenIntel Corp.EIA/JEDEC JC14ChairmanNick LycoudesMotorolaJoint Working Group MembersCharlie Baker,TIChristopher Brigham,Hi/FnRalph Carbone,Hewlett Packard Co. Don Denton,TIMatt Dotty,AmkorMichele J.DiFranza,The Mitre Corp. Leo Feinstein,Allegro Microsystems Inc.Barry Fernelius,Hewlett Packard Co. 
Chris Fortunko,National Institute of StandardsRobert J.Gregory,CAE Electronics, Inc.Curtis Grosskopf,IBM Corp.Bill Guthrie,IBM Corp.Phil Johnson,Philips Semiconductors Nick Lycoudes,MotorolaSteven R.Martell,Sonoscan Inc. Jack McCullen,Intel Corp.Tom Moore,TIDavid Nicol,Lucent Technologies Inc.Pramod Patel,Advanced Micro Devices Inc.Ramon R.Reglos,XilinxCorazon Reglos,AdaptecGerald Servais,Delphi Delco Electronics SystemsRichard Shook,Lucent Technologies Inc.E.Lon Smith,Lucent Technologies Inc.Randy Walberg,NationalSemiconductor Corp.Charlie Wu,AdaptecEdward Masami Aoki,HewlettPackard LaboratoriesFonda B.Wu,Raytheon Systems Co.Richard W.Boerdner,EJE ResearchVictor J.Brzozowski,NorthropGrumman ES&SDMacushla Chen,Wus Printed CircuitCo.Ltd.Jeffrey C.Colish,Northrop GrummanCorp.Samuel J.Croce,Litton AeroProducts DivisionDerek D-Andrade,Surface MountTechnology CentreRao B.Dayaneni,Hewlett PackardLaboratoriesRodney Dehne,OEM WorldwideJames F.Maguire,Boeing Defense&Space GroupKim Finch,Boeing Defense&SpaceGroupAlelie Funcell,Xilinx Inc.Constantino J.Gonzalez,ACMEMunir Haq,Advanced Micro DevicesInc.Larry A.Hargreaves,DC.ScientificInc.John T.Hoback,Amoco ChemicalCo.Terence Kern,Axiom Electronics Inc.Connie M.Korth,K-Byte/HibbingManufacturingGabriele Marcantonio,NORTELCharles Martin,Hewlett PackardLaboratoriesRichard W.Max,Alcatel NetworkSystems Inc.Patrick McCluskey,University ofMarylandJames H.Moffitt,Moffitt ConsultingServicesRobert Mulligan,Motorola Inc.James E.Mumby,CibaJohn Northrup,Lockheed MartinCorp.Dominique K.Numakura,LitchfieldPrecision ComponentsNitin B.Parekh,Unisys Corp.Bella Poborets,Lucent TechnologiesInc.D.Elaine Pope,Intel Corp.Ray Prasad,Ray Prasad ConsultancyGroupAlbert Puah,Adaptec Inc.William Sepp,Technic Inc.Ralph W.Taylor,Lockheed MartinCorp.Ed R.Tidwell,DSC CommunicationsCorp.Nick Virmani,Naval Research LabKen Warren,Corlund ElectronicsCorp.Yulia B.Zaks,Lucent TechnologiesInc.IPC/JEDEC J-STD-035April1999 iiTable of Contents1SCOPE (1)2DEFINITIONS 
2.1 A-mode; 2.2 B-mode; 2.3 Back-Side Substrate View Area; 2.4 C-mode; 2.5 Through Transmission Mode; 2.6 Die Attach View Area; 2.7 Die Surface View Area; 2.8 Focal Length (FL); 2.9 Focus Plane; 2.10 Leadframe (L/F) View Area; 2.11 Reflective Acoustic Microscope; 2.12 Through Transmission Acoustic Microscope; 2.13 Time-of-Flight (TOF); 2.14 Top-Side Die Attach Substrate View Area
3 APPARATUS: 3.1 Reflective Acoustic Microscope System; 3.2 Through Transmission Acoustic Microscope System
4 PROCEDURE: 4.1 Equipment Setup; 4.2 Perform Acoustic Scans
Appendix A Acoustic Microscopy Defect Check Sheet
Appendix B Potential Image Pitfalls
Appendix C Some Limitations of Acoustic Microscopy
Appendix D Reference Procedure for Presenting Applicable Scanned Data

Figures: Figure 1 Example of A-mode Display; Figure 2 Example of B-mode Display; Figure 3 Example of C-mode Display; Figure 4 Example of Through Transmission Display; Figure 5 Diagram of a Reflective Acoustic Microscope System; Figure 6 Diagram of a Through Transmission Acoustic Microscope System

Acoustic Microscopy for Non-Hermetic Encapsulated Electronic Components

1 SCOPE

This test method defines the procedures for performing acoustic microscopy on non-hermetic encapsulated electronic components. This method provides users with an acoustic microscopy process flow for detecting defects non-destructively in plastic packages while achieving reproducibility.

2 DEFINITIONS

2.1 A-mode: Acoustic data collected at the smallest X-Y-Z region defined by the limitations of the given acoustic microscope. An A-mode display contains amplitude and phase/polarity information as a function of time of flight at a single point in the X-Y plane. See Figure 1, Example of A-mode Display.

Figure 1: Example of A-mode Display

2.2 B-mode: Acoustic data collected
along an X-Z or Y-Z plane versus depth using a reflective acoustic microscope. A B-mode scan contains amplitude and phase/polarity information as a function of time of flight at each point along the scan line. A B-mode scan furnishes a two-dimensional (cross-sectional) description along a scan line (X or Y). See Figure 2, Example of B-mode Display.

Figure 2: Example of B-mode Display (bottom half of picture on left)

2.3 Back-Side Substrate View Area (refer to Appendix A, Type IV): The interface between the encapsulant and the back of the substrate within the outer edges of the substrate surface.

2.4 C-mode: Acoustic data collected in an X-Y plane at depth (Z) using a reflective acoustic microscope. A C-mode scan contains amplitude and phase/polarity information at each point in the scan plane. A C-mode scan furnishes a two-dimensional (area) image of echoes arising from reflections at a particular depth (Z). See Figure 3, Example of C-mode Display.

Figure 3: Example of C-mode Display

2.5 Through Transmission Mode: Acoustic data collected in an X-Y plane throughout the depth (Z) using a through transmission acoustic microscope. A Through Transmission mode scan contains only amplitude information at each point in the scan plane. A Through Transmission scan furnishes a two-dimensional (area) image of transmitted ultrasound through the complete thickness/depth (Z) of the sample/component. See Figure 4, Example of Through Transmission Display.

Figure 4: Example of Through Transmission Display

2.6 Die Attach View Area (refer to Appendix A, Type II): The interface between the die and the die attach adhesive and/or the die attach adhesive and the die attach substrate.

2.7 Die Surface View Area (refer to Appendix A, Type I): The interface between the encapsulant and the active side of the die.

2.8 Focal Length (FL): The distance in water at which a transducer's spot size is at a minimum.

2.9 Focus Plane: The X-Y plane at a depth (Z) at which the amplitude of the acoustic signal is
maximized.

2.10 Leadframe (L/F) View Area (refer to Appendix A, Type V): The imaged area which extends from the outer L/F edges of the package to the L/F "tips" (the wedge bond/stitch bond region of the innermost portion of the L/F).

2.11 Reflective Acoustic Microscope: An acoustic microscope that uses one transducer as both the pulser and receiver. (This is also known as a pulse/echo system.) See Figure 5, Diagram of a Reflective Acoustic Microscope System.

2.12 Through Transmission Acoustic Microscope: An acoustic microscope that transmits ultrasound completely through the sample from a sending transducer to a receiver on the opposite side. See Figure 6, Diagram of a Through Transmission Acoustic Microscope System.

3.1.6 A broadband acoustic transducer with a center frequency in the range of 10 to 200 MHz for subsurface imaging.

3.2 Through Transmission Acoustic Microscope System (see Figure 6), comprised of:
3.2.1 Items 3.1.1 to 3.1.6 above
3.2.2 Ultrasonic pulser (can be a pulser/receiver as in 3.1.1)
3.2.3 Separate receiving transducer or ultrasonic detection system

3.3 Reference packages or standards, including packages with delamination and packages without delamination, for use during equipment setup.

3.4 Sample holder for pre-positioning samples. The holder should keep the samples from moving during the scan and maintain planarity.

4 PROCEDURE

This procedure is generic to all acoustic microscopes. For operational details related to this procedure that apply to a specific model of acoustic microscope, consult the manufacturer's operational manual.

4.1 Equipment Setup

4.1.1 Select the transducer with the highest useable ultrasonic frequency, subject to the limitations imposed by the media thickness and acoustic characteristics, package configuration, and transducer availability, to analyze the interfaces of interest. The transducer selected should have a low enough frequency to provide a clear signal from the interface of interest. The transducer should have a high
enough frequency to delineate the interface of interest.

Note: Through transmission mode may require a lower frequency and/or longer focal length than reflective mode. Through transmission is effective for the initial inspection of components to determine if defects are present.

4.1.2 Verify setup with the reference packages or standards (see 3.3 above) and settings that are appropriate for the transducer chosen in 4.1.1 to ensure that the critical parameters at the interface of interest correlate to the reference standard utilized.

4.1.3 Place units in the sample holder in the coupling medium such that the upper surface of each unit is parallel with the scanning plane of the acoustic transducer. Sweep air bubbles away from the unit surface and from the bottom of the transducer head.

4.1.4 At a fixed distance (Z), align the transducer and/or stage for the maximum reflected amplitude from the top surface of the sample. The transducer must be perpendicular to the sample surface.

4.1.5 Focus by maximizing the amplitude, in the A-mode display, of the reflection from the interface designated for imaging. This is done by adjusting the Z-axis distance between the transducer and the sample.

4.2 Perform Acoustic Scans

4.2.1 Inspect the acoustic image(s) for any anomalies, verify that the anomaly is a package defect or an artifact of the imaging process, and record the results. (See Appendix A for an example of a check sheet that may be used.) To determine if an anomaly is a package defect or an artifact of the imaging process, it is recommended to analyze the A-mode display at the location of the anomaly.

4.2.2 Consider potential pitfalls in image interpretation listed in, but not limited to, Appendix B, and some of the limitations of acoustic microscopy listed in, but not limited to, Appendix C. If necessary, make adjustments to the equipment setup to optimize the results and rescan.

4.2.3 Evaluate the acoustic images using the failure criteria specified in other appropriate
documents,such as J-STD-020.4.2.4Record the images and thefinal instrument setup parameters for documentation purposes.An example checklist is shown in Appendix D.5IPC/JEDEC J-STD-035April19996April1999IPC/JEDEC J-STD-035Appendix AAcoustic Microscopy Defect Check Sheet(continued)CIRCUIT SIDE SCANImage File Name/PathDelamination(Type I)Die Circuit Surface/Encapsulant Number Affected:Average%Location:Corner Edge Center (Type II)Die/Die Attach Number Affected:Average%Location:Corner Edge Center (Type III)Encapsulant/Substrate Number Affected:Average%Location:Corner Edge Center (Type V)Interconnect tip Number Affected:Average%Interconnect Number Affected:Max.%Length(Type VI)Intra-Laminate Number Affected:Average%Location:Corner Edge Center Comments:CracksAre cracks present:Yes NoIf yes:Do any cracks intersect:bond wire ball bond wedge bond tab bump tab leadDoes crack extend from leadfinger to any other internal feature:Yes NoDoes crack extend more than two-thirds the distance from any internal feature to the external surfaceof the package:Yes NoAdditional verification required:Yes NoComments:Mold Compound VoidsAre voids present:Yes NoIf yes:Approx.size Location(if multiple voids,use comment section)Do any voids intersect:bond wire ball bond wedge bond tab bump tab lead Additional verification required:Yes NoComments:7IPC/JEDEC J-STD-035April1999Appendix AAcoustic Microscopy Defect Check Sheet(continued)NON-CIRCUIT SIDE SCANImage File Name/PathDelamination(Type IV)Encapsulant/Substrate Number Affected:Average%Location:Corner Edge Center (Type II)Substrate/Die Attach Number Affected:Average%Location:Corner Edge Center (Type V)Interconnect Number Affected:Max.%LengthLocation:Corner Edge Center (Type VI)Intra-Laminate Number Affected:Average%Location:Corner Edge Center (Type VII)Heat Spreader Number Affected:Average%Location:Corner Edge Center Additional verification required:Yes NoComments:CracksAre cracks present:Yes NoIf yes:Does crack extend more than two-thirds the 
distance from any internal feature to the external surfaceof the package:Yes NoAdditional verification required:Yes NoComments:Mold Compound VoidsAre voids present:Yes NoIf yes:Approx.size Location(if multiple voids,use comment section)Additional verification required:Yes NoComments:8Appendix BPotential Image PitfallsOBSERV ATIONS CAUSES/COMMENTSUnexplained loss of front surface signal Gain setting too lowSymbolization on package surfaceEjector pin knockoutsPin1and other mold marksDust,air bubbles,fingerprints,residueScratches,scribe marks,pencil marksCambered package edgeUnexplained loss of subsurface signal Gain setting too lowTransducer frequency too highAcoustically absorbent(rubbery)fillerLarge mold compound voidsPorosity/high concentration of small voidsAngled cracks in package‘‘Dark line boundary’’(phase cancellation)Burned molding compound(ESD/EOS damage)False or spotty indication of delamination Low acoustic impedance coating(polyimide,gel)Focus errorIncorrect delamination gate setupMultilayer interference effectsFalse indication of adhesion Gain set too high(saturation)Incorrect delamination gate setupFocus errorOverlap of front surface and subsurface echoes(transducerfrequency too low)Fluidfilling delamination areasApparent voiding around die edge Reflection from wire loopsIncorrect setting of void gateGraded intensity Die tilt or lead frame deformation Sample tiltApril1999IPC/JEDEC J-STD-0359Appendix CSome Limitations of Acoustic MicroscopyAcoustic microscopy is an analytical technique that provides a non-destructive method for examining plastic encapsulated components for the existence of delaminations,cracks,and voids.This technique has limitations that include the following: LIMITATION REASONAcoustic microscopy has difficulty infinding small defects if the package is too thick.The ultrasonic signal becomes more attenuated as a function of two factors:the depth into the package and the transducer fre-quency.The greater the depth,the greater the 
attenuation.Simi-larly,the higher the transducer frequency,the greater the attenu-ation as a function of depth.There are limitations on the Z-axis(axial)resolu-tion.This is a function of the transducer frequency.The higher the transducer frequency,the better the resolution.However,the higher frequency signal becomes attenuated more quickly as a function of depth.There are limitations on the X-Y(lateral)resolu-tion.The X-Y(lateral)resolution is a function of a number of differ-ent variables including:•Transducer characteristics,including frequency,element diam-eter,and focal length•Absorption and scattering of acoustic waves as a function of the sample material•Electromechanical properties of the X-Y stageIrregularly shaped packages are difficult to analyze.The technique requires some kind offlat reference surface.Typically,the upper surface of the package or the die surfacecan be used as references.In some packages,cambered packageedges can cause difficulty in analyzing defects near the edgesand below their surfaces.Edge Effect The edges cause difficulty in analyzing defects near the edge ofany internal features.IPC/JEDEC J-STD-035April1999 10April1999IPC/JEDEC J-STD-035Appendix DReference Procedure for Presenting Applicable Scanned DataMost of the settings described may be captured as a default for the particular supplier/product with specific changes recorded on a sample or lot basis.Setup Configuration(Digital Setup File Name and Contents)Calibration Procedure and Calibration/Reference Standards usedTransducerManufacturerModelCenter frequencySerial numberElement diameterFocal length in waterScan SetupScan area(X-Y dimensions)Scan step sizeHorizontalVerticalDisplayed resolutionHorizontalVerticalScan speedPulser/Receiver SettingsGainBandwidthPulseEnergyRepetition rateReceiver attenuationDampingFilterEcho amplitudePulse Analyzer SettingsFront surface gate delay relative to trigger pulseSubsurface gate(if used)High passfilterDetection threshold for positive 
oscillation, negative oscillation
A/D settings: sampling rate, offset setting

Per-Sample Settings
Sample orientation (top or bottom (flipped) view and location of pin 1 or some other distinguishing characteristic)
Focus (point, depth, interface)
Reference plane
Non-default parameters
Sample identification information to uniquely distinguish it from others in the same group

Appendix D: Reference Procedure for Presenting Applicable Scanned Data (continued)

Reference Procedure for Presenting Scanned Data
Image file types and names
Gray scale and color image legend definitions
Significance of colors
Indications or definition of delamination
Image dimensions
Depth scale of TOF
Deviation from true aspect ratio
Image type: A-mode, B-mode, C-mode, TOF, Through Transmission

A-mode waveforms should be provided for points of interest, such as delaminated areas. In addition, an A-mode image should be provided for a bonded area as a control.

Standard Improvement Form, IPC/JEDEC J-STD-035: comments regarding usage of this standard may be submitted to IPC, 2215 Sanders Road, Northbrook, IL 60062-6135.
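The focusing step in 4.1.5 (maximize the A-mode reflection amplitude by adjusting the Z-axis distance) is, in software terms, a one-dimensional peak search. The sketch below assumes a hypothetical `scan_amplitude(z)` readout function standing in for the instrument interface; it is an illustration of the search, not part of the standard.

```python
def autofocus(scan_amplitude, z_min, z_max, steps=100):
    """Step the transducer through the Z range and keep the height where
    the reflected A-mode amplitude peaks (cf. 4.1.5). `scan_amplitude`
    is a hypothetical instrument readout: z -> echo amplitude."""
    best_z, best_a = z_min, float("-inf")
    for i in range(steps + 1):
        z = z_min + (z_max - z_min) * i / steps
        a = scan_amplitude(z)
        if a > best_a:
            best_z, best_a = z, a
    return best_z
```

A coarse grid pass like this is typically followed by a finer pass around the best point; the grid resolution here is arbitrary.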

From Data Mining to Knowledge Discovery in Databases


Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media attention of late. What is all the excitement about? This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges involved in real-world applications of knowledge discovery, and current and future research directions in the field.

Across a wide variety of fields, data are being collected and accumulated at a dramatic pace. There is an urgent need for a new generation of computational theories and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. These theories and tools are the subject of the emerging field of knowledge discovery in databases (KDD).

At an abstract level, the KDD field is concerned with the development of methods and techniques for making sense of data. The basic problem addressed by the KDD process is one of mapping low-level data (which are typically too voluminous to understand and digest easily) into other forms that might be more compact (for example, a short report), more abstract (for example, a descriptive approximation or model of the process that generated the data), or more useful (for example, a predictive model for estimating the value of future cases). At the core of the process is the application of specific data-mining methods for pattern discovery and extraction.[1]

This article begins by discussing the historical context of KDD and data mining and their intersection with other related fields. A brief summary of recent KDD real-world applications is provided. Definitions of KDD and data mining are provided, and the general multistep KDD process is outlined.
This multistep process has the application of data-mining algorithms as one particular step in the process. The data-mining step is discussed in more detail in the context of specific data-mining algorithms and their application. Real-world practical application issues are also outlined. Finally, the article enumerates challenges for future research and development and in particular discusses potential opportunities for AI technology in KDD systems.

Why Do We Need KDD?

The traditional method of turning data into knowledge relies on manual analysis and interpretation. For example, in the health-care industry, it is common for specialists to periodically analyze current trends and changes in health-care data, say, on a quarterly basis. The specialists then provide a report detailing the analysis to the sponsoring health-care organization; this report becomes the basis for future decision making and planning for health-care management. In a totally different type of application, planetary geologists sift through remotely sensed images of planets and asteroids, carefully locating and cataloging such geologic objects of interest as impact craters. Be it science, marketing, finance, health care, retail, or any other field, the classical approach to data analysis relies fundamentally on one or more analysts becoming

Articles, FALL 1996, 37

From Data Mining to Knowledge Discovery in Databases
Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth
Copyright © 1996, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1996 / $2.00

In science, one of the primary application areas is astronomy. Here, a notable success was achieved by SKICAT, a system used by astronomers to perform image analysis, classification, and cataloging of sky objects from sky-survey images (Fayyad, Djorgovski, and Weir 1996).
In its first application, the system was used to process the 3 terabytes (10^12 bytes) of image data resulting from the Second Palomar Observatory Sky Survey, where it is estimated that on the order of 10^9 sky objects are detectable. SKICAT can outperform humans and traditional computational techniques in classifying faint sky objects. See Fayyad, Haussler, and Stolorz (1996) for a survey of scientific applications.

In business, the main KDD application areas include marketing, finance (especially investment), fraud detection, manufacturing, telecommunications, and Internet agents.

Marketing: In marketing, the primary application is database marketing systems, which analyze customer databases to identify different customer groups and forecast their behavior. Business Week (Berry 1994) estimated that over half of all retailers are using or planning to use database marketing, and those who do use it have good results; for example, American Express reports a 10- to 15-percent increase in credit-card use. Another notable marketing application is market-basket analysis (Agrawal et al. 1996) systems, which find patterns such as, "If customer bought X, he/she is also likely to buy Y and Z." Such patterns are valuable to retailers.

Investment: Numerous companies use data mining for investment, but most do not describe their systems. One exception is LBS Capital Management. Its system uses expert systems, neural nets, and genetic algorithms to manage portfolios totaling $600 million; since its start in 1993, the system has outperformed the broad stock market (Hall, Mani, and Barr 1996).

Fraud detection: HNC Falcon and Nestor PRISM systems are used for monitoring credit-card fraud, watching over millions of accounts. The FAIS system (Senator et al. 1995), from the U.S.
Treasury Financial Crimes Enforcement Network, is used to identify financial transactions that might indicate money-laundering activity.

Manufacturing: The CASSIOPEE troubleshooting system, developed as part of a joint venture between General Electric and SNECMA, was applied by three major European airlines to diagnose and predict problems for the Boeing 737. To derive families of faults, clustering methods are used. CASSIOPEE received the European first prize for innovative applications (Manago and Auriol 1996).

intimately familiar with the data and serving as an interface between the data and the users and products. For these (and many other) applications, this form of manual probing of a data set is slow, expensive, and highly subjective. In fact, as data volumes grow dramatically, this type of manual data analysis is becoming completely impractical in many domains. Databases are increasing in size in two ways: (1) the number N of records or objects in the database and (2) the number d of fields or attributes to an object. Databases containing on the order of N = 10^9 objects are becoming increasingly common, for example, in the astronomical sciences. Similarly, the number of fields d can easily be on the order of 10^2 or even 10^3, for example, in medical diagnostic applications. Who could be expected to digest millions of records, each having tens or hundreds of fields? We believe that this job is certainly not one for humans; hence, analysis work needs to be automated, at least partially. The need to scale up human analysis capabilities to handling the large number of bytes that we can collect is both economic and scientific. Businesses use data to gain competitive advantage, increase efficiency, and provide more valuable services to customers. Data we capture about our environment are the basic evidence we use to build theories and models of the universe we live in. Because computers have enabled humans to gather more data than we can digest, it is only natural to turn to computational techniques to help us unearth meaningful patterns and structures from the massive volumes of data. Hence, KDD is an attempt to address a problem that the digital information era made a fact of life for all of us: data overload.

Data Mining and Knowledge Discovery in the Real World

A large degree of the current interest in KDD is the result of the media interest surrounding successful KDD applications, for example, the focus articles within the last two years in Business Week, Newsweek, Byte, PC Week, and other large-circulation periodicals. Unfortunately, it is not always easy to separate fact from media hype. Nonetheless, several well-documented examples of successful systems can rightly be referred to as KDD applications and have been deployed in operational use on large-scale real-world problems in science and in business.

Telecommunications: The telecommunications alarm-sequence analyzer (TASA) was built in cooperation with a manufacturer of telecommunications equipment and three telephone networks (Mannila, Toivonen, and Verkamo 1995). The system uses a novel framework for locating frequently occurring alarm episodes from the alarm stream and presenting them as rules. Large sets of discovered rules can be explored with flexible information-retrieval tools supporting interactivity and iteration.
In this way, TASA offers pruning,grouping, and ordering tools to refine the re-sults of a basic brute-force search for rules.Data cleaning: The MERGE-PURGE systemwas applied to the identification of duplicatewelfare claims (Hernandez and Stolfo 1995).It was used successfully on data from the Wel-fare Department of the State of Washington.In other areas, a well-publicized system isIBM’s ADVANCED SCOUT,a specialized data-min-ing system that helps National Basketball As-sociation (NBA) coaches organize and inter-pret data from NBA games (U.S. News 1995). ADVANCED SCOUT was used by several of the NBA teams in 1996, including the Seattle Su-personics, which reached the NBA finals.Finally, a novel and increasingly importanttype of discovery is one based on the use of in-telligent agents to navigate through an infor-mation-rich environment. Although the ideaof active triggers has long been analyzed in thedatabase field, really successful applications ofthis idea appeared only with the advent of theInternet. These systems ask the user to specifya profile of interest and search for related in-formation among a wide variety of public-do-main and proprietary sources. For example, FIREFLY is a personal music-recommendation agent: It asks a user his/her opinion of several music pieces and then suggests other music that the user might like (<http:// www.ffl/>). CRAYON(/>) allows users to create their own free newspaper (supported by ads); NEWSHOUND(<http://www. /hound/>) from the San Jose Mercury News and FARCAST(</> automatically search information from a wide variety of sources, including newspapers and wire services, and e-mail rele-vant documents directly to the user.These are just a few of the numerous suchsystems that use KDD techniques to automat-ically produce useful information from largemasses of raw data. See Piatetsky-Shapiro etal. 
(1996) for an overview of issues in devel-oping industrial KDD applications.Data Mining and KDDHistorically, the notion of finding useful pat-terns in data has been given a variety ofnames, including data mining, knowledge ex-traction, information discovery, informationharvesting, data archaeology, and data patternprocessing. The term data mining has mostlybeen used by statisticians, data analysts, andthe management information systems (MIS)communities. It has also gained popularity inthe database field. The phrase knowledge dis-covery in databases was coined at the first KDDworkshop in 1989 (Piatetsky-Shapiro 1991) toemphasize that knowledge is the end productof a data-driven discovery. It has been popular-ized in the AI and machine-learning fields.In our view, KDD refers to the overall pro-cess of discovering useful knowledge from da-ta, and data mining refers to a particular stepin this process. Data mining is the applicationof specific algorithms for extracting patternsfrom data. The distinction between the KDDprocess and the data-mining step (within theprocess) is a central point of this article. Theadditional steps in the KDD process, such asdata preparation, data selection, data cleaning,incorporation of appropriate prior knowledge,and proper interpretation of the results ofmining, are essential to ensure that usefulknowledge is derived from the data. Blind ap-plication of data-mining methods (rightly crit-icized as data dredging in the statistical litera-ture) can be a dangerous activity, easilyleading to the discovery of meaningless andinvalid patterns.The Interdisciplinary Nature of KDDKDD has evolved, and continues to evolve,from the intersection of research fields such asmachine learning, pattern recognition,databases, statistics, AI, knowledge acquisitionfor expert systems, data visualization, andhigh-performance computing. 
The unifyinggoal is extracting high-level knowledge fromlow-level data in the context of large data sets.The data-mining component of KDD cur-rently relies heavily on known techniquesfrom machine learning, pattern recognition,and statistics to find patterns from data in thedata-mining step of the KDD process. A natu-ral question is, How is KDD different from pat-tern recognition or machine learning (and re-lated fields)? The answer is that these fieldsprovide some of the data-mining methodsthat are used in the data-mining step of theKDD process. KDD focuses on the overall pro-cess of knowledge discovery from data, includ-ing how the data are stored and accessed, howalgorithms can be scaled to massive data setsThe basicproblemaddressed bythe KDDprocess isone ofmappinglow-leveldata intoother formsthat might bemorecompact,moreabstract,or moreuseful.ArticlesFALL 1996 39A driving force behind KDD is the database field (the second D in KDD). Indeed, the problem of effective data manipulation when data cannot fit in the main memory is of fun-damental importance to KDD. Database tech-niques for gaining efficient data access,grouping and ordering operations when ac-cessing data, and optimizing queries consti-tute the basics for scaling algorithms to larger data sets. Most data-mining algorithms from statistics, pattern recognition, and machine learning assume data are in the main memo-ry and pay no attention to how the algorithm breaks down if only limited views of the data are possible.A related field evolving from databases is data warehousing,which refers to the popular business trend of collecting and cleaning transactional data to make them available for online analysis and decision support. 
Data warehousing helps set the stage for KDD in two important ways: (1) data cleaning and (2)data access.Data cleaning: As organizations are forced to think about a unified logical view of the wide variety of data and databases they pos-sess, they have to address the issues of map-ping data to a single naming convention,uniformly representing and handling missing data, and handling noise and errors when possible.Data access: Uniform and well-defined methods must be created for accessing the da-ta and providing access paths to data that were historically difficult to get to (for exam-ple, stored offline).Once organizations and individuals have solved the problem of how to store and ac-cess their data, the natural next step is the question, What else do we do with all the da-ta? This is where opportunities for KDD natu-rally arise.A popular approach for analysis of data warehouses is called online analytical processing (OLAP), named for a set of principles pro-posed by Codd (1993). OLAP tools focus on providing multidimensional data analysis,which is superior to SQL in computing sum-maries and breakdowns along many dimen-sions. OLAP tools are targeted toward simpli-fying and supporting interactive data analysis,but the goal of KDD tools is to automate as much of the process as possible. Thus, KDD is a step beyond what is currently supported by most standard database systems.Basic DefinitionsKDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimate-and still run efficiently, how results can be in-terpreted and visualized, and how the overall man-machine interaction can usefully be modeled and supported. The KDD process can be viewed as a multidisciplinary activity that encompasses techniques beyond the scope of any one particular discipline such as machine learning. In this context, there are clear opportunities for other fields of AI (be-sides machine learning) to contribute to KDD. 
KDD places a special emphasis on find-ing understandable patterns that can be inter-preted as useful or interesting knowledge.Thus, for example, neural networks, although a powerful modeling tool, are relatively difficult to understand compared to decision trees. KDD also emphasizes scaling and ro-bustness properties of modeling algorithms for large noisy data sets.Related AI research fields include machine discovery, which targets the discovery of em-pirical laws from observation and experimen-tation (Shrager and Langley 1990) (see Kloes-gen and Zytkow [1996] for a glossary of terms common to KDD and machine discovery),and causal modeling for the inference of causal models from data (Spirtes, Glymour,and Scheines 1993). Statistics in particular has much in common with KDD (see Elder and Pregibon [1996] and Glymour et al.[1996] for a more detailed discussion of this synergy). Knowledge discovery from data is fundamentally a statistical endeavor. Statistics provides a language and framework for quan-tifying the uncertainty that results when one tries to infer general patterns from a particu-lar sample of an overall population. As men-tioned earlier, the term data mining has had negative connotations in statistics since the 1960s when computer-based data analysis techniques were first introduced. The concern arose because if one searches long enough in any data set (even randomly generated data),one can find patterns that appear to be statis-tically significant but, in fact, are not. Clearly,this issue is of fundamental importance to KDD. Substantial progress has been made in recent years in understanding such issues in statistics. Much of this work is of direct rele-vance to KDD. Thus, data mining is a legiti-mate activity as long as one understands how to do it correctly; data mining carried out poorly (without regard to the statistical as-pects of the problem) is to be avoided. KDD can also be viewed as encompassing a broader view of modeling than statistics. 
KDD aims to provide tools to automate (to the degree pos-sible) the entire process of data analysis and the statistician’s “art” of hypothesis selection.Data mining is a step in the KDD process that consists of ap-plying data analysis and discovery al-gorithms that produce a par-ticular enu-meration ofpatterns (or models)over the data.Articles40AI MAGAZINEly understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth 1996).Here, data are a set of facts (for example, cases in a database), and pattern is an expres-sion in some language describing a subset of the data or a model applicable to the subset. Hence, in our usage here, extracting a pattern also designates fitting a model to data; find-ing structure from data; or, in general, mak-ing any high-level description of a set of data. The term process implies that KDD comprises many steps, which involve data preparation, search for patterns, knowledge evaluation, and refinement, all repeated in multiple itera-tions. By nontrivial, we mean that some search or inference is involved; that is, it is not a straightforward computation of predefined quantities like computing the av-erage value of a set of numbers.The discovered patterns should be valid on new data with some degree of certainty. We also want patterns to be novel (at least to the system and preferably to the user) and poten-tially useful, that is, lead to some benefit to the user or task. Finally, the patterns should be understandable, if not immediately then after some postprocessing.The previous discussion implies that we can define quantitative measures for evaluating extracted patterns. In many cases, it is possi-ble to define measures of certainty (for exam-ple, estimated prediction accuracy on new data) or utility (for example, gain, perhaps indollars saved because of better predictions orspeedup in response time of a system). No-tions such as novelty and understandabilityare much more subjective. 
In certain contexts,understandability can be estimated by sim-plicity (for example, the number of bits to de-scribe a pattern). An important notion, calledinterestingness(for example, see Silberschatzand Tuzhilin [1995] and Piatetsky-Shapiro andMatheus [1994]), is usually taken as an overallmeasure of pattern value, combining validity,novelty, usefulness, and simplicity. Interest-ingness functions can be defined explicitly orcan be manifested implicitly through an or-dering placed by the KDD system on the dis-covered patterns or models.Given these notions, we can consider apattern to be knowledge if it exceeds some in-terestingness threshold, which is by nomeans an attempt to define knowledge in thephilosophical or even the popular view. As amatter of fact, knowledge in this definition ispurely user oriented and domain specific andis determined by whatever functions andthresholds the user chooses.Data mining is a step in the KDD processthat consists of applying data analysis anddiscovery algorithms that, under acceptablecomputational efficiency limitations, pro-duce a particular enumeration of patterns (ormodels) over the data. Note that the space ofArticlesFALL 1996 41Figure 1. An Overview of the Steps That Compose the KDD Process.methods, the effective number of variables under consideration can be reduced, or in-variant representations for the data can be found.Fifth is matching the goals of the KDD pro-cess (step 1) to a particular data-mining method. 
For example, summarization, clas-sification, regression, clustering, and so on,are described later as well as in Fayyad, Piatet-sky-Shapiro, and Smyth (1996).Sixth is exploratory analysis and model and hypothesis selection: choosing the data-mining algorithm(s) and selecting method(s)to be used for searching for data patterns.This process includes deciding which models and parameters might be appropriate (for ex-ample, models of categorical data are differ-ent than models of vectors over the reals) and matching a particular data-mining method with the overall criteria of the KDD process (for example, the end user might be more in-terested in understanding the model than its predictive capabilities).Seventh is data mining: searching for pat-terns of interest in a particular representa-tional form or a set of such representations,including classification rules or trees, regres-sion, and clustering. The user can significant-ly aid the data-mining method by correctly performing the preceding steps.Eighth is interpreting mined patterns, pos-sibly returning to any of steps 1 through 7 for further iteration. This step can also involve visualization of the extracted patterns and models or visualization of the data given the extracted models.Ninth is acting on the discovered knowl-edge: using the knowledge directly, incorpo-rating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This process also includes checking for and resolving po-tential conflicts with previously believed (or extracted) knowledge.The KDD process can involve significant iteration and can contain loops between any two steps. The basic flow of steps (al-though not the potential multitude of itera-tions and loops) is illustrated in figure 1.Most previous work on KDD has focused on step 7, the data mining. However, the other steps are as important (and probably more so) for the successful application of KDD in practice. 
Having defined the basic notions and introduced the KDD process, we now focus on the data-mining component,which has, by far, received the most atten-tion in the literature.patterns is often infinite, and the enumera-tion of patterns involves some form of search in this space. Practical computational constraints place severe limits on the sub-space that can be explored by a data-mining algorithm.The KDD process involves using the database along with any required selection,preprocessing, subsampling, and transforma-tions of it; applying data-mining methods (algorithms) to enumerate patterns from it;and evaluating the products of data mining to identify the subset of the enumerated pat-terns deemed knowledge. The data-mining component of the KDD process is concerned with the algorithmic means by which pat-terns are extracted and enumerated from da-ta. The overall KDD process (figure 1) in-cludes the evaluation and possible interpretation of the mined patterns to de-termine which patterns can be considered new knowledge. The KDD process also in-cludes all the additional steps described in the next section.The notion of an overall user-driven pro-cess is not unique to KDD: analogous propos-als have been put forward both in statistics (Hand 1994) and in machine learning (Brod-ley and Smyth 1996).The KDD ProcessThe KDD process is interactive and iterative,involving numerous steps with many deci-sions made by the user. Brachman and Anand (1996) give a practical view of the KDD pro-cess, emphasizing the interactive nature of the process. 
Here, we broadly outline some of its basic steps:First is developing an understanding of the application domain and the relevant prior knowledge and identifying the goal of the KDD process from the customer’s viewpoint.Second is creating a target data set: select-ing a data set, or focusing on a subset of vari-ables or data samples, on which discovery is to be performed.Third is data cleaning and preprocessing.Basic operations include removing noise if appropriate, collecting the necessary informa-tion to model or account for noise, deciding on strategies for handling missing data fields,and accounting for time-sequence informa-tion and known changes.Fourth is data reduction and projection:finding useful features to represent the data depending on the goal of the task. With di-mensionality reduction or transformationArticles42AI MAGAZINEThe Data-Mining Stepof the KDD ProcessThe data-mining component of the KDD pro-cess often involves repeated iterative applica-tion of particular data-mining methods. This section presents an overview of the primary goals of data mining, a description of the methods used to address these goals, and a brief description of the data-mining algo-rithms that incorporate these methods.The knowledge discovery goals are defined by the intended use of the system. We can distinguish two types of goals: (1) verification and (2) discovery. With verification,the sys-tem is limited to verifying the user’s hypothe-sis. With discovery,the system autonomously finds new patterns. We further subdivide the discovery goal into prediction,where the sys-tem finds patterns for predicting the future behavior of some entities, and description, where the system finds patterns for presenta-tion to a user in a human-understandableform. In this article, we are primarily con-cerned with discovery-oriented data mining.Data mining involves fitting models to, or determining patterns from, observed data. 
The fitted models play the role of inferred knowledge: Whether the models reflect useful or interesting knowledge is part of the over-all, interactive KDD process where subjective human judgment is typically required. Two primary mathematical formalisms are used in model fitting: (1) statistical and (2) logical. The statistical approach allows for nondeter-ministic effects in the model, whereas a logi-cal model is purely deterministic. We focus primarily on the statistical approach to data mining, which tends to be the most widely used basis for practical data-mining applica-tions given the typical presence of uncertain-ty in real-world data-generating processes.Most data-mining methods are based on tried and tested techniques from machine learning, pattern recognition, and statistics: classification, clustering, regression, and so on. The array of different algorithms under each of these headings can often be bewilder-ing to both the novice and the experienced data analyst. It should be emphasized that of the many data-mining methods advertised in the literature, there are really only a few fun-damental techniques. The actual underlying model representation being used by a particu-lar method typically comes from a composi-tion of a small number of well-known op-tions: polynomials, splines, kernel and basis functions, threshold-Boolean functions, and so on. Thus, algorithms tend to differ primar-ily in the goodness-of-fit criterion used toevaluate model fit or in the search methodused to find a good fit.In our brief overview of data-mining meth-ods, we try in particular to convey the notionthat most (if not all) methods can be viewedas extensions or hybrids of a few basic tech-niques and principles. We first discuss the pri-mary methods of data mining and then showthat the data- mining methods can be viewedas consisting of three primary algorithmiccomponents: (1) model representation, (2)model evaluation, and (3) search. 
In the dis-cussion of KDD and data-mining methods,we use a simple example to make some of thenotions more concrete. Figure 2 shows a sim-ple two-dimensional artificial data set consist-ing of 23 cases. Each point on the graph rep-resents a person who has been given a loanby a particular bank at some time in the past.The horizontal axis represents the income ofthe person; the vertical axis represents the to-tal personal debt of the person (mortgage, carpayments, and so on). The data have beenclassified into two classes: (1) the x’s repre-sent persons who have defaulted on theirloans and (2) the o’s represent persons whoseloans are in good status with the bank. Thus,this simple artificial data set could represent ahistorical data set that can contain usefulknowledge from the point of view of thebank making the loans. Note that in actualKDD applications, there are typically manymore dimensions (as many as several hun-dreds) and many more data points (manythousands or even millions).ArticlesFALL 1996 43Figure 2. A Simple Data Set with Two Classes Used for Illustrative Purposes.。

中英翻译《使用加权滤波器的一种改进的谱减语音增强算法》

中英翻译《使用加权滤波器的一种改进的谱减语音增强算法》

使用加权滤波器的一种改进的谱减语音增强算法摘要在噪声环境,例如飞机座舱、汽车引擎中,语音中或多或少地夹杂着噪声。

为了减少带噪语音中的噪声,我们提出了一种改进型的谱减算法。

这种算法是利用对谱减的过度减法而实现的。

残余噪声能够利用人类听觉系统的掩蔽特性被掩蔽。

为了消除残余的音乐噪声,引入了一种基于心理声学的有用的加权滤波器。

通过仿真发现其增强的语音并未失真,而且音乐噪声也被有效地掩蔽,从而体现了一种更好的性能。

关键词:语音增强;谱减1.引言语音信号中经常伴有环境中的背景噪声。

在一些应用中如:语音命令系统,语音识别,说话者认证,免提系统,背景噪声对语音信号的处理有许多不利的影响。

语音增强技术可以被分为单通道和多通道或多通道增强技术。

单通道语音增强技术的应用情况是只有一个采集通道可用。

谱减语音增强算法是一个众所周知的单通道降噪技术[]2,1。

大多数实现和多种基本技术的运用是在语音谱上减去对噪声谱的估计而得以实现的。

传统的功率谱相减的方法大大减少了带噪语音中的噪声水平。

然而,它也在语音信号中引入了一种被称为音乐噪声的恼人的失真。

在本文中我们运用一种能够更好、更多地抑制噪声的改进的频谱过度减法的方法[]3。

该方法的运用是为了估计纯净语音的功率谱,它是通过从语音功率谱中减去噪声功率谱的过度估计而实现的。

此外,为了在语音失真和噪声消除之间找到最佳的平衡点,一种基于声学心理学的动机谱加权规则被纳入。

通过利用人耳听觉系统的掩蔽特性能够掩蔽现有的残余噪声。

当确定了语音掩蔽阈值的时候,运用一种改进的掩蔽阈值估计来消除噪声的影响。

该方法提供了比传统的功率谱相减法更优越的性能,并能在很大程度上降低音乐噪声。

2.过度谱相减算法该方法的基本假设是把噪声看作是独立的加性噪声。

假设已经被不相关的加性噪声信号()t n降解的语音信号为()t s:()()()t n t s t x += (1)带噪语音信号的短时功率谱近似为:()()()ωωωj j j e N e S e X +≈ (2) 通过用无音期间得到的平均值()2ωj e N 代替噪声的平方幅度值()2ωj e N 得到功率谱相减的估计值为: ()()()222ˆωωωj j j e N e X e S -= (3)在运用了谱减算法之后,由于估计的噪声和有效噪声之间的差异而出现了一种残余噪声。

人声分离算法

人声分离算法

人声分离算法人声分离是一种音频信号处理技术,旨在从混合音频中分离或提取出特定的人声信号。

这项任务通常是在语音处理、音乐处理以及音频增强等领域中应用的重要技术。

以下是一些常见的人声分离算法:1. 基于深度学习的方法:• Deep Clustering:使用深度学习模型,如深度聚类网络(Deep Clustering Network, DCN),学习在频谱域对音频进行聚类,以实现音源分离。

该方法在训练过程中将相似的频谱点聚类在一起,从而使网络能够学到不同音源的表示。

• Deep attractor network (DAN):通过学习音源的吸引子表示,这种方法使得模型能够在频谱上分离不同的音源。

2. 基于短时傅立叶变换(STFT)的方法:• Non-negative Matrix Factorization (NMF):将音频信号表示为非负矩阵的乘积,其中一个矩阵表示基础音源,另一个矩阵表示每个时间点的激活系数。

通过调整这两个矩阵,可以分离出人声信号。

• Independent Component Analysis (ICA):基于统计模型,假设混合信号是独立的非高斯过程,通过最大似然估计方法来分离不同的源信号。

3. 基于时域处理的方法:• Ideal Binary Mask (IBM):通过分析语音和非语音的频谱差异,生成一个二进制掩码,用于选择性地过滤和分离人声信号。

• Phase-sensitive Reconstruction (PSR):基于相位信息的处理,通过在频域上对信号进行修复和重新构建来分离人声。

4. 基于卷积神经网络(CNN)的方法:• U-Net Architecture:基于 U-Net 结构的深度学习模型,通过卷积层和上采样层实现对音频信号的高级特征学习和重建。

请注意,人声分离是一个复杂的问题,其效果受到许多因素的影响,如音频质量、混合信号的复杂性以及算法的设计。

选择合适的方法取决于实际应用的要求和环境。

Method and apparatus for multicast delivery of pro

Method and apparatus for multicast delivery of pro

专利名称:Method and apparatus for multicast deliveryof program information发明人:Petr Peterka,Alexander Medvinsky申请号:US11201675申请日:20050811公开号:US07865723B2公开日:20110104专利内容由知识产权出版社提供专利附图:摘要:Method and apparatus providing program information to client devices for at least one multicast stream of digital content is described. In one embodiment, session description messages for the at least one multicast stream of digital content aregenerated. Each of the session description messages includes at least one content access parameter. The at least one content access parameter may include digital rights management (DRM) data, channel key identification data associated with the at least one channel of the at least one multicast stream of digital content, and/or data indicative of whether each session description message is associated with a channel, a program, or a program segment. Each of the session description messages is signed using a cryptographic key. The session description messages are then multicasted to the client devices using a predefined multicast address.申请人:Petr Peterka,Alexander Medvinsky地址:San Diego CA US,San Diego CA US国籍:US,US代理人:Larry T. Cullen更多信息请下载全文后查看。

鸡尾酒效应录音方案

鸡尾酒效应录音方案

鸡尾酒效应录音方案鸡尾酒效应指的是由于声音反射、混响等原因,导致在录音中出现的杂音和混乱的现象,影响语音识别和语音合成的准确性。

为了避免这种情况的出现,以下是一些针对鸡尾酒效应的录音方案:一、选择合适的录音设备1. 选择高品质的录音设备,避免低廉的设备带来的杂音和噪声。

2. 选择有降噪功能的录音设备,能够有效地降低背景噪声对录音的影响。

二、选好录音场地1. 选择一个静音的场所进行录音,远离喧闹的环境,避免录音时外界声音的干扰。

2. 选择没有反射、混响的空间进行录音,如有需要,可以通过布置软包、吸音棉等材料来减少声音反射和混响。

三、调整录音参数1. 调整麦克风的灵敏度,避免太高或太低的灵敏度带来的杂音和噪声。

2. 调整录音的增益和音量,使录音的声音清晰、自然、平衡。

四、录音前准备1. 消除身上的杂音,避免衣物、手表等物品带来的杂音。

2. 提醒录音人员注意口齿清晰,语速适中。

五、后期处理1. 通过降噪、去混响等后期处理工具来降低杂音和混响。

2. 使用人工智能语音处理技术对录音进行处理,提高语音合成和识别的准确性。

以上是针对鸡尾酒效应的录音方案,通过选择合适的设备、场地,调整录音参数,以及后期处理,可以有效地降低鸡尾酒效应对录音的影响,提高语音识别和语音合成的准确性。

鸡尾酒效应录音方案:简介:鸡尾酒效应是一种心理学现象,指在复杂嘈杂的环境下,人们可以聚焦于某一声音或声音组合并将其从环境噪音中区分出来。

对于录音工作而言,要想在复杂嘈杂的环境中得到清晰的录音,需要使用鸡尾酒效应录音方案。

方案:1. 选址:选择一个相对安静、噪音小的地方进行录音,例如静音室、闭门会议室等。

2. 设备:选用高品质的麦克风和录音设备,如专业级的录音机、麦克风和调音台。

确保设备符合录音环境的需求。

3. 音频处理:使用降噪软件或音频处理器,削弱背景噪音,突出主要音源。

可以对录音进行去噪、降低混响、调整音量等操作。

4. 音源定位:合理摆放麦克风和录音设备,将麦克风放在主要音源附近,调整好麦克风的灵敏度和方向,使主要音源能够清晰地被记录下来。

JBL ASB6118高功率单子18英寸液晶扬声器产品介绍说明书

JBL ASB6118高功率单子18英寸液晶扬声器产品介绍说明书

ASB6118High PowerSingle 18" SubwooferKey Features:᭤1 x 18" 2242H SVG™ Driver.᭤Large vent area for high output with low distortion.᭤Arrays with various AE Series mid-high frequency 2-way models (see AE Series Array Guide).Applications:᭤Performing arts facilities ᭤Theatrical sound design ᭤Auditoriums᭤Houses of worship ᭤Live clubs᭤Dance-clubs/discotheques ᭤Sports facilities᭤Themed entertainment venuesASB6118 is a high power subwoofer system comprised of one 460 mm(18 in) SVG Super Vented Gap low fre-quency driver in a vented, front-loaded configuration for extended bandwidth.The rectangular cabinet is fitted with M10 threaded suspension points. Pre-engineered array bracketry is available.ASB6118 is part of JBL’s AE Application Engineered Series, aversatile family of loudspeakers for a wide variety of applications.Specifications:System:Frequency Range (-10 dB):28 Hz – 1 kHz Frequency Response (±3 dB):35 Hz – 1 kHzTransducer Power Rating (AES)1:1200 W (4800 W peak), 2 hrs Long-Term System Power Rating 2:800 W (3200 W peak), 100 hrsMaximum SPL 3:30 Hz – 100 Hz: 129 dB-SPL cont avg (135 dB peak)100 Hz – 500 Hz: 129 dB-SPL cont avg (135 dB peak)System Sensitivity (dB-SPL, 1W @ 1m)4:30 Hz – 100 Hz: 98 dB100 Hz – 500 Hz: 98 dBNominal Impedance:8 ohmsTransducers:Low Frequency Driver: 1 x JBL 2242H 460 mm (18 in) SVG™ driver with 100 mm (4 in)voice coilPhysical:Enclosure:Rectangular cabinet, 16 mm (5/8 in) exterior grade 11-plyFinnish birch plywoodSuspension Attachment:12 points (3 top, 3 bottom, 2 each side, 2 rear), M10 threadedhardwareFinish:Black DuraFlex™ finish. White available upon request.Grille:Powder coated 14 gauge perforated steel, with acoustically trans-parent black foam backing.Input Connector:NL4 Neutrik Speakon ®and CE-compliant covered barrier stripterminals. Barrier terminals accept up to 5.2 sq mm (10 AWG)wire or max width 9 mm (.375 in) spade lugs. 
Speakon in paral-lel with barrier strip for loop-through.Environmental Specifications:Mil-Std 810; IP-x3 per IEC529.Dimensions (H x W x D in 548 x 561 x 816 mm vertical cabinet orientation):(21.6 x 22.1 x 32.2 in)Net Weight:44.5 kg (98 lb)Optional Accessories: M10 x 35 mm forged shoulder eyebolts with washers.Optional planar array frame kit. See AE Series Bracket GuideAES standard, one decade pink noise with 6 dB crest factor within device's operational band, free air. Standard AES 2 hr rating plus long-term 100 hr rating are specified for low-frequency transducers.AES standard, one decade pink noise with 6 dB crest factor, in cabinet, long-term 100 hr rating.Calculated based on power rating and sensitivity, exclusive of power compression.Half-space (2␲) loading, averaged in specified frequency band.JBL continually engages in research related to product improvement. Changes introduced into existing products without noticeare an expression of that philosophy.SS ASB6118CRP 10M 7/02᭤ASB6118 High Power Single 18" SubwooferJBL Professional8500 Balboa Boulevard, P.O. Box 2200Northridge, California 91329 U.S.A.©Copyright 2002 JBL ProfessionalA Harman International Company Frequency response is measured on-axis at a distance referenced to 1 m @ 1 watt (2.83 Vrms)input,shown as half-space (2␲,solid line) and full-space (4␲,dotted line) environment.Electrical Input Impedance。

双音频(DTMF)信号的产生与检测2

双音频(DTMF)信号的产生与检测2

XXXXXXX大学毕业论文(设计)题目:双音频(DTMF)信号的产生与检测学生姓名学号专业电子信息工程班级2008级1班指导教师学部计算机科学与电气工程答辩日期2012年5月19日黑龙江东方学院本科生毕业论文(设计)任务书双音频(DTMF)信号的产生与检测摘要双音多频DTMF(Dual Tone Multi-Frequency)信令在全世界范围内得到广泛应用,DTMF信令的产生与检测集成到含有数字信号处理器(DSP)的系统中,是一项较有价值的工程应用。

DTMF作为实现电话号码快速可靠传输的一种技术,它具有很强的抗干扰能力和较高的传输速度,因此,可广泛用于电话通信系统中。

但绝大部分是用作电话的音频拨号,另外,它也可以在数据通信系统中广泛地用来实现各种数据流和语音等信息的远程传输,研究其在MATLAB下的仿真实现有助于其具体系统的优化设计。

Speaker Diarization for Multiple Distant Microphone Meetings: Mixing Acoustic Features and Inter-Channel Time Differences

Jose M. Pardo (1,2), Xavier Anguera (1,3), Chuck Wooters (1)
(1) International Computer Science Institute, Berkeley CA 94708, USA
(2) Universidad Politécnica de Madrid, 28040 Madrid, Spain
(3) Technical University of Catalonia, Barcelona, Spain
jpardo@die.upm.es, {xanguera,wooters}@

Abstract

Speaker diarization for recordings made in meetings consists of identifying the number of participants in each meeting and creating a list of speech time intervals for each participant. In recently published work [7] we presented experiments using only TDOA (Time Delay Of Arrival between the different channels) values for this task, and demonstrated that the information in those values can be used to segment the speakers. In this paper we develop a method to mix the TDOA values with the acoustic features by calculating a combined log-likelihood over both sets of vectors. Using this method we reduce the DER by 16.34% relative for the NIST RT05s set (scored without overlap, against manually transcribed references), by 21% relative for our devel06s set (scored with overlap, against force-aligned references), and by 15% relative for the NIST RT06s set (scored with overlap, against manually transcribed references).

Index terms: speaker diarization, speaker segmentation, meeting recognition.

1. Introduction

There has been extensive research at ICSI in the last few years in the area of speaker segmentation and diarization [1],[2],[3],[4]. Speaker diarization for meetings consists of identifying the number of participants in each meeting and creating a list of speech time intervals for each participant. Note that two or more speakers may talk at the same time; such overlap regions should be labelled with the labels of all the speakers involved.
It is important to emphasize that speaker diarization is done without any knowledge about the number of speakers in the room, their location, the position and quality of the microphones, or the acoustics of the room. These conditions make the task very difficult and very dependent on the characteristics of the room, the number of speakers and the number of channels. We have tried to make the system as robust as possible, so that its results are stable across different recording settings.

Speaker diarization for meetings using multiple distant microphones (MDM) should be easier than with a single distant microphone (SDM) for several reasons: a) there are redundant signals (one per channel) that can be used to enhance the processed signal, even if some of the channels have a very poor signal-to-noise ratio; and b) the signals encode information about the spatial position of the source (speaker) that is different for each speaker.

In previous work [9], a processing technique using the time delay of arrival (TDOA) was applied to the different microphone channels, delaying them in time and summing them to create an enhanced signal. With this enhanced signal, the speaker diarization error rate (DER) improved by 3.3% relative over the single-channel error for the RT05s evaluation set, 23% relative for the RT04s development set, and 2.3% relative for the RT04s evaluation set (see [10] for more information about the databases and the task). While improvements were obtained, no direct information about the delays between microphones was used in the segmentation and clustering process.

In recent work [7], we processed the TDOA values and clustered them to obtain a segmentation hypothesis. Using only this information we obtained a 31.2% diarization error rate (DER) on the NIST RT05s conference room evaluation set.
For a subset of NIST's RT04s, we obtained 35.73% DER (not including false alarms in that case). Compared with the results of Ellis and Liu [8], who also used inter-channel differences on the same data, this is a 43% relative improvement.

In this paper we combine the acoustic front-end features (MFCC) with the TDOA features to obtain an enhanced segmentation for this task. By including the TDOA values we improve the baseline results by 16.34% relative for the RT05s evaluation set, 21% relative for the devel06s database (see the description of its content below), and 15% relative for the RT06s evaluation set.

2. System Description

The basic procedure follows the segmentation and clustering proposed in [2],[3] using only acoustic features, without the purification method mentioned in [3], but with the substantial differences explained below.

2.1. Speech/non-speech calculation

As a first step in the diarization process, non-speech frames are identified and removed. We use the SRI speech/non-speech detector [4] or a more recent system developed at ICSI [12].

2.2. Delay generation

We calculate the cross-correlation between the signals coming from the different channels and estimate the TDOA as the maximum of the cross-correlation function. The details of the delay generation procedure are described in [9]. For a set of microphones, we choose as reference the microphone with the overall best cross-correlation with all the others, and calculate the delays of the signals arriving at the other microphones relative to the reference. We form a vector of these delays with as many components as the number of microphones minus one, using a window width of 500 msec and a shift of 10 msec.

2.3. Acoustic feature extraction

The signals coming from the different microphones are delayed and added together to form a single enhanced signal [9].
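The delay estimation of Section 2.2 and the delay-and-sum enhancement of Section 2.3 can be sketched in a few lines. This is a minimal illustration under simplifying assumptions (whole-signal correlation instead of 500 msec windows, wrap-around alignment at the edges), not the actual processing of [9]; the function names are invented for the example.

```python
import numpy as np

def estimate_tdoa(ref, mic):
    """Estimate the delay (in samples) of `mic` relative to `ref`
    as the lag that maximizes their cross-correlation."""
    corr = np.correlate(mic, ref, mode="full")
    return int(np.argmax(corr)) - (len(ref) - 1)

def delay_and_sum(channels, delays):
    """Advance each channel by its estimated delay and average them
    into a single enhanced signal (simple delay-and-sum beamforming;
    edge samples wrap around in this sketch)."""
    out = np.zeros(len(channels[0]))
    for ch, d in zip(channels, delays):
        out += np.roll(ch, -d)
    return out / len(channels)
```

In the real system the cross-correlation is computed on a 500 msec window every 10 msec, so one delay per non-reference microphone is produced per frame, and those per-frame delays form the TDOA feature vector.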
On the enhanced signal we calculate a vector of 19 MFCC coefficients using a 30 msec analysis window and a frame shift of 10 msec.

2.4. Initialization

The initialization requires a "guess" at the maximum number of speakers (K) likely to occur in the data. The data are divided into K equal-length segments, and each segment is assigned to one model. Each model's parameters are then trained on its assigned data. With the trained models we segment the data (using the Viterbi algorithm) and retrain them over several iterations.

The clustering process uses an ergodic HMM with a number of states equal to the initial number of clusters (K). Each state in the HMM contains a sequence of MD sub-states which impose a minimum duration. Within a state, all sub-states share a probability density function (PDF) modelled by a Gaussian mixture model (GMM) with a diagonal covariance matrix. Each GMM starts with "g" gaussians, which change later in the merging process. The models for the acoustic vectors and for the delay vectors are trained in parallel but kept as separate models, and the number of initial gaussians per model differs between the two. In previous work [7] we experimented with using only the delay vectors in the segmentation and clustering procedure; using 10 initial clusters, each starting with a single gaussian, we obtained the best results for the MDM RT05s set (31.2% DER). For the acoustic features we use 5 initial gaussians per model.

The combined log-likelihood C_log for each state and every frame is obtained by combining the log-likelihoods of the acoustic vectors and of the delay vectors:

  C_log(x[i], y[i] | θa) = α · log p(x[i] | θax) + (1 − α) · log p(y[i] | θay)    (Eq. 1)

where θa is the compound model for cluster a, θax is the model created for cluster a from the acoustic vectors x[n], and θay is the model created for cluster a from the delay vectors y[n].
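As an illustration of Eq. 1, the sketch below scores one cluster with a separate single-Gaussian model per stream (the real system uses GMMs per stream); the dictionary-based model layout and the names are assumptions made for the example.

```python
import numpy as np

def diag_gauss_loglik(frames, mean, var):
    """Per-frame log-likelihood under a diagonal-covariance Gaussian."""
    frames = np.atleast_2d(frames)
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frames - mean) ** 2 / var, axis=1)

def combined_loglik(x_frames, y_frames, model, alpha=0.9):
    """Eq. 1: weighted sum of the acoustic-stream (MFCC) and
    delay-stream (TDOA) log-likelihoods of one cluster model."""
    lx = diag_gauss_loglik(x_frames, model["ax_mean"], model["ax_var"])
    ly = diag_gauss_loglik(y_frames, model["ay_mean"], model["ay_var"])
    return alpha * lx + (1 - alpha) * ly
```

During decoding, the Viterbi pass can then use this combined score as the state emission log-probability while the two streams keep separate models and are retrained separately.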
α is a weight that has to be determined by some method; currently we set it empirically on development data.

2.4.1. Clustering process

The initialized models seed the clustering and segmentation processes described next. The iterative segmentation and merging process consists of the following steps:

1. Run a Viterbi decode to re-segment the data.
2. Retrain the models using the segmentation from (1).
3. Select the pair of clusters with the largest merge score (Eq. 2) > 0.0. (Since Eq. 2 produces positive scores for models that are similar and negative scores for models that are different, 0.0 is a natural threshold for the system.)
4. If no such pair of clusters is found, stop.
5. Merge the pair of clusters found in (3): the models for the individual clusters in the pair are replaced by a single, combined model.
6. Go to (1).

2.4.2. Merging score

One of the main problems in the segmentation and clustering process is deciding which merging score to use. The BIC criterion has been used extensively with good results [1],[11], and a modification of BIC that eliminates the need for a penalty term has also given us good results. Nevertheless, it remains an open question how much the performance depends on the kind of data vectors and models used in the comparisons. The modified BIC that we use for merging is:

  ΔBIC = log p(D | θ) − log p(Da | θa) − log p(Db | θb)    (Eq. 2)

where θa is the model created from Da, θb is the model created from Db, and θ is the model created from D, the union of Da and Db. The key to this modified BIC is that the number of parameters in θ must equal the sum of the numbers of parameters in θa and θb.

3. Experiments and Results

3.1. Experiments with the RT05s set and hand-labelled references (System A)

We have used the RT05s MDM conference meetings evaluation data in our experiments. The data consist of 10 meetings from which 10-minute excerpts have been extracted [10]. The DER was obtained using the standard NIST procedure, comparing the segmentation results with the hand-labelled reference data.
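Eq. 2 and merge-selection step (3) can be illustrated on 1-D data. To keep the parameter counts matched without retraining, the merged model in this sketch is a two-component mixture assembled directly from the two cluster Gaussians, whereas the system retrains a combined GMM on the pooled data; everything below is a simplified stand-in, not the paper's implementation.

```python
import numpy as np

def gauss_logpdf(x, mean, var):
    """Pointwise log-density of a 1-D Gaussian."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def delta_bic(da, db):
    """Eq. 2 for two 1-D clusters, each modelled by a single Gaussian.
    The merged model carries exactly the parameters of the two parts,
    so no explicit BIC penalty term is needed."""
    ma, va = da.mean(), da.var() + 1e-8
    mb, vb = db.mean(), db.var() + 1e-8
    d = np.concatenate([da, db])
    wa = len(da) / len(d)
    # merged-model log-likelihood: log-sum-exp over the two components
    ll_merged = np.logaddexp(np.log(wa) + gauss_logpdf(d, ma, va),
                             np.log(1 - wa) + gauss_logpdf(d, mb, vb)).sum()
    return ll_merged - gauss_logpdf(da, ma, va).sum() - gauss_logpdf(db, mb, vb).sum()

def best_merge(clusters):
    """Step (3): return the pair with the highest merge score if it is
    positive, or None if every score is <= 0 (the stopping criterion)."""
    pairs = [(delta_bic(clusters[i], clusters[j]), i, j)
             for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    score, i, j = max(pairs)
    return (i, j) if score > 0.0 else None
```

Clusters drawn from nearly the same distribution score close to zero, while clearly separated clusters score strongly negative, which is what makes 0.0 a usable threshold.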
In the first column of Table 1, results for the independent systems and for the combined system are shown. We have not included overlapping speech in the score calculation. For the results presented here we used the SRI speech/non-speech detector. The combined system gives a relative DER improvement of 16.34%, using a weight factor α = 0.9.

The first question to answer is how to determine α. Figure 1 plots DER as a function of α. A badly chosen weight factor can seriously degrade performance, since the delays alone perform much worse than the acoustic vectors alone.

Table 1: DER for the eval05s and devel06s datasets using acoustic features only, delay features only, and combined features (System A and System B).

  Features used              DER eval05s (System A)   DER devel06s (System B)
  Delays only                31.20 %                  31.97 %
  Acoustic features only     18.48 %                  12.71 %
  Combined acoustic+delays   15.46 %                  10.04 %
  Relative error reduction   16.34 %                  21 %

Figure 1: DER as a function of the weight factor α for the eval05s set.

3.2. Experiments with force-aligned references (System B)

For the NIST RT06s evaluation campaign we selected a set of development shows from all previous data sets: RT02s, RT04s, and RT05s. The shows are listed in Table 2; we will refer to this set as the devel06s data. For the RT06s evaluation we made several changes compared to what has been described above:

• Since this year's evaluation was to be scored taking the overlap between speakers into account, all the new scoring has been done with overlaps included.¹
• We discovered that the hand-aligned data contained many non-speech events (breaths, coughs, lip smacks, etc.), especially at the beginning and end of every speaker turn.
Also, overlap was sometimes marked where there was none. For this reason we decided to use references obtained by force-aligning the data with the SRI recognizer.
• We have used a new speech/non-speech detector that does not need training data [12].

¹ Scoring with overlap means taking into account the regions where more than one speaker talks at the same time; an error is counted if any of those speakers is not found.

The results on devel06s are presented in Table 2. The columns give the percentage of missed speech, false-alarm speech, speaker error, and total diarization error. There is a miss if one (or several) speakers talking at the same time are not labeled; a false alarm if the system assigns a label to a region where there is no speech; and a speaker error if the label assigned by the system does not match the true speaker (see [10]).

Table 2: Results for the devel06s shows: percentages of missed speech (Miss), false-alarm speech (FA), speaker error (Spkr), and total diarization error (Total).

  File                 Miss   FA     Spkr   Total
  AMI_20041210-1052    0.40   1.20   1.10    2.69
  AMI_20050204-1206    2.60   2.20   3.30    8.01
  CMU_20050228-1615    9.30   1.20   1.80   12.30
  CMU_20050301-1415    3.70   1.60   1.10    6.41
  ICSI_20000807-1000   4.60   0.40   3.80    8.77
  ICSI_20010208-1430   3.60   1.10  11.00   15.72
  LDC_20011116-1400    2.10   3.00   4.20    9.32
  LDC_20011116-1500    5.90   1.10   7.60   14.65
  NIST_20030623-1409   1.00   0.70   1.40    3.08
  NIST_20030925-1517   7.70   5.70   9.60   22.95
  VT_20050304-1300     0.60   1.00   2.80    4.43
  VT_20050318-1430     1.30   6.20  13.80   21.36
  ALL                  3.40   1.90   4.70   10.04

Figure 2: DER as a function of the weight factor for the devel06s data.

The results presented for devel06s in Table 1 and Table 2 used K = 16 and a minimum duration of 2.5 seconds. Again, performance is improved over the baseline system (which does not use the delay information) by using the delays.
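The three error components just described can be made concrete with a frame-level toy scorer. This sketch assumes one speaker per frame and that hypothesis labels have already been optimally mapped to reference speakers; the official NIST scorer additionally handles overlap, forgiveness collars, and the speaker mapping itself.

```python
def der_components(ref, hyp):
    """Frame-level missed-speech, false-alarm and speaker-error rates.
    `ref` and `hyp` are per-frame speaker labels; None marks non-speech.
    Rates are fractions of the scored (reference speech) frames."""
    miss = sum(1 for r, h in zip(ref, hyp) if r is not None and h is None)
    fa = sum(1 for r, h in zip(ref, hyp) if r is None and h is not None)
    err = sum(1 for r, h in zip(ref, hyp)
              if r is not None and h is not None and r != h)
    scored = sum(1 for r in ref if r is not None)
    return miss / scored, fa / scored, err / scored

def der(ref, hyp):
    """Total diarization error: the sum of the three components."""
    return sum(der_components(ref, hyp))
```

For example, with four scored reference frames of which one is missed, one mislabeled, and one non-speech frame falsely labeled, each component is 0.25 and the DER is 0.75.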
In the second column of Table 1, the DER using acoustic features, delay features, and the combination of both are presented. The relative improvement of the combined system over the acoustic features alone is 21%. Figure 2 shows the DER as a function of the weight factor; again, a good weight factor is crucial to obtaining good results.

3.3. Official results at NIST RT06s

This system¹ was presented as a contrastive system in the official NIST RT06s evaluation campaign, giving a DER of 35.77%. Although the original plan was to score the data using force-aligned labels, the results were finally scored against hand-made references. Scoring with force-aligned labels gave a DER of 20.03%. After the evaluation we scored the system using only acoustic features against the hand-made references, obtaining a DER of 42.13%. Thus the use of delay information reduced the error of the system by 15% relative.

4. Discussion

In all the experiments with this method we have improved on the results obtained using only the acoustic vectors: the RT05s, devel06s and RT06s sets all show substantial improvements. The problem of integrating TDOA information with acoustic information is not trivial. Previous experiments merging both types of information into a single long vector yielded poorer performance; we believe this may be due to the use of diagonal covariance matrices on a non-homogeneous vector. A large amount of work lies ahead in researching methods for merging both sources of information, especially since we do not yet know whether the merging metric used is the best possible. There is also a need to develop methods to estimate the weight factor automatically.

5. Conclusions

In this paper we have presented a method to combine acoustic features and delay features to improve speaker diarization performance. The results are significantly better than those obtained using acoustic features alone.
There are still many unknowns in the method (some of them inherent to the clustering procedure), such as how to choose the minimum-duration constraint, the initial number of clusters, and the initial number of gaussians per cluster. Particularly important is how to select a weight factor between the acoustic features and the delay features that is robust and generalizes to different rooms, numbers of speakers, numbers of microphones, etc.

6. Acknowledgements

This work was supported by the Joint Spain-ICSI Visitor Program and by the projects ROBINT (DPI 2004-07908-C02), TINA (UPM-CAM R05-10922) and EDECAN (TIN2005-08660-C04). We would also like to thank Andreas Stolcke, Kemal Sönmez and Nikki Mirghafori for many helpful discussions. We appreciate the help of Michael Ellsworth in reviewing the English.

¹ With a slightly modified delay calculation method.

7. References

[1] J. Ferreiros, D. Ellis, "Using Acoustic Condition Clustering to Improve Acoustic Change Detection on Broadcast News," Proc. ICSLP 2000.
[2] J. Ajmera, C. Wooters, "A Robust Speaker Clustering Algorithm," IEEE ASRU 2003.
[3] X. Anguera, C. Wooters, B. Peskin, M. Aguiló, "Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System," Proc. NIST MLMI Meeting Recognition Workshop, Edinburgh, 2005.
[4] C. Wooters, J. Fung, B. Peskin, X. Anguera, "Towards Robust Speaker Segmentation: The ICSI-SRI Fall 2004 Diarization System," NIST RT-04F Workshop, Nov. 2004.
[5] A. Stolcke, X. Anguera, K. Boakye, O. Cetin, F. Grezl, A. Janin, A. Mandal, B. Peskin, C. Wooters, J. Zheng, "Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System," Proc. NIST MLMI Meeting Recognition Workshop, Edinburgh.
[6] A. Janin, J. Ang, S. Bhagat, R. Dhillon, J. Edwards, J. Macias-Guarasa, N. Morgan, B. Peskin, E. Shriberg, A. Stolcke, C. Wooters, B. Wrede, "The ICSI Meeting Project: Resources and Research," NIST ICASSP 2004 Meeting Recognition Workshop, Montreal.
[7] J. M. Pardo, X. Anguera, C. Wooters, "Speaker Diarization for Multi-Microphone Meetings Using Only Between-Channel Differences," Proc. MLMI 06, 3rd Joint Workshop on Multimodal Interaction and Related Machine Learning Algorithms, 1-3 May 2006, Washington DC, USA. To appear in Lecture Notes in Computer Science.
[8] D. P. W. Ellis, J. C. Liu, "Speaker Turn Segmentation Based on Between-Channel Differences," Proc. ICASSP 2004.
[9] X. Anguera, C. Wooters, J. Hernando, "Speaker Diarization for Multi-Party Meetings Using Acoustic Fusion," IEEE ASRU, 2005.
[10] NIST Spring 2005 (RT-05S) Rich Transcription Meeting Recognition Evaluation Plan, /iad/894.01/tests/rt/rt2005/spring/
[11] S. S. Chen, P. S. Gopalakrishnan, "Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion," Proc. DARPA Broadcast News Transcription and Understanding Workshop, Virginia, USA, Feb. 1998.
[12] X. Anguera, M. Aguiló, C. Wooters, C. Nadeu, J. Hernando, "Hybrid Speech/Non-Speech Detector Applied to Speaker Diarization of Meetings," IEEE Odyssey 2006: The Speaker and Language Recognition Workshop, 28-30 June 2006, San Juan, Puerto Rico.
