Low-complexity near lossless coding of depth maps from Kinect-like depth cameras
Lossless Compression of DSA Image Sequences Based on a DPCM² Coder
Ji Zhen, Mou Xuanqin, Jiang Yifeng, Cai Yuanlong
Image Processing Center, Xi'an Jiaotong University, Xi'an, P.R. China, 710049

ABSTRACT

I. INTRODUCTION
Medical image compression has mainly been concerned with lossless coding techniques [1], which ensure that all information significant for diagnostic purposes is retained in the reconstructed images, at compression ratios of around 2:1. The Digital Imaging and Communications in Medicine proposed standard (DICOM 3), adopted by the American College of Radiology and the National Electrical Manufacturers Association (ACR-NEMA), includes lossless coding and compression of medical images. However, recent studies on "visually lossless" and "information preserving" compression indicate that some reconstruction error is acceptable from a medical point of view. ACR-NEMA announced a call for proposals for lossy compression, which was included in DICOM 3.0 in 1995. This class of compression techniques rests on subjective definitions, and extreme caution must be taken in P.R. China, where the criteria remain ambiguous and many complex legal and regulatory issues would arise.
The objective of compressing images is to reduce data volume and achieve a low bit rate. Compressing a digital medical image facilitates its storage and transmission. Because of the legal issues, physicians prefer to diagnose from uncorrupted medical images. Popular lossless coding schemes include Huffman, arithmetic, run-length encoding (RLE) and LZW coding [2]. More effective coding methods are needed to keep up with the fast growth of PACS (Picture Archiving and Communication Systems) and teleradiology. Acquisition of 3-D and 4-D medical image sequences is becoming more common, especially in dynamic studies with MRI, CT, DSA and PET. A new lossless compression method is proposed here for the image sequences generated by DSA (Digital Subtraction Angiography) apparatus, which is now as common as X-ray and CT. Compressing an image sequence amounts to compressing 3-D data, which contains both spatial and temporal redundancy. The well-known MPEG standard [3] provides a well-developed solution for lossy coding of image sequences; it has proved effective and has contributed many practical techniques that can be exploited here.
Differential pulse code modulation (DPCM) [4] predictive coding is predominant in lossless compression. In this paper, a higher-order (second-order) DPCM coder is introduced that exploits the correlation between first-order differential images and second-order ones to the benefit of compression; it offers highly competitive compression performance while remaining practical. The proposed lossless image-sequence coder is described in the following sections.

II. CHARACTERISTICS OF DSA IMAGES
Image compression techniques usually rest on concrete mathematical models. In practical applications, the available a priori knowledge about a specific class of images can be exploited to develop an optimized compression scheme. A typical DSA image sequence is shown in Figure 1 and consists of:
Figure 1. A typical DSA image sequence. (a) M is the mask image, acquired before intravenous or intraarterial injection. (b) L(n) is the sequence of live images, acquired after injection; N is the number of frames in the sequence. (c) S(n) is the subtraction image, S(n) = L(n) - M (n = 0, ..., N-1). (d) SD(n) is the differential subtraction image, SD(n) = S(n) - S(n-1) (n = 1, ..., N-1).
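A minimal sketch of how these representations are formed (assuming NumPy; array shapes, dtypes and the int32 working type are illustrative choices, not taken from the paper):

```python
import numpy as np

def dsa_representations(mask, live):
    """Build the subtraction and differential subtraction images of a DSA run.

    mask : (H, W) array, the mask image M acquired before injection.
    live : (N, H, W) array, the live images L(0..N-1) acquired after injection.
    """
    M = mask.astype(np.int32)
    L = live.astype(np.int32)
    S = L - M[None, :, :]      # Format 2: S(n) = L(n) - M
    SD = S[1:] - S[:-1]        # Format 3: SD(n) = S(n) - S(n-1), n = 1..N-1
    return S, SD
```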
It is obvious that the whole sequence can be represented in three equivalent formats: Format 1, M and L(n); Format 2, M and S(n); Format 3, M, S(0) and SD(n).
First, the entropy of each image type is calculated separately, H(x) = -Σ_j p_j log p_j, where p_j is the probability of the j-th gray level in an image. The image resolution is 1024×1024×10 bits. The results are listed in Table 1.

Table 1. Zeroth-order entropy of the DSA images (bits/pixel)
Image | H(x)
M     | 6.65
L(n)  | 6.61
S(n)  | 4.68
SD(n) | 3.67

Second, the correlation between two images is defined as R(k) = E{ I(n) · I(n+k) }. We compute the correlation separately for S(n) and for SD(n); the correlation between the S(n) images is shown in Figure 2(a) and that between the SD(n) images in Figure 2(b).
Figure 2. The R(k) curves: (a) R(k) between S(n); (b) R(k) between SD(n).
From these curves the following conclusions about the characteristics of DSA images can be drawn: (a) for k < 5, the correlation R(k) between the S(n) images remains very high; (b) the correlation R(k) between the SD(n) images decreases quickly as k increases, yet R(1) still reaches about 0.60, which is useful and important in the proposed compression method.
Since the three representation formats are mathematically equivalent, the whole sequence can be compressed in three ways, which exploit different techniques and give different results. Compression should be better with the last representation than with the other two, for the following reasons: (1) its entropy is the smallest, so the highest lossless compression ratio is theoretically attainable; (2) for common signals, R(k) drops markedly after one differencing operation, which would make a second-order difference seem worthless; this is not the case for DSA images, where the correlation between the S(n) images remains high and therefore gives the second-order differential images SD(n) real value. Compression performance is improved by taking full advantage of this characteristic.

III. COMPRESSION FRAMEWORK
A. Differential Pulse Code Modulation
Differential pulse code modulation (DPCM) exploits the property that the values of adjacent pixels in an image are often similar and highly correlated. The general block diagram of DPCM is shown in Figure 3. The value of a pixel is predicted as a linear combination of a few neighboring pixel values:

  X_e(m, n) = Σ_{(i,j)∈R} α(i, j) · X_r(m - i, n - j)

where X_e is the predicted value, R is the neighborhood region, and α(i, j) are prediction coefficients specified by the image characteristics. The prediction error, e(i, j) = X(i, j) - X_e(i, j), is then coded and transmitted. The decoder reconstructs the pixels as X_r(i, j) = X_e(i, j) + e_q(i, j). To keep the compression lossless, only the predictor is included in the DPCM coder; it is a "predictor + entropy coder" lossless DPCM scheme [5].
Figure 3. General block diagram of DPCM (encoder and decoder).

B. Second-Order Differential Pulse Code Modulation
The DPCM² coder differs from the two-dimensional (2-D) DPCM method and from the three-dimensional (3-D) one [6]. For a pixel X(n, i, j) in S(n), applying DPCM along the frame index n extends the prediction to three dimensions. The same holds for SD(n, x, y) in Format 3, which amounts to a second-order difference because SD(n) is itself obtained by one differencing operation. The DPCM² coder consists of four steps: 1. For the mask image M, apply the RICE coding method [7]. 2. Construct an optimal predictor; an ARMA-type model is adopted according to the image characteristics.
The prediction equation is:

  X_e = α(0,1) · X_r(m, n-1) + α(1,0) · X_r(m-1, n) + α(1,1) · X_r(m-1, n-1) + b · (X_r - X_e)

The last term on the right-hand side represents the second-order differential contribution, and it introduces no quantization error. 3. For every pixel S(n, x, y) in S(n), apply the linear prediction along the frame index n. 4. Apply an adaptive arithmetic coding algorithm to the error signal e_p(n, x, y). The whole coding process is shown in Figure 4.
Figure 4. The scheme of DPCM² coding; the dashed rectangle constitutes the DPCM² lossless coder. The decoding process is the obvious inverse.

IV. EXPERIMENTAL STUDY
Figure 5 shows typical DSA images acquired at the Fourth Military Medical University, P.R. China; it gives the images M, L(60), S(60), S(59) and SD(60). The compression results obtained with the proposed method are as follows: the average image size after compression is 463 KByte; the average compression ratio is cr = (1024 × 1024 × 10) / 4,636,478 ≈ 2.26:1; the average bit rate is B = (579,338 × 8) / (1024 × 1024) ≈ 4.42 bits/pixel; and the average compression efficiency is η = H(x)/B = 3.67/4.42 ≈ 83%. The following table gives a comparison with Huffman, arithmetic and DPCM coding; for those methods the procedure is equivalent to still-image compression, without reducing the correlation between images.

Method        | Huffman | Arithmetic | DPCM   | DPCM²
cr            | 1.36:1  | 1.47:1     | 1.53:1 | 2.26:1
B (bits/pixel)| 7.35    | 6.76       | 6.76   | 4.42

From this table it is evident that the proposed compression technique performs better than the others.

V. CONCLUSION
Applying the proposed method to DSA apparatus yields very good experimental performance. The compression ratio is better than that of ordinary techniques, and the computational time and space requirements are practical and robust. Although significant progress has been made, many research questions remain.

ACKNOWLEDGMENT
The authors wish to thank Dr. Sun Li-jun, FMMU, for his cooperation in the acquisition of some important DSA images.

Figure 5. (a) mask image M; (b) live image L(60); (c) and (d) subtraction images S(60) and S(59); (e) differential subtraction image SD(60) (enhanced for display).

REFERENCES
1. S. Wong, L. Zaremba, D. Gooden, and H. K. Huang, "Radiologic image compression - a review," Proc. IEEE, vol. 83, pp. 194-219, Feb. 1995.
2. A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, NJ: Prentice-Hall, 1989.
3. MPEG (ISO/IEC JTC1/SC29/WG11), Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to 1.5 Mbit/s, ISO CD 11172, Nov. 1991.
4. G. Bostelmann, "A simple high quality DPCM-codec for video telephony using 8 Mbit per second," NTZ, 27(3), pp. 115-117, 1974.
5. L. N. Wu, Data Compression & Applications, Electrical Industry Publisher, pp. 116-120, 1995.
6. P. Roos and M. A. Viergever, "Reversible 3-D decorrelation of medical images," IEEE Trans. Medical Imaging, 12(3), pp. 413-420, 1993.
7. D. Yeh et al., "The development of lossless data compression technology for remote sensing applications," pp. 307-309, in IGARSS'94.
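As an illustration of the ideas in Section III (not the authors' code; the prediction coefficients below are placeholders rather than the paper's trained ARMA values), a causal 2-D prediction residual and the zeroth-order entropy used in Table 1 can be sketched as follows:

```python
import numpy as np

def prediction_residual(frame, a01=0.5, a10=0.5, a11=-0.25):
    """Residual of a causal linear predictor over left/upper/upper-left neighbours.

    Rounding the prediction to an integer keeps the scheme lossless: the decoder
    reproduces the same integer predictions in raster order and adds the residual back.
    """
    f = frame.astype(np.int64)
    pred = np.zeros(f.shape, dtype=np.float64)
    pred[1:, 1:] = a01 * f[1:, :-1] + a10 * f[:-1, 1:] + a11 * f[:-1, :-1]
    return f - np.rint(pred).astype(np.int64)

def entropy_bits_per_pixel(img):
    """Zeroth-order entropy H(x) = -sum_j p_j log2 p_j, the quantity in Table 1."""
    _, counts = np.unique(img, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())
```

The residual returned by `prediction_residual` is what would be handed to the adaptive arithmetic coder in step 4; its empirical entropy gives a rough estimate of the achievable bit rate.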
Multimedia Technology, Chapter 8: DCT and JPEG Coding
Chapter 8: DCT and JPEG Coding
JPEG is the international standard for still-image compression of grayscale and true-color images. It is based mainly on a lossy compression algorithm built on the DCT (Discrete Cosine Transform), which this chapter describes in some detail.
JPEG 2000 is the newer standard for still-image compression of bi-level, grayscale, pseudo-color and true-color images; it uses the better-performing DWT (Discrete Wavelet Transform) and is introduced in the next chapter.
Because the intra-frame coding of video is essentially still-image coding, the coding algorithms of JPEG and JPEG 2000 are also used in the MPEG video coding standards.
8.1 DCT
Unlike the entropy coding methods discussed in the previous chapter, the DCT (Discrete Cosine Transform) is a transform-based source coding method. It is used very widely and is one of the basic algorithms of JPEG coding.
The DCT converts time- or space-domain data into frequency-domain data. Because human hearing and vision are insensitive to (changes in) high-frequency signals and perceive different frequency bands differently, this representation can be used to compress multimedia data.
8.1.1 The Cosine Transform
The DCT is a transform that computes the coefficients of a cosine series (a special case of the Fourier series).
If a function f(x) has period 2l and is absolutely integrable on [-l, l], then f(x) can be expanded into a Fourier series

  f(x) = a_0/2 + Σ_{n=1}^{∞} ( a_n cos(nπx/l) + b_n sin(nπx/l) )

with cosine and sine coefficients

  a_n = (1/l) ∫_{-l}^{l} f(x) cos(nπx/l) dx,   b_n = (1/l) ∫_{-l}^{l} f(x) sin(nπx/l) dx.

If f(x) is an odd or an even function, then a_n ≡ 0 or b_n ≡ 0 respectively, and f(x) can be expanded into a pure sine or cosine series:

  f(x) = Σ_{n=1}^{∞} b_n sin(nπx/l)   or   f(x) = a_0/2 + Σ_{n=1}^{∞} a_n cos(nπx/l).

For any f(x) given on x ∈ [0, l], one can always extend it evenly to [-l, l],

  f(x) = f(x) for x ∈ [0, l],   f(x) = f(-x) for x ∈ [-l, 0],

and then extend it periodically with period 2l, so that it becomes an even function with period 2l.
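The relation between this cosine series and the DCT used by JPEG can be tried out directly. A minimal sketch, assuming SciPy is available; the eight sample values are arbitrary:

```python
import numpy as np
from scipy.fft import dct, idct

# One row of an 8x8 JPEG block (sample values are arbitrary)
x = np.array([52, 55, 61, 66, 70, 61, 64, 73], dtype=float)

X = dct(x, type=2, norm='ortho')        # forward DCT-II, as used by JPEG
x_back = idct(X, type=2, norm='ortho')  # inverse transform recovers the block

assert np.allclose(x, x_back)
print(np.round(X, 1))  # energy concentrates in the low-frequency coefficients
```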
A Survey of Neural Networks and Deep Learning (DeepLearning15May2014)
Draft: Deep Learning in Neural Networks: An Overview
Technical Report IDSIA-03-14 / arXiv:1404.7828 (v1.5) [cs.NE]
Jürgen Schmidhuber
The Swiss AI Lab IDSIA
Istituto Dalle Molle di Studi sull'Intelligenza Artificiale
University of Lugano & SUPSI
Galleria 2, 6928 Manno-Lugano, Switzerland
15 May 2014

Abstract
In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

PDF of earlier draft (v1): http://www.idsia.ch/~juergen/DeepLearning30April2014.pdf
LaTeX source: http://www.idsia.ch/~juergen/DeepLearning30April2014.tex
Complete BibTeX file: http://www.idsia.ch/~juergen/bib.bib

Preface
This is the draft of an invited Deep Learning (DL) overview. One of its goals is to assign credit to those who contributed to the present state of the art. I acknowledge the limitations of attempting to achieve this goal. The DL research community itself may be viewed as a continually evolving, deep network of scientists who have influenced each other in complex ways. Starting from recent DL results, I tried to trace back the origins of relevant ideas through the past half century and beyond, sometimes using "local search" to follow citations of citations backwards in time. Since not all DL publications properly acknowledge earlier relevant work, additional global search strategies were employed, aided by consulting numerous neural network experts. As a result, the present draft mostly consists of references (about 800 entries so far). Nevertheless, through an expert selection bias I may have missed important work. A related bias was surely introduced by my special familiarity with the work of my own DL research group in the past quarter-century. For these reasons, the present draft should be viewed as merely a snapshot of an ongoing credit assignment process. To help improve it, please do not hesitate to send corrections and suggestions to juergen@idsia.ch.

Contents
1 Introduction to Deep Learning (DL) in Neural Networks (NNs) (3)
2 Event-Oriented Notation for Activation Spreading in FNNs/RNNs (3)
3 Depth of Credit Assignment Paths (CAPs) and of Problems (4)
4 Recurring Themes of Deep Learning (5)
4.1 Dynamic Programming (DP) for DL (5)
4.2 Unsupervised Learning (UL) Facilitating Supervised Learning (SL) and RL (6)
4.3 Occam's Razor: Compression and Minimum Description Length (MDL) (6)
4.4 Learning Hierarchical Representations Through Deep SL, UL, RL (6)
4.5 Fast Graphics Processing Units (GPUs) for DL in NNs (6)
5 Supervised NNs, Some Helped by Unsupervised NNs (7)
5.1 1940s and Earlier (7)
5.2 Around 1960: More Neurobiological Inspiration for DL (7)
5.3 1965: Deep Networks Based on the Group Method of Data Handling (GMDH) (8)
5.4 1979: Convolution + Weight Replication + Winner-Take-All (WTA) (8)
5.5 1960-1981 and Beyond: Development of Backpropagation (BP) for NNs (8)
5.5.1 BP for Weight-Sharing Feedforward NNs (FNNs) and Recurrent NNs (RNNs) (9)
5.6 Late 1980s-2000: Numerous Improvements of NNs (9)
5.6.1 Ideas for Dealing with Long Time Lags and Deep CAPs (10)
5.6.2 Better BP Through Advanced Gradient Descent (10)
5.6.3 Discovering Low-Complexity, Problem-Solving NNs
(11)5.6.4Potential Benefits of UL for SL (11)5.71987:UL Through Autoencoder(AE)Hierarchies (12)5.81989:BP for Convolutional NNs(CNNs) (13)5.91991:Fundamental Deep Learning Problem of Gradient Descent (13)5.101991:UL-Based History Compression Through a Deep Hierarchy of RNNs (14)5.111992:Max-Pooling(MP):Towards MPCNNs (14)5.121994:Contest-Winning Not So Deep NNs (15)5.131995:Supervised Recurrent Very Deep Learner(LSTM RNN) (15)5.142003:More Contest-Winning/Record-Setting,Often Not So Deep NNs (16)5.152006/7:Deep Belief Networks(DBNs)&AE Stacks Fine-Tuned by BP (17)5.162006/7:Improved CNNs/GPU-CNNs/BP-Trained MPCNNs (17)5.172009:First Official Competitions Won by RNNs,and with MPCNNs (18)5.182010:Plain Backprop(+Distortions)on GPU Yields Excellent Results (18)5.192011:MPCNNs on GPU Achieve Superhuman Vision Performance (18)5.202011:Hessian-Free Optimization for RNNs (19)5.212012:First Contests Won on ImageNet&Object Detection&Segmentation (19)5.222013-:More Contests and Benchmark Records (20)5.22.1Currently Successful Supervised Techniques:LSTM RNNs/GPU-MPCNNs (21)5.23Recent Tricks for Improving SL Deep NNs(Compare Sec.5.6.2,5.6.3) (21)5.24Consequences for Neuroscience (22)5.25DL with Spiking Neurons? (22)6DL in FNNs and RNNs for Reinforcement Learning(RL)236.1RL Through NN World Models Yields RNNs With Deep CAPs (23)6.2Deep FNNs for Traditional RL and Markov Decision Processes(MDPs) (24)6.3Deep RL RNNs for Partially Observable MDPs(POMDPs) (24)6.4RL Facilitated by Deep UL in FNNs and RNNs (25)6.5Deep Hierarchical RL(HRL)and Subgoal Learning with FNNs and RNNs (25)6.6Deep RL by Direct NN Search/Policy Gradients/Evolution (25)6.7Deep RL by Indirect Policy Search/Compressed NN Search (26)6.8Universal RL (27)7Conclusion271Introduction to Deep Learning(DL)in Neural Networks(NNs) Which modifiable components of a learning system are responsible for its success or failure?What changes to them improve performance?This has been called the fundamental credit assignment problem(Minsky, 1963).There are general credit assignment methods for universal problem solvers that are time-optimal in various theoretical senses(Sec.6.8).The present survey,however,will focus on the narrower,but now commercially important,subfield of Deep Learning(DL)in Artificial Neural Networks(NNs).We are interested in accurate credit assignment across possibly many,often nonlinear,computational stages of NNs.Shallow NN-like models have been around for many decades if not centuries(Sec.5.1).Models with several successive nonlinear layers of neurons date back at least to the1960s(Sec.5.3)and1970s(Sec.5.5). 
An efficient gradient descent method for teacher-based Supervised Learning(SL)in discrete,differentiable networks of arbitrary depth called backpropagation(BP)was developed in the1960s and1970s,and ap-plied to NNs in1981(Sec.5.5).BP-based training of deep NNs with many layers,however,had been found to be difficult in practice by the late1980s(Sec.5.6),and had become an explicit research subject by the early1990s(Sec.5.9).DL became practically feasible to some extent through the help of Unsupervised Learning(UL)(e.g.,Sec.5.10,5.15).The1990s and2000s also saw many improvements of purely super-vised DL(Sec.5).In the new millennium,deep NNs havefinally attracted wide-spread attention,mainly by outperforming alternative machine learning methods such as kernel machines(Vapnik,1995;Sch¨o lkopf et al.,1998)in numerous important applications.In fact,supervised deep NNs have won numerous of-ficial international pattern recognition competitions(e.g.,Sec.5.17,5.19,5.21,5.22),achieving thefirst superhuman visual pattern recognition results in limited domains(Sec.5.19).Deep NNs also have become relevant for the more generalfield of Reinforcement Learning(RL)where there is no supervising teacher (Sec.6).Both feedforward(acyclic)NNs(FNNs)and recurrent(cyclic)NNs(RNNs)have won contests(Sec.5.12,5.14,5.17,5.19,5.21,5.22).In a sense,RNNs are the deepest of all NNs(Sec.3)—they are general computers more powerful than FNNs,and can in principle create and process memories of ar-bitrary sequences of input patterns(e.g.,Siegelmann and Sontag,1991;Schmidhuber,1990a).Unlike traditional methods for automatic sequential program synthesis(e.g.,Waldinger and Lee,1969;Balzer, 1985;Soloway,1986;Deville and Lau,1994),RNNs can learn programs that mix sequential and parallel information processing in a natural and efficient way,exploiting the massive parallelism viewed as crucial for sustaining the rapid decline of computation cost observed over the past75years.The rest of this paper is structured as follows.Sec.2introduces a compact,event-oriented notation that is simple yet general enough to accommodate both FNNs and RNNs.Sec.3introduces the concept of Credit Assignment Paths(CAPs)to measure whether learning in a given NN application is of the deep or shallow type.Sec.4lists recurring themes of DL in SL,UL,and RL.Sec.5focuses on SL and UL,and on how UL can facilitate SL,although pure SL has become dominant in recent competitions(Sec.5.17-5.22). 
Sec. 5 is arranged in a historical timeline format with subsections on important inspirations and technical contributions. Sec. 6 on deep RL discusses traditional Dynamic Programming (DP)-based RL combined with gradient-based search techniques for SL or UL in deep NNs, as well as general methods for direct and indirect search in the weight space of deep FNNs and RNNs, including successful policy gradient and evolutionary methods.

2 Event-Oriented Notation for Activation Spreading in FNNs/RNNs
Throughout this paper, let i, j, k, t, p, q, r denote positive integer variables assuming ranges implicit in the given contexts. Let n, m, T denote positive integer constants.
An NN's topology may change over time (e.g., Fahlman, 1991; Ring, 1991; Weng et al., 1992; Fritzke, 1994). At any given moment, it can be described as a finite subset of units (or nodes or neurons) N = {u_1, u_2, ...} and a finite set H ⊆ N × N of directed edges or connections between nodes. FNNs are acyclic graphs, RNNs cyclic. The first (input) layer is the set of input units, a subset of N. In FNNs, the k-th layer (k > 1) is the set of all nodes u ∈ N such that there is an edge path of length k-1 (but no longer path) between some input unit and u. There may be shortcut connections between distant layers.
The NN's behavior or program is determined by a set of real-valued, possibly modifiable, parameters or weights w_i (i = 1, ..., n). We now focus on a single finite episode or epoch of information processing and activation spreading, without learning through weight changes. The following slightly unconventional notation is designed to compactly describe what is happening during the runtime of the system.
During an episode, there is a partially causal sequence x_t (t = 1, ..., T) of real values that I call events. Each x_t is either an input set by the environment, or the activation of a unit that may directly depend on other x_k (k < t) through a current NN topology-dependent set in_t of indices k representing incoming causal connections or links. Let the function v encode topology information and map such event index pairs (k, t) to weight indices. For example, in the non-input case we may have x_t = f_t(net_t) with real-valued net_t = Σ_{k∈in_t} x_k · w_{v(k,t)} (additive case) or net_t = Π_{k∈in_t} x_k · w_{v(k,t)} (multiplicative case), where f_t is a typically nonlinear real-valued activation function such as tanh. In many recent competition-winning NNs (Sec. 5.19, 5.21, 5.22) there also are events of the type x_t = max_{k∈in_t}(x_k); some network types may also use complex polynomial activation functions (Sec. 5.3). x_t may directly affect certain x_k (k > t) through outgoing connections or links represented through a current set out_t of indices k with t ∈ in_k. Some non-input events are called output events.
Note that many of the x_t may refer to different, time-varying activations of the same unit in sequence-processing RNNs (e.g., Williams, 1989, "unfolding in time"), or also in FNNs sequentially exposed to time-varying input patterns of a large training set encoded as input events. During an episode, the same weight may get reused over and over again in topology-dependent ways, e.g., in RNNs, or in convolutional NNs (Sec. 5.4, 5.8). I call this weight sharing across space and/or time. Weight sharing may greatly reduce the NN's descriptive complexity, which is the number of bits of information required to describe the NN (Sec. 4.3).
In Supervised Learning (SL), certain NN output events x_t may be associated with teacher-given, real-valued labels or targets d_t yielding errors e_t, e.g., e_t = 1/2 (x_t - d_t)². A typical goal of supervised NN training is to find weights that yield
episodes with small total error E,the sum of all such e t.The hope is that the NN will generalize well in later episodes,causing only small errors on previously unseen sequences of input events.Many alternative error functions for SL and UL are possible.SL assumes that input events are independent of earlier output events(which may affect the environ-ment through actions causing subsequent perceptions).This assumption does not hold in the broaderfields of Sequential Decision Making and Reinforcement Learning(RL)(Kaelbling et al.,1996;Sutton and Barto, 1998;Hutter,2005)(Sec.6).In RL,some of the input events may encode real-valued reward signals given by the environment,and a typical goal is tofind weights that yield episodes with a high sum of reward signals,through sequences of appropriate output actions.Sec.5.5will use the notation above to compactly describe a central algorithm of DL,namely,back-propagation(BP)for supervised weight-sharing FNNs and RNNs.(FNNs may be viewed as RNNs with certainfixed zero weights.)Sec.6will address the more general RL case.3Depth of Credit Assignment Paths(CAPs)and of ProblemsTo measure whether credit assignment in a given NN application is of the deep or shallow type,I introduce the concept of Credit Assignment Paths or CAPs,which are chains of possibly causal links between events.Let usfirst focus on SL.Consider two events x p and x q(1≤p<q≤T).Depending on the appli-cation,they may have a Potential Direct Causal Connection(PDCC)expressed by the Boolean predicate pdcc(p,q),which is true if and only if p∈in q.Then the2-element list(p,q)is defined to be a CAP from p to q(a minimal one).A learning algorithm may be allowed to change w v(p,q)to improve performance in future episodes.More general,possibly indirect,Potential Causal Connections(PCC)are expressed by the recursively defined Boolean predicate pcc(p,q),which in the SL case is true only if pdcc(p,q),or if pcc(p,k)for some k and pdcc(k,q).In the latter case,appending q to any CAP from p to k yields a CAP from p to q(this is a recursive definition,too).The set of such CAPs may be large but isfinite.Note that the same weight may affect many different PDCCs between successive events listed by a given CAP,e.g.,in the case of RNNs, or weight-sharing FNNs.Suppose a CAP has the form(...,k,t,...,q),where k and t(possibly t=q)are thefirst successive elements with modifiable w v(k,t).Then the length of the suffix list(t,...,q)is called the CAP’s depth (which is0if there are no modifiable links at all).This depth limits how far backwards credit assignment can move down the causal chain tofind a modifiable weight.1Suppose an episode and its event sequence x1,...,x T satisfy a computable criterion used to decide whether a given problem has been solved(e.g.,total error E below some threshold).Then the set of used weights is called a solution to the problem,and the depth of the deepest CAP within the sequence is called the solution’s depth.There may be other solutions(yielding different event sequences)with different depths.Given somefixed NN topology,the smallest depth of any solution is called the problem’s depth.Sometimes we also speak of the depth of an architecture:SL FNNs withfixed topology imply a problem-independent maximal problem depth bounded by the number of non-input layers.Certain SL RNNs withfixed weights for all connections except those to output units(Jaeger,2001;Maass et al.,2002; Jaeger,2004;Schrauwen et al.,2007)have a maximal problem depth of1,because only thefinal links in the corresponding CAPs 
are modifiable.In general,however,RNNs may learn to solve problems of potentially unlimited depth.Note that the definitions above are solely based on the depths of causal chains,and agnostic of the temporal distance between events.For example,shallow FNNs perceiving large“time windows”of in-put events may correctly classify long input sequences through appropriate output events,and thus solve shallow problems involving long time lags between relevant events.At which problem depth does Shallow Learning end,and Deep Learning begin?Discussions with DL experts have not yet yielded a conclusive response to this question.Instead of committing myself to a precise answer,let me just define for the purposes of this overview:problems of depth>10require Very Deep Learning.The difficulty of a problem may have little to do with its depth.Some NNs can quickly learn to solve certain deep problems,e.g.,through random weight guessing(Sec.5.9)or other types of direct search (Sec.6.6)or indirect search(Sec.6.7)in weight space,or through training an NNfirst on shallow problems whose solutions may then generalize to deep problems,or through collapsing sequences of(non)linear operations into a single(non)linear operation—but see an analysis of non-trivial aspects of deep linear networks(Baldi and Hornik,1994,Section B).In general,however,finding an NN that precisely models a given training set is an NP-complete problem(Judd,1990;Blum and Rivest,1992),also in the case of deep NNs(S´ıma,1994;de Souto et al.,1999;Windisch,2005);compare a survey of negative results(S´ıma, 2002,Section1).Above we have focused on SL.In the more general case of RL in unknown environments,pcc(p,q) is also true if x p is an output event and x q any later input event—any action may affect the environment and thus any later perception.(In the real world,the environment may even influence non-input events computed on a physical hardware entangled with the entire universe,but this is ignored here.)It is possible to model and replace such unmodifiable environmental PCCs through a part of the NN that has already learned to predict(through some of its units)input events(including reward signals)from former input events and actions(Sec.6.1).Its weights are frozen,but can help to assign credit to other,still modifiable weights used to compute actions(Sec.6.1).This approach may lead to very deep CAPs though.Some DL research is about automatically rephrasing problems such that their depth is reduced(Sec.4). 
In particular,sometimes UL is used to make SL problems less deep,e.g.,Sec.5.10.Often Dynamic Programming(Sec.4.1)is used to facilitate certain traditional RL problems,e.g.,Sec.6.2.Sec.5focuses on CAPs for SL,Sec.6on the more complex case of RL.4Recurring Themes of Deep Learning4.1Dynamic Programming(DP)for DLOne recurring theme of DL is Dynamic Programming(DP)(Bellman,1957),which can help to facili-tate credit assignment under certain assumptions.For example,in SL NNs,backpropagation itself can 1An alternative would be to count only modifiable links when measuring depth.In many typical NN applications this would not make a difference,but in some it would,e.g.,Sec.6.1.be viewed as a DP-derived method(Sec.5.5).In traditional RL based on strong Markovian assumptions, DP-derived methods can help to greatly reduce problem depth(Sec.6.2).DP algorithms are also essen-tial for systems that combine concepts of NNs and graphical models,such as Hidden Markov Models (HMMs)(Stratonovich,1960;Baum and Petrie,1966)and Expectation Maximization(EM)(Dempster et al.,1977),e.g.,(Bottou,1991;Bengio,1991;Bourlard and Morgan,1994;Baldi and Chauvin,1996; Jordan and Sejnowski,2001;Bishop,2006;Poon and Domingos,2011;Dahl et al.,2012;Hinton et al., 2012a).4.2Unsupervised Learning(UL)Facilitating Supervised Learning(SL)and RL Another recurring theme is how UL can facilitate both SL(Sec.5)and RL(Sec.6).UL(Sec.5.6.4) is normally used to encode raw incoming data such as video or speech streams in a form that is more convenient for subsequent goal-directed learning.In particular,codes that describe the original data in a less redundant or more compact way can be fed into SL(Sec.5.10,5.15)or RL machines(Sec.6.4),whose search spaces may thus become smaller(and whose CAPs shallower)than those necessary for dealing with the raw data.UL is closely connected to the topics of regularization and compression(Sec.4.3,5.6.3). 
4.3Occam’s Razor:Compression and Minimum Description Length(MDL) Occam’s razor favors simple solutions over complex ones.Given some programming language,the prin-ciple of Minimum Description Length(MDL)can be used to measure the complexity of a solution candi-date by the length of the shortest program that computes it(e.g.,Solomonoff,1964;Kolmogorov,1965b; Chaitin,1966;Wallace and Boulton,1968;Levin,1973a;Rissanen,1986;Blumer et al.,1987;Li and Vit´a nyi,1997;Gr¨u nwald et al.,2005).Some methods explicitly take into account program runtime(Al-lender,1992;Watanabe,1992;Schmidhuber,2002,1995);many consider only programs with constant runtime,written in non-universal programming languages(e.g.,Rissanen,1986;Hinton and van Camp, 1993).In the NN case,the MDL principle suggests that low NN weight complexity corresponds to high NN probability in the Bayesian view(e.g.,MacKay,1992;Buntine and Weigend,1991;De Freitas,2003), and to high generalization performance(e.g.,Baum and Haussler,1989),without overfitting the training data.Many methods have been proposed for regularizing NNs,that is,searching for solution-computing, low-complexity SL NNs(Sec.5.6.3)and RL NNs(Sec.6.7).This is closely related to certain UL methods (Sec.4.2,5.6.4).4.4Learning Hierarchical Representations Through Deep SL,UL,RLMany methods of Good Old-Fashioned Artificial Intelligence(GOFAI)(Nilsson,1980)as well as more recent approaches to AI(Russell et al.,1995)and Machine Learning(Mitchell,1997)learn hierarchies of more and more abstract data representations.For example,certain methods of syntactic pattern recog-nition(Fu,1977)such as grammar induction discover hierarchies of formal rules to model observations. The partially(un)supervised Automated Mathematician/EURISKO(Lenat,1983;Lenat and Brown,1984) continually learns concepts by combining previously learnt concepts.Such hierarchical representation learning(Ring,1994;Bengio et al.,2013;Deng and Yu,2014)is also a recurring theme of DL NNs for SL (Sec.5),UL-aided SL(Sec.5.7,5.10,5.15),and hierarchical RL(Sec.6.5).Often,abstract hierarchical representations are natural by-products of data compression(Sec.4.3),e.g.,Sec.5.10.4.5Fast Graphics Processing Units(GPUs)for DL in NNsWhile the previous millennium saw several attempts at creating fast NN-specific hardware(e.g.,Jackel et al.,1990;Faggin,1992;Ramacher et al.,1993;Widrow et al.,1994;Heemskerk,1995;Korkin et al., 1997;Urlbe,1999),and at exploiting standard hardware(e.g.,Anguita et al.,1994;Muller et al.,1995; Anguita and Gomes,1996),the new millennium brought a DL breakthrough in form of cheap,multi-processor graphics cards or GPUs.GPUs are widely used for video games,a huge and competitive market that has driven down hardware prices.GPUs excel at fast matrix and vector multiplications required not only for convincing virtual realities but also for NN training,where they can speed up learning by a factorof50and more.Some of the GPU-based FNN implementations(Sec.5.16-5.19)have greatly contributed to recent successes in contests for pattern recognition(Sec.5.19-5.22),image segmentation(Sec.5.21), and object detection(Sec.5.21-5.22).5Supervised NNs,Some Helped by Unsupervised NNsThe main focus of current practical applications is on Supervised Learning(SL),which has dominated re-cent pattern recognition contests(Sec.5.17-5.22).Several methods,however,use additional Unsupervised Learning(UL)to facilitate SL(Sec.5.7,5.10,5.15).It does make sense to treat SL and UL in the same section:often gradient-based methods,such as 
BP(Sec.5.5.1),are used to optimize objective functions of both UL and SL,and the boundary between SL and UL may blur,for example,when it comes to time series prediction and sequence classification,e.g.,Sec.5.10,5.12.A historical timeline format will help to arrange subsections on important inspirations and techni-cal contributions(although such a subsection may span a time interval of many years).Sec.5.1briefly mentions early,shallow NN models since the1940s,Sec.5.2additional early neurobiological inspiration relevant for modern Deep Learning(DL).Sec.5.3is about GMDH networks(since1965),perhaps thefirst (feedforward)DL systems.Sec.5.4is about the relatively deep Neocognitron NN(1979)which is similar to certain modern deep FNN architectures,as it combines convolutional NNs(CNNs),weight pattern repli-cation,and winner-take-all(WTA)mechanisms.Sec.5.5uses the notation of Sec.2to compactly describe a central algorithm of DL,namely,backpropagation(BP)for supervised weight-sharing FNNs and RNNs. It also summarizes the history of BP1960-1981and beyond.Sec.5.6describes problems encountered in the late1980s with BP for deep NNs,and mentions several ideas from the previous millennium to overcome them.Sec.5.7discusses afirst hierarchical stack of coupled UL-based Autoencoders(AEs)—this concept resurfaced in the new millennium(Sec.5.15).Sec.5.8is about applying BP to CNNs,which is important for today’s DL applications.Sec.5.9explains BP’s Fundamental DL Problem(of vanishing/exploding gradients)discovered in1991.Sec.5.10explains how a deep RNN stack of1991(the History Compressor) pre-trained by UL helped to solve previously unlearnable DL benchmarks requiring Credit Assignment Paths(CAPs,Sec.3)of depth1000and more.Sec.5.11discusses a particular WTA method called Max-Pooling(MP)important in today’s DL FNNs.Sec.5.12mentions afirst important contest won by SL NNs in1994.Sec.5.13describes a purely supervised DL RNN(Long Short-Term Memory,LSTM)for problems of depth1000and more.Sec.5.14mentions an early contest of2003won by an ensemble of shallow NNs, as well as good pattern recognition results with CNNs and LSTM RNNs(2003).Sec.5.15is mostly about Deep Belief Networks(DBNs,2006)and related stacks of Autoencoders(AEs,Sec.5.7)pre-trained by UL to facilitate BP-based SL.Sec.5.16mentions thefirst BP-trained MPCNNs(2007)and GPU-CNNs(2006). Sec.5.17-5.22focus on official competitions with secret test sets won by(mostly purely supervised)DL NNs since2009,in sequence recognition,image classification,image segmentation,and object detection. 
Many RNN results depended on LSTM (Sec. 5.13); many FNN results depended on GPU-based FNN code developed since 2004 (Sec. 5.16, 5.17, 5.18, 5.19), in particular, GPU-MPCNNs (Sec. 5.19).

5.1 1940s and Earlier
NN research started in the 1940s (e.g., McCulloch and Pitts, 1943; Hebb, 1949); compare also later work on learning NNs (Rosenblatt, 1958, 1962; Widrow and Hoff, 1962; Grossberg, 1969; Kohonen, 1972; von der Malsburg, 1973; Narendra and Thathatchar, 1974; Willshaw and von der Malsburg, 1976; Palm, 1980; Hopfield, 1982). In a sense NNs have been around even longer, since early supervised NNs were essentially variants of linear regression methods going back at least to the early 1800s (e.g., Legendre, 1805; Gauss, 1809, 1821). Early NNs had a maximal CAP depth of 1 (Sec. 3).

5.2 Around 1960: More Neurobiological Inspiration for DL
Simple cells and complex cells were found in the cat's visual cortex (e.g., Hubel and Wiesel, 1962; Wiesel and Hubel, 1959). These cells fire in response to certain properties of visual sensory inputs, such as the orientation of edges. Complex cells exhibit more spatial invariance than simple cells. This inspired later deep NN architectures (Sec. 5.4) used in certain modern award-winning Deep Learners (Sec. 5.19-5.22).

5.3 1965: Deep Networks Based on the Group Method of Data Handling (GMDH)
Networks trained by the Group Method of Data Handling (GMDH) (Ivakhnenko and Lapa, 1965; Ivakhnenko et al., 1967; Ivakhnenko, 1968, 1971) were perhaps the first DL systems of the Feedforward Multilayer Perceptron type. The units of GMDH nets may have polynomial activation functions implementing Kolmogorov-Gabor polynomials (more general than traditional NN activation functions). Given a training set, layers are incrementally grown and trained by regression analysis, then pruned with the help of a separate validation set (using today's terminology), where Decision Regularisation is used to weed out superfluous units. The numbers of layers and units per layer can be learned in problem-dependent fashion.
This is a good example of hierarchical representation learning (Sec. 4.4). There have been numerous applications of GMDH-style networks, e.g. (Ikeda et al., 1976; Farlow, 1984; Madala and Ivakhnenko, 1994; Ivakhnenko, 1995; Kondo, 1998; Kordík et al., 2003; Witczak et al., 2006; Kondo and Ueno, 2008).

5.4 1979: Convolution + Weight Replication + Winner-Take-All (WTA)
Apart from deep GMDH networks (Sec. 5.3), the Neocognitron (Fukushima, 1979, 1980, 2013a) was perhaps the first artificial NN that deserved the attribute deep, and the first to incorporate the neurophysiological insights of Sec. 5.2. It introduced convolutional NNs (today often called CNNs or convnets), where the (typically rectangular) receptive field of a convolutional unit with given weight vector is shifted step by step across a 2-dimensional array of input values, such as the pixels of an image. The resulting 2D array of subsequent activation events of this unit can then provide inputs to higher-level units, and so on. Due to massive weight replication (Sec. 2), relatively few parameters may be necessary to describe the behavior of such a convolutional layer.
Competition layers have WTA subsets whose maximally active units are the only ones to adopt non-zero activation values. They essentially "down-sample" the competition layer's input. This helps to create units whose responses are insensitive to small image shifts (compare Sec. 5.2).
The Neocognitron is very similar to the architecture of modern, contest-winning, purely supervised, feedforward, gradient-based Deep Learners with alternating convolutional and competition layers (e.g., Sec. 5.19-5.22). Fukushima, however, did not set the weights by supervised backpropagation (Sec. 5.5, 5.8), but by local unsupervised learning rules (e.g., Fukushima, 2013b), or by pre-wiring. In that sense he did not care for the DL problem (Sec. 5.9), although his architecture was comparatively deep indeed. He also used Spatial Averaging (Fukushima, 1980, 2011) instead of Max-Pooling (MP, Sec. 5.11), currently a particularly convenient and popular WTA mechanism. Today's CNN-based DL machines profit a lot from later CNN work (e.g., LeCun et al., 1989; Ranzato et al., 2007) (Sec. 5.8, 5.16, 5.19).

5.5 1960-1981 and Beyond: Development of Backpropagation (BP) for NNs
The minimisation of errors through gradient descent (Hadamard, 1908) in the parameter space of complex, nonlinear, differentiable, multi-stage, NN-related systems has been discussed at least since the early 1960s (e.g., Kelley, 1960; Bryson, 1961; Bryson and Denham, 1961; Pontryagin et al., 1961; Dreyfus, 1962; Wilkinson, 1965; Amari, 1967; Bryson and Ho, 1969; Director and Rohrer, 1969; Griewank, 2012), initially within the framework of Euler-LaGrange equations in the Calculus of Variations (e.g., Euler, 1744). Steepest descent in such systems can be performed (Bryson, 1961; Kelley, 1960; Bryson and Ho, 1969) by iterating the ancient chain rule (Leibniz, 1676; L'Hôpital, 1696) in Dynamic Programming (DP) style (Bellman, 1957). A simplified derivation of the method uses the chain rule only (Dreyfus, 1962).
The methods of the 1960s were already efficient in the DP sense. However, they backpropagated derivative information through standard Jacobian matrix calculations from one "layer" to the previous one, explicitly addressing neither direct links across several layers nor potential additional efficiency gains due to network sparsity (but perhaps such enhancements seemed obvious to the authors).
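For concreteness, here is a tiny NumPy sketch, not taken from the survey, of the additive-case event notation of Sec. 2 and of the chain-rule credit assignment performed by BP (Sec. 5.5); the layer sizes, tanh activation and quadratic error are illustrative choices only:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))            # input -> hidden weights
W2 = rng.normal(size=(4, 1))            # hidden -> output weights

x = rng.normal(size=(1, 3))             # input events
d = np.array([[1.0]])                   # teacher-given target d_t

# forward pass: x_t = f_t(net_t), net_t = sum of incoming x_k * w_v(k,t)
h = np.tanh(x @ W1)                     # hidden events (f_t = tanh)
y = h @ W2                              # output event (linear)
e = 0.5 * float((y - d) ** 2)           # error e_t = 1/2 (x_t - d_t)^2

# backward pass: the chain rule applied layer by layer along the CAPs
dy  = y - d                             # dE/dy
dW2 = h.T @ dy                          # gradient for the output weights
dh  = dy @ W2.T                         # credit assigned to the hidden events
dW1 = x.T @ (dh * (1.0 - h ** 2))       # tanh'(net) = 1 - tanh(net)^2
```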
Gaussian Naive Bayes Training-Set Accuracy (in English)
高斯朴素贝叶斯训练集精确度的英语Gaussian Naive Bayes (GNB) is a popular machine learning algorithm used for classification tasks. It is particularly well-suited for text classification, spam filtering, and recommendation systems. However, like any other machine learning algorithm, GNB's performance heavily relies on the quality of the training data. In this essay, we will delve into the factors that affect the training set accuracy of Gaussian Naive Bayes and explore potential solutions to improve its performance.One of the key factors that influence the training set accuracy of GNB is the quality and quantity of the training data. In order for the algorithm to make accurate predictions, it needs to be trained on a diverse and representative dataset. If the training set is too small or biased, the model may not generalize well to new, unseen data. This can result in low training set accuracy and poor performance in real-world applications. Therefore, it is crucial to ensure that the training data is comprehensive and well-balanced across different classes.Another factor that can impact the training set accuracy of GNB is the presence of irrelevant or noisy features in the dataset. When the input features contain irrelevant information or noise, it can hinder the algorithm's ability to identify meaningful patterns and make accurate predictions. To address this issue, feature selection and feature engineering techniques can be employed to filter out irrelevant features and enhance the discriminative power of the model. Byselecting the most informative features and transforming them appropriately, we can improve the training set accuracy of GNB.Furthermore, the assumption of feature independence in Gaussian Naive Bayes can also affect its training set accuracy. Although the 'naive' assumption of feature independence simplifies the model and makes it computationally efficient, it may not hold true in real-world datasets where features are often correlated. When features are not independent, it can lead to biased probability estimates and suboptimal performance. To mitigate this issue, techniques such as feature extraction and dimensionality reduction can be employed to decorrelate the input features and improve the training set accuracy of GNB.In addition to the aforementioned factors, the choice of hyperparameters and model tuning can also impact the training set accuracy of GNB. Hyperparameters such as the smoothing parameter (alpha) and the covariance type in the Gaussian distribution can significantly influence the model's performance. Therefore, it is important to carefully tune these hyperparameters through cross-validation andgrid search to optimize the training set accuracy of GNB. By selecting the appropriate hyperparameters, we can ensure that the model is well-calibrated and achieves high accuracy on the training set.Despite the challenges and limitations associated with GNB, there are several strategies that can be employed to improve its training set accuracy. By curating a high-quality training dataset, performing feature selection and engineering, addressing feature independence assumptions, and tuning model hyperparameters, we can enhance the performance of GNB and achieve higher training set accuracy. Furthermore, it is important to continuously evaluate and validate the model on unseen data to ensure that it generalizes well and performs robustly in real-world scenarios. 
By addressing these factors and adopting best practices in model training and evaluation, we can maximize the training set accuracy of Gaussian Naive Bayes and unleash its full potential in various applications.
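As a concrete illustration (assuming scikit-learn, whose GaussianNB exposes a var_smoothing knob rather than the alpha parameter mentioned above, which belongs to the multinomial variant; the synthetic dataset is only a placeholder), hyperparameter tuning with cross-validated grid search might look like this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data; substitute the real, well-balanced training set discussed above
X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           n_redundant=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune the Gaussian smoothing hyperparameter by cross-validated grid search
pipe = make_pipeline(StandardScaler(), GaussianNB())
grid = GridSearchCV(pipe,
                    {"gaussiannb__var_smoothing": np.logspace(-12, -3, 10)},
                    cv=5)
grid.fit(X_train, y_train)

print("training-set accuracy:", grid.score(X_train, y_train))
print("held-out accuracy:", grid.score(X_test, y_test))
```

Comparing the two printed scores is the simplest check that a high training-set accuracy is not just overfitting.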
An Improved Gaussian Frequency-Domain Compressive Sensing Sparse Inversion Method (in English)
Abstract
Compressive sensing and sparse inversion methods have gained a significant amount of attention in recent years due to their capability to accurately reconstruct signals from measurements with significantly less data than previously possible. In this paper, a modified Gaussian frequency domain compressive sensing and sparse inversion method is proposed, which leverages the proven strengths of the traditional method to enhance its accuracy and performance. Simulation results demonstrate that the proposed method can achieve a higher signal-to-noise ratio and a better reconstruction quality than its traditional counterpart, while also reducing the computational complexity of the inversion procedure.

Introduction
Compressive sensing (CS) is an emerging field that has garnered significant interest in recent years because it leverages the sparsity of signals to reduce the number of measurements required to accurately reconstruct the signal. This has many advantages over traditional signal processing methods, including faster data acquisition times, reduced power consumption, and lower data storage requirements. CS has been successfully applied to a wide range of fields, including medical imaging, wireless communications, and surveillance.
One of the most commonly used methods in compressive sensing is the Gaussian frequency domain compressive sensing and sparse inversion (GFD-CS) method. In this method, compressive measurements are acquired by multiplying the original signal with a randomly generated sensing matrix. The measurements are then transformed into the frequency domain using the Fourier transform, and the sparse signal is reconstructed using a sparsity promoting algorithm.
In recent years, researchers have made numerous improvements to the GFD-CS method, with the goal of improving its reconstruction accuracy, reducing its computational complexity, and enhancing its robustness to noise. In this paper, we propose a modified GFD-CS method that combines several techniques to achieve these objectives.

Proposed Method
The proposed method builds upon the well-established GFD-CS method, with several key modifications. The first modification is the use of a hierarchical sparsity-promoting algorithm, which promotes sparsity at both the signal level and the transform level. This is achieved by applying the hierarchical thresholding technique to the coefficients corresponding to the higher frequency components of the transformed signal.
The second modification is the use of a novel error feedback mechanism, which reduces the impact of measurement noise on the reconstructed signal. Specifically, the proposed method utilizes an iterative algorithm that updates the measurement error based on the difference between the reconstructed signal and the measured signal. This feedback mechanism effectively increases the signal-to-noise ratio of the reconstructed signal, improving its accuracy and robustness to noise.
The third modification is the use of a low-rank approximation method, which reduces the computational complexity of the inversion algorithm while maintaining reconstruction accuracy. This is achieved by decomposing the sensing matrix into a product of two lower dimensional matrices, which can be subsequently inverted using a more efficient algorithm.

Simulation Results
To evaluate the effectiveness of the proposed method, we conducted simulations using synthetic data sets. Three different signal types were considered: a sinusoidal signal, a pulse signal, and an image signal.
The results of the simulations were compared to those obtained using the traditional GFD-CS method.
The simulation results demonstrate that the proposed method outperforms the traditional GFD-CS method in terms of signal-to-noise ratio and reconstruction quality. Specifically, the proposed method achieves a higher signal-to-noise ratio and lower mean squared error for all three types of signals considered. Furthermore, the proposed method achieves these results with a reduced computational complexity compared to the traditional method.

Conclusion
The results of our simulations demonstrate the effectiveness of the proposed method in enhancing the accuracy and performance of the GFD-CS method. The combination of sparsity promotion, error feedback, and low-rank approximation techniques significantly improves the signal-to-noise ratio and reconstruction quality, while reducing the computational complexity of the inversion procedure. Our proposed method has potential applications in a wide range of fields, including medical imaging, wireless communications, and surveillance.
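The reconstruction step can be illustrated generically. The sketch below is not the paper's modified GFD-CS algorithm; it only stands in for the "sparsity promoting algorithm" step, using a Gaussian sensing matrix and plain iterative soft thresholding (ISTA):

```python
import numpy as np

def ista(y, A, lam=0.02, n_iter=300):
    """Recover a sparse x from y = A @ x + noise by iterative soft thresholding."""
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L              # gradient step on the data term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x

rng = np.random.default_rng(0)
n, m, k = 256, 80, 8                               # signal length, measurements, sparsity
x_true = np.zeros(n)
x_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
A = rng.normal(size=(m, n)) / np.sqrt(m)           # Gaussian sensing matrix
y = A @ x_true + 0.01 * rng.normal(size=m)

x_hat = ista(y, A)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```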
Research on Visual Inspection Algorithms for Defects in Textured Objects -- Outstanding Graduation Thesis
Abstract
In today's highly competitive, automated industrial production, machine vision plays a decisive role in guarding product quality, and its application to defect detection has become increasingly common. Compared with conventional inspection techniques, automated visual inspection systems are more economical, faster, more efficient and safer. Textured objects are ubiquitous in industrial production: substrates used in semiconductor assembly and packaging, light-emitting diodes, printed circuit boards in modern electronic systems, and cloth and fabric in the textile industry can all be regarded as objects with texture features. This thesis is devoted to defect detection techniques for textured objects and aims to provide efficient and reliable detection algorithms for their automated inspection.
Texture is an important feature for describing image content, and texture analysis has already been applied successfully to texture segmentation and texture classification. This work proposes a defect detection algorithm based on texture analysis and reference comparison. The algorithm tolerates the image registration errors caused by object deformation and is also robust to the influence of texture. It is designed to provide rich and important physical attributes for each detected defect region, such as its size, shape, brightness contrast and spatial distribution. Moreover, whenever a reference image is available, the algorithm can be used to inspect both homogeneously and non-homogeneously textured objects, and it also achieves good results on non-textured objects.
Throughout the detection process we employ steerable-pyramid texture analysis and reconstruction. Unlike traditional wavelet texture analysis, we add a tolerance-control algorithm in the wavelet domain to handle object deformation and texture influence, thereby achieving tolerance to deformation and robustness to texture. Finally, the steerable-pyramid reconstruction guarantees that the physical attributes of the defect regions are recovered accurately. In the experimental stage we inspected a series of images of practical value. The results show that the proposed defect detection algorithm for textured objects is efficient and easy to implement.
Keywords: defect detection, texture, object distortion, steerable pyramid, reconstruction
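A greatly simplified stand-in for the scheme described in the abstract (assuming SciPy; difference-of-Gaussians bands replace the thesis's steerable pyramid, and the threshold rule is illustrative only):

```python
import numpy as np
from scipy import ndimage

def detect_defects(test, reference, sigmas=(1, 2, 4), k=3.0):
    """Reference-comparison defect detection on band-pass responses."""
    t = test.astype(float)
    r = reference.astype(float)
    mask = np.zeros(t.shape, dtype=bool)
    for s in sigmas:
        # band-pass (difference-of-Gaussians) responses of both images
        bt = ndimage.gaussian_filter(t, s) - ndimage.gaussian_filter(t, 2 * s)
        br = ndimage.gaussian_filter(r, s) - ndimage.gaussian_filter(r, 2 * s)
        d = np.abs(bt - br)
        mask |= d > d.mean() + k * d.std()   # tolerance to normal texture variation
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))  # size of each region
    return mask, sizes
```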
Electronics-Related English Vocabulary (H-S)
HA Global Positioning Satellite 全球定位卫星HB Global Positioning System 全球定位系统HC Global System for Mobile Communication 全球移动通信系统HCT General Video File Server 通用视频文件服务器HD Head Amplifier 前置放大器Head Bus 前端总线HDM Hierarchical Coding 分层编码HDTV Home Communication Terminal 家庭通信终端HDVS High Definition 高清晰度HF Horizontal Drive 水平驱动(脉冲)HFC High Density Modulation 高密度调制HFCT High Definition Television 高清晰度电视HIS High Definition Video System 高清晰度视频系统Hi-Fi High Frequency 高频HPA Hybrid Fiber Coaxial 光纤同轴电缆混合网HPF HQAD Hybrid Fiber Concentric Twisted Pair Wire 混合光纤同轴双绞线HS Home Information System 家庭信息系统HSC High-Fidelity 高保真(度)High Power Amplifier 大功率放大器HSDB High-Pass Filter 高通滤波器HT High Quality Audio Disc 高品位音频光盘HTT HTTP Horizon Scanner 水平扫描HTU High Speed Camera System 高速摄像机系统IA High Speed Channel 高速信道IB High Speed Data Broadcast 高速数据广播High Tension 高压Home Television Theatre 家庭电视影院IBC Hyper Text Transmission Protocol 超文本传输协议Home Terminal Unit 家庭终端单元Information Access 信息存取IBG International Broadcasting 国际广播IC Interface Bus 接口总线IDCT Internal Bus 内部总线IF Integrated Broadband Communication 综合宽带通信IM International Broadcasting Center 国际广播中心IMTV International Broadcasting Convention (欧洲)国际广播会议IN Inter Block Gap 字组间隔INFO INS Integrated Circuit 集成电路IOCS Inverse Discrete Cosine Transform 离散余弦逆变换IOD Intermediate Frequency 中频IP Interface Module 接口模块Interactive Multimedia Television 交互式多媒体电视IPC Integrated Network 综合网IPD Integrated Network Using Fiber Optics 光纤综合网IPTC Information Network System 信息网络系统IRD Input-Output Control System 输入/输出控制系统IS Information On Demand 点播信息Input Power 输入功率Internet Protocol 因特网协议ISA Information Processing Center 信息处理中心ISAN Interactive Program Directory 交互式节目指南International Press Telecommunication Council 国际新闻通信委员会ISO ISRC Integrated Receiver/Decoder 综合接收机/解码器ISSI Information Superhighway 信息高速公路IT Interactive Service 交互业务ITS International Standard 国际标准Industry Standard Architecture 工业标准总线Integrated Service Analog Network 综合业务模拟网ITU International Standard Audiovisual Number 国际标准音视频编号ITV International Standards Organization 国际标准化组织International Standard Recording Code 国际标准记录码IU Inter-Switching System Interface 交换机间系统接口IVCS Interline Transfer 行间转移IVDS Insertion Test Signal 插入测试信号IVOD Intelligent Traffic System 智能交通系统IVS International Telecommunication Service 国际电信业务JB International Telecommunications Union 国际电信联盟JCTA Industrial Television 工业电视JPEG Interactive Television 交互式电视JSB Information Unit 信息单元KB Intelligent Video Conferencing System 智能视频会议系统LAN Interactive Video Data Service 交互视频数据业务LBC Interactive Video On Demand 交互点播电视LC Interactive Video System 交互视频系统LCD Junction Box 接线盒Japan Cable Television Association 日本有线电视协会LD Joint Photographic Experts Group 联合图片专家组LDTV Japan Satellite Broadcasting Inc 日本广播卫星公司IED Keyboard 键盘LF Local Area Network 局域网LFE Low Bit-rate Coding 低码率编码LFO Lossless Coding 无损编码LI Liquid Crystal Display 液晶显示器LMDS Light Coupled Device 光耦合器件LNA Laser Diode 激光二极管LO Low Definition Television 低分辨率数字电视LPF Light-Emitting Diode 发光二极管LRC Low Frequency 低频LS Low Frequency Response 低频响应LSD Low Frequency Oscillator 低频振荡器LSI Level Indicator 电平指示器LSN Local Microwave Distribution System 本地微波分配系统LTC Low Noise Amplifier 低噪声放大器LVD Local Oscillator 本地振荡器LVR Low Pass Filter 低通滤波器Longitudinal Redundancy Checking 纵向冗余校验Light Source 光源MAC Large Screen Display 大屏幕显示器MAN Large Scale Integrated Circuit 大规模集成电路MAPI Local Supervision Network 本地监测网MATV Longitudinal Time Code 纵向时间码MC Laser Vision Disc 激光电视唱片Laser Video Recording System 激光视盘录制系统Multiplexed Analog Components 复用模拟分量Metropolitan Area Network 都市网MCI 
Multimedia Application Programming Interface 多媒体应用编程接口MCPC Master Antenna Television 共用天线电视MCR Main Control 主控Media Composer 非线性媒体编辑系统MD Motion Compensation 运动补偿MDM Multimedia Communication 多媒体通信MDOP Media Control Interface 媒体控制接口MF Multi-Channel Per Carrier 多路单载波MIC Master Control Room 主控制室MIDI Mobile Control Room 转播车,移动控制室MMDS Magnetic Drum 磁鼓MODEM Multimedia Data Management 多媒体数据管理MOL Multimedia Data Operation Platform 多媒体数据操作平台MON Medium Frequency 中频MPC Microphone 传声器,话筒MPEG Musical Instrument Digital Interface 乐器数字接口MPO Multi-Channel Microwave Distribution System 微波多点分配系统Modulator And Demodulator 调制解调器MR Maximum Output Level 最大输出电平MSC Monitor 监视器,监听器。
Several Mainstream Digital Video Compression Codec Standards (repost)
The previous post mainly covered H.264; next we look at the other codec standards.
The Joint Photographic Experts Group (JPEG) was established in 1987 as a joint working committee of the International Organization for Standardization (ISO) and the CCITT (the predecessor of the ITU, the International Telecommunication Union). In 1988 it was followed by JBIG (the Joint Bi-level Image Experts Group); both now belong to ISO/IEC JTC1/SC29 WG1 (ITU-T SG8) and are devoted to the compression of still images.
JPEG已开发三个图像标准。
第⼀个直接称为JPEG标准,正式名称叫“连续⾊调静⽌图像的数字压缩编码”(Digital Compression and Coding of Continuous-tone still Images), 1992年正式通过。
JPEG开发的第⼆个标准是JPEG-LS(ISO/IEC 14495, 1999)。
JPEG-LS仍然是静⽌图像⽆损编码,能提供接近有损压缩压缩率。
JPEG 的最新标准是JPEG 2000(ISO/IEC 15444, 等同的ITU-T编号T.800),于1999年3⽉形成⼯作草案,2000年底成为正式标准(第⼀部分)。
根据JPEG专家组的⽬标,该标准将不仅能提⾼对图像的压缩质量,尤其是低码率时的压缩质量,⽽且还将得到许多新功能,包括根据图像质量,视觉感受和分辨率进⾏渐进传输,对码流的随机存取和处理,开放结构,向下兼容等。
The JPEG standard defines four operating modes. (1) The sequential DCT-based mode, which consists of three steps: formation of DCT (discrete cosine transform) coefficients, quantization, and entropy coding.
The signal is scanned from left to right and top to bottom, and each image is encoded in turn.
(2) The progressive DCT-based mode, in which the key steps of generating and quantizing the DCT coefficients are the same as in the baseline sequential codec.
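To make the sequential DCT-based mode concrete, the following is a minimal Python sketch of its first two steps, 8x8 block DCT followed by quantization. It is not a JPEG implementation: a single scalar step q stands in for the standard's 8x8 quantization tables, and the zig-zag scan and entropy coding that follow are omitted.

import numpy as np
from scipy.fft import dctn, idctn

def encode_block(block, q=16):
    """Level-shift, 2-D DCT-II, and uniform quantization of one 8x8 block."""
    coeffs = dctn(block.astype(np.float64) - 128.0, norm='ortho')
    return np.rint(coeffs / q).astype(np.int32)   # these integers go to the entropy coder

def decode_block(qcoeffs, q=16):
    """Dequantize and inverse DCT; all loss comes from the quantization step."""
    return idctn(qcoeffs.astype(np.float64) * q, norm='ortho') + 128.0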
Digital signal processing English vocabulary
AAbsolutely integrable绝对可积Absolutely integrable impulse response绝对可积冲激响应Absolutely summable绝对可和Absolutely summable impulse response绝对可和冲激响应Accumulator累加器Acoustic 声学Adder加法器Additivity property可加性Aliasing混叠现象All-pass systems全通系统AM (Amplitude modulation )幅度调制Amplifier放大器Amplitude modulation (AM)幅度调制Amplitude-scaling factor幅度放大因子Analog-to-digital (A-to-D) converter模数转换器Analysis equation分析公式(方程)Angel (phase) of complex number复数的角度(相位)Angle criterion角判据Angle modulation角度调制Anticausality反因果Aperiodic非周期Aperiodic convolution非周期卷积Aperiodic signal非周期信号Asynchronous异步的Audio systems音频(声音)系统Autocorrelation functions自相关函数Automobile suspension system汽车减震系统Averaging system平滑系统BBand-limited带(宽)限的Band-limited input signals带限输入信号Band-limited interpolation带限内插Bandpass filters带通滤波器Bandpass signal带通信号Bandpass-sampling techniques带通采样技术Bandwidth带宽Bartlett (triangular) window巴特利特(三角形)窗Bilateral Laplace transform双边拉普拉斯变换Bilinear双线性的Bilinear transformation双线性变换Bit(二进制)位,比特Block diagrams方框图Bode plots波特图Bounded有界限的Break frequency折转频率Butterworth filters巴特沃斯滤波器C“Chirp” transform algorithm“鸟声”变换算法Capacitor电容器Carrier载波Carrier frequency载波频率Carrier signal载波信号Cartesian (rectangular) form 直角坐标形式Cascade (series) interconnection串联,级联Cascade-form串联形式Causal LTI system因果的线性时不变系统Channel信道,频道Channel equalization信道均衡Chopper amplifier斩波器放大器Closed-loop闭环Closed-loop poles闭环极点Closed-loop system闭环系统Closed-loop system function闭环系统函数Coefficient multiplier系数乘法器Coefficients系数Communications systems通信系统Commutative property交换性(交换律)Compensation for nonideal elements非理想元件的补偿Complex conjugate复数共轭Complex exponential carrier复指数载波Complex exponential signals复指数信号Complex exponential(s)复指数Complex numbers 复数Conditionally stable systems条件稳定系统Conjugate symmetry共轭对称Conjugation property共轭性质Continuous-time delay连续时间延迟Continuous-time filter连续时间滤波器Continuous-time Fourier series连续时间傅立叶级数Continuous-time Fourier transform连续时间傅立叶变换Continuous-time signals连续时间信号Continuous-time systems连续时间系统Continuous-to-discrete-time conversion连续时间到离散时间转换Convergence 收敛Convolution卷积Convolution integral卷积积分Convolution property卷积性质Convolution sum卷积和Correlation function相关函数Critically damped systems临界阻尼系统Crosss-correlation functions互相关函数Cutoff frequencies截至频率DDamped sinusoids阻尼正弦振荡Damping ratio阻尼系数Dc offset直流偏移Dc sequence直流序列Deadbeat feedback systems临界阻尼反馈系统Decibels (dB) 分贝Decimation抽取Decimation and interpolation抽取和内插Degenerative (negative) feedback负反馈Delay延迟Delay time延迟时间Demodulation解调Difference equations差分方程Differencing property差分性质Differential equations微分方程Differentiating filters微分滤波器Differentiation property微分性质Differentiator微分器Digital-to-analog (D-to-A) converter数模转换器Direct Form I realization直接I型实现Direct form II realization直接II型实现Direct-form直接型Dirichlet conditions狄里赫利条件Dirichlet, P.L.狄里赫利Discontinuities间断点,不连续Discrete-time filters 离散时间滤波器Discrete-time Fourier series离散时间傅立叶级数Discrete-time Fourier series pair离散时间傅立叶级数对Discrete-time Fourier transform (DFT)离散时间傅立叶变换Discrete-time LTI filters离散时间线性时不变滤波器Discrete-time modulation离散时间调制Discrete-time nonrecursive filters离散时间非递归滤波器Discrete-time signals离散时间信号Discrete-time systems离散时间系统Discrete-time to continuous-time conversion离散时间到连续时间转换Dispersion弥撒(现象)Distortion扭曲,失真Distribution theory(property)分配律Dominant time constant主时间常数Double-sideband modulation (DSB)双边带调制Downsampling减采样Duality对偶性EEcho回波Eigenfunctions特征函数Eigenvalue特征值Elliptic filters椭圆滤波器Encirclement property围线性质End points终点Energy of signals信号的能量Energy-density spectrum能量密度谱Envelope detector包络检波器Envelope function包络函数Equalization均衡化Equalizer circuits均衡器电路Equation for closed-loop 
poles闭环极点方程Euler, L.欧拉Euler’s relation欧拉关系(公式)Even signals偶信号Exponential signals指数信号Exponentials指数FFast Fourier transform (FFT)快速傅立叶变换Feedback反馈Feedback interconnection反馈联结Feedback path反馈路径Filter(s)滤波器Final-value theorem终值定理Finite impulse response (FIR)有限长脉冲响应Finite impulse response (FIR) filters有限长脉冲响应滤波器Finite sum formula有限项和公式Finite-duration signals有限长信号First difference一阶差分First harmonic components基波分量(一次谐波分量)First-order continuous-time systems一阶连续时间系统First-order discrete-time systems一阶离散时间系统First-order recursive discrete-time filters一阶递归离散时间滤波器First-order systems一阶系统Forced response受迫响应Forward path正向通路Fourier series傅立叶级数Fourier transform傅立叶变换Fourier transform pairs傅立叶变换对Fourier, Jean Baptiste Joseph傅立叶(法国数学家,物理学家)Frequency response频率响应Frequency response of LTI systems线性时不变系统的频率响应Frequency scaling of continuous-time Fourier transform 连续时间傅立叶变化的频率尺度(变换性质)Frequency shift keying (FSK)频移键控Frequency shifting property频移性质Frequency-division multiplexing (FDM)频分多路复用Frequency-domain characterization频域特征Frequency-selective filter频率选择滤波器Frequency-shaping filters频率成型滤波器Fundamental components基波分量Fundamental frequency基波频率Fundamental period基波周期GGain增益Gain and phase margin增益和相位裕度General complex exponentials一般复指数信号Generalized functions广义函数Gibbs phenomenon吉伯斯现象Group delay群延迟HHalf-sample delay半采样间隔时延Hanning window汉宁窗Harmonic analyzer谐波分析议Harmonic components谐波分量Harmonically related谐波关系Heat propagation and diffusion热传播和扩散现象Higher order holds高阶保持Highpass filter高通滤波器Highpass-to-lowpass transformations高通到低通变换Hilbert transform希尔波特滤波器Homogeneity (scaling) property齐次性(比例性)IIdeal理想的Ideal bandstop characteristic理想带阻特征Ideal frequency-selective filter理想频率选择滤波器Idealization理想化Identity system恒等系统Imaginary part虚部Impulse response 冲激响应Impulse train冲激串Incrementally linear systems增量线性系统Independent variable独立变量Infinite impulse response (IIR)无限长脉冲响应Infinite impulse response (IIR) filters无限长脉冲响应滤波器Infinite sum formula无限项和公式Infinite taylor series无限项泰勒级数Initial-value theorem初值定理Inpulse-train sampling冲激串采样Instantaneous瞬时的Instantaneous frequency瞬时频率Integration in time-domain时域积分Integration property积分性质Integrator积分器Interconnection互联Intermediate-frequency (IF) stage中频级Intersymbol interference (ISI)码间干扰Inverse Fourier transform傅立叶反变换Inverse Laplace transform拉普拉斯反变换Inverse LTI system逆线性时不变系统Inverse system design逆系统设计Inverse z-transform z反变换Inverted pendulum倒立摆Invertibility of LTI systems线性时不变系统的可逆性Invertible systems逆系统LLag network滞后网络Lagrange, J.L.拉格朗日(法国数学家,力学家)Laplace transform拉普拉斯变换Laplace, P.S. 
de拉普拉斯(法国天文学家,数学家)lead network超前网络left-half plane左半平面left-sided signal左边信号Linear线性Linear constant-coefficient difference线性常系数差分方程equationsLinear constant-coefficient differential线性常系数微分方程equationsLinear feedback systems线性反馈系统Linear interpolation线性插值Linearity线性性Log magnitude-phase diagram对数幅-相图Log-magnitude plots对数模图Lossless coding无损失码Lowpass filters低通滤波器Lowpass-to-highpass transformation低通到高通的转换LTI system response线性时不变系统响应LTI systems analysis线性时不变系统分析MMagnitude and phase幅度和相位Matched filter匹配滤波器Measuring devices测量仪器Memory记忆Memoryless systems无记忆系统Modulating signal调制信号Modulation调制Modulation index调制指数Modulation property调制性质Moving-average filters移动平均滤波器Multiplexing多路技术Multiplication property相乘性质Multiplicities多样性NNarrowband窄带Narrowband frequency modulation窄带频率调制Natural frequency自然响应频率Natural response自然响应Negative (degenerative) feedback负反馈Nonanticipatibe system不超前系统Noncausal averaging system非因果平滑系统Nonideal非理想的Nonideal filters非理想滤波器Nonmalized functions归一化函数Nonrecursive非递归Nonrecursive filters非递归滤波器Nonrecursive linear constant-coefficient非递归线性常系数差分方程difference equationsNyquist frequency奈奎斯特频率Nyquist rate奈奎斯特率Nyquist stability criterion奈奎斯特稳定性判据OOdd harmonic 奇次谐波Odd signal奇信号Open-loop开环Open-loop frequency response开环频率响应Open-loop system开环系统Operational amplifier运算放大器Orthogonal functions正交函数Orthogonal signals正交信号Oscilloscope示波器Overdamped system过阻尼系统Oversampling过采样Overshoot超量PParallel interconnection并联Parallel-form block diagrams并联型框图Parity check奇偶校验检查Parseval’s relation帕斯伐尔关系(定理)Partial-fraction expansion部分分式展开Particular and homogeneous solution特解和齐次解Passband通频带Passband edge通带边缘Passband frequency通带频率Passband ripple通带起伏(或波纹)Pendulum钟摆Percent modulation调制百分数Periodic周期的Periodic complex exponentials周期复指数Periodic convolution周期卷积Periodic signals周期信号Periodic square wave周期方波Periodic square-wave modulating signal周期方波调制信号Periodic train of impulses周期冲激串Phase (angle) of complex number复数相位(角度)Phase lag相位滞后Phase lead相位超前Phase margin相位裕度Phase shift相移Phase-reversal相位倒置Phase modulation相位调制Plant工厂Polar form极坐标形式Poles极点Pole-zero plot(s)零极点图Polynomials 多项式Positive (regenerative) feedback正(再生)反馈Power of signals信号功率Power-series expansion method幂级数展开的方法Principal-phase function主值相位函数Proportional (P) control比例控制Proportional feedback system比例反馈系统Proportional-plus-derivative比例加积分Proportional-plus-derivative feedback比例加积分反馈Proportional-plus-integral-plus-different比例-积分-微分控制ial (PID) controlPulse-amplitude modulation脉冲幅度调制Pulse-code modulation脉冲编码调制Pulse-train carrier冲激串载波QQuadrature distortion正交失真Quadrature multiplexing正交多路复用Quality of circuit电路品质(因数)RRaised consine frequency response升余弦频率响应Rational frequency responses有理型频率响应Rational transform有理变换RC highpass filter RC 高阶滤波器RC lowpass filter RC 低阶滤波器Real实数Real exponential signals实指数信号Real part实部Rectangular (Cartesian) form 直角(卡笛儿)坐标形式Rectangular pulse矩形脉冲Rectangular pulse signal矩形脉冲信号Rectangular window矩形窗口Recursive (infinite impulse response)递归(无时限脉冲响应)滤波器filtersRecursive linear constant-coefficient 递归的线性常系数差分方程difference equationsRegenerative (positive) feedback再生(正)反馈Region of comvergence收敛域right-sided signal右边信号Rise time上升时间Root-locus analysis根轨迹分析(方法)Running sum动求和SS domain S域Sampled-data feedback systems采样数据反馈系统Sampled-data systems采样数据系统Sampling采样Sampling frequency采样频率Sampling function采样函数Sampling oscilloscope采样示波器Sampling period采样周期Sampling theorem采样定理Scaling (homogeneity) property比例性(齐次性)性质Scaling in z domain z域尺度变换Scrambler扰频器Second harmonic components二次谐波分量Second-order二阶Second-order continuous-time system二阶连续时间系统Second-order discrete-time system二阶离散时间系统Second-order 
systems二阶系统sequence序列Series (cascade) interconnection级联(串联)Sifting property筛选性质Sinc functions sinc函数Single-sideband单边带Single-sideband sinusoidal amplitude单边带正弦幅度调制modulationSingularity functions奇异函数Sinusoidal正弦(信号)Sinusoidal amplitude modulation正弦幅度调制Sinusoidal carrier正弦载波Sinusoidal frequency modulation正弦频率调制Sliding滑动Spectral coefficient频谱系数Spectrum频谱Speech scrambler语音加密器S-plane S平面Square wave方波Stability稳定性Stabilization of unstable systems不稳定系统的稳定性(度)Step response阶跃响应Step-invariant transformation阶跃响应不定的变换Stopband阻带Stopband edge阻带边缘Stopband frequency阻带频率Stopband ripple 阻带起伏(或波纹)Stroboscopic effect频闪响应Summer加法器Superposition integral叠加积分Superposition property叠加性质Superposition sum叠加和Suspension system减震系统Symmetric periodic 周期对称Symmetry对称性Synchronous同步的Synthesis equation综合方程System function(s)系统方程TTable of properties 性质列表Taylor series泰勒级数Time时间,时域Time advance property of unilateral单边z变换的时间超前性质z-transformTime constants时间常数Time delay property of unilateral单边z变换的时间延迟性质z-transformTime expansion property时间扩展性质Time invariance时间变量Time reversal property时间反转(反褶)性Time scaling property时间尺度变换性Time shifting property时移性质Time window时间窗口Time-division multiplexing (TDM)时分复用Time-domain时域Time-domain properties时域性质Tracking system (s)跟踪系统Transfer function转移函数transform pairs变换对Transformation变换(变形)Transition band过渡带Transmodulation (transmultiplexing) 交叉调制Triangular (Barlett) window三角型(巴特利特)窗口Trigonometric series三角级数Two-sided signal双边信号Type l feedback system l 型反馈系统UUint impulse response单位冲激响应Uint ramp function单位斜坡函数Undamped natural frequency无阻尼自然相应Undamped system无阻尼系统Underdamped systems欠阻尼系统Undersampling欠采样Unilateral单边的Unilateral Laplace transform单边拉普拉斯变换Unilateral z-transform单边z变换Unit circle单位圆Unit delay单位延迟Unit doublets单位冲激偶Unit impulse单位冲激Unit step functions单位阶跃函数Unit step response 单位阶跃响应Unstable systems不稳定系统Unwrapped phase展开的相位特性Upsampling增采样VVariable变量WWalsh functions沃尔什函数Wave波形Wavelengths波长Weighted average加权平均Wideband宽带Wideband frequency modulation宽带频率调制Windowing加窗zZ domain z域Zero force equalizer置零均衡器Zero-Input response零输入响应Zero-Order hold零阶保持Zeros of Laplace transform拉普拉斯变换的零点Zero-state response零状态响应z-transform z变换z-transform pairs z变换对。
Strip steel surface defect detection based on wavelet denoising and an improved Canny algorithm
Modern Electronics Technique, Vol. 47, No. 4, 15 February 2024.
0 Introduction
Strip steel is one of the main products of the iron and steel industry and is widely used in machinery manufacturing, aerospace, the defense industry, shipbuilding, and other sectors.
During production, however, factors such as raw materials, production equipment, and the manufacturing process inevitably cause surface defects, for example oxidation, patches, cracks, pitting, inclusions, and scratches.
Surface defects not only affect the appearance of the strip but also degrade properties such as wear resistance, corrosion resistance, and fatigue strength, so product inspection must be strengthened and defective strip must be detected and screened.
Traditional manual inspection relies on human judgment and suffers from high randomness, low detection confidence, and poor real-time performance [1].
Bian Guiping et al. proposed an image edge detection method based on an improved Canny algorithm: compound morphological filtering replaces Gaussian filtering, the high and low thresholds are selected with the maximum between-class variance (Otsu) method, and the edges are finally thinned with mathematical morphology, which improves noise robustness [2].
Liu Yuan et al. proposed a
DOI: 10.16652/j.issn.1004-373x.2024.04.027. Citation: CUI Ying, ZHAO Lei, LI Heng, et al. Strip steel surface defect detection based on wavelet denoising and improved Canny algorithm [J]. Modern Electronics Technique, 2024, 47(4): 148-152.
Strip steel surface defect detection based on wavelet denoising and an improved Canny algorithm. CUI Ying, ZHAO Lei, LI Heng, LIU Hui (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, Yunnan, China).
Abstract: To deal with the uneven brightness and low contrast of strip steel surface images and with the many, complex types of defects, a strip steel surface defect detection algorithm based on wavelet denoising and an improved Canny algorithm is proposed.
First, the original image is decomposed by the wavelet transform; improved homomorphic filtering is applied to the low-frequency component to raise brightness and contrast, an improved threshold function is applied to the high-frequency components for denoising, and the enhanced image is obtained by wavelet reconstruction.
Second, the traditional Canny algorithm is improved: smoothing is performed with an improved adaptive weighted median filter and additional gradient direction templates are added; the high and low thresholds are then obtained with an iterative optimal threshold selection method combined with the maximum between-class variance (Otsu) method, which improves the adaptivity of the algorithm.
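As a rough illustration of the two stages just described, the sketch below shows a generic wavelet shrinkage denoise and an Otsu-driven choice of Canny thresholds (PyWavelets and OpenCV). It does not reproduce the paper's improved homomorphic filtering, improved threshold function, adaptive weighted median filter, or iterative optimal threshold; the wavelet name, decomposition level, and the 0.5 ratio between the low and high thresholds are illustrative assumptions.

import cv2
import numpy as np
import pywt

def wavelet_denoise(img, wavelet='db4', level=2):
    """Decompose, soft-threshold the detail sub-bands, and reconstruct."""
    coeffs = pywt.wavedec2(img.astype(np.float64), wavelet, level=level)
    approx, details = coeffs[0], coeffs[1:]
    sigma = np.median(np.abs(details[-1][-1])) / 0.6745   # noise estimate from the finest diagonal band
    t = sigma * np.sqrt(2.0 * np.log(img.size))           # universal threshold
    details = [tuple(pywt.threshold(d, t, mode='soft') for d in lvl) for lvl in details]
    return pywt.waverec2([approx] + details, wavelet)

def canny_with_otsu(gray_u8):
    """Use Otsu's threshold as Canny's high threshold and half of it as the low threshold."""
    high, _ = cv2.threshold(gray_u8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.Canny(gray_u8, 0.5 * high, high)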
Optimization and improvement of human action recognition algorithms based on the Log-Euclidean bag-of-words model and Stein-kernel sparse coding
With the development of artificial intelligence and computer vision, human action recognition has broad application prospects in intelligent surveillance, security, intelligent transportation, and other fields.
Algorithms based on the Log-Euclidean bag-of-words model and on Stein-kernel sparse coding have been widely studied and applied.
In practical scenarios, however, these algorithms still suffer from problems such as limited recognition accuracy and low running efficiency.
This article addresses these problems through optimization and improvement so as to raise the performance and practicality of human action recognition algorithms.
I. Optimization and improvement of the Log-Euclidean bag-of-words model
The Log-Euclidean bag-of-words model is a feature extraction method commonly used in human action recognition; it converts image data into vector form, which is then used to train a classifier for recognition.
The traditional Log-Euclidean bag-of-words model suffers from high computational complexity and limited recognition accuracy when handling large-scale data.
It can be optimized and improved in the following ways.
1. Adopt a hierarchical bag-of-words model. In the traditional bag-of-words model, feature extraction and vocabulary assignment are carried out separately, so local features are not captured sufficiently.
A hierarchical bag-of-words model fuses the feature extraction and vocabulary assignment stages, improving the ability to represent local features.
2. Use deep learning for feature learning. Deep learning has been very successful in image processing; features can be learned with a deep network, improving the feature extraction capability.
A deep network can be combined with the Log-Euclidean bag-of-words model to improve the recognition accuracy and robustness of the algorithm.
II. Optimization and improvement based on Stein-kernel sparse coding
Stein-kernel sparse coding is a kernel-based feature extraction and classification algorithm; by learning a sparse representation of the data it achieves an effective representation and recognition of the data.
Traditional Stein-kernel sparse coding suffers from high computational complexity and low running efficiency when handling high-dimensional and large-scale data.
It can be optimized and improved in the following ways. 1. Use fast algorithms for training and recognition. Traditional Stein-kernel sparse coding usually requires global learning over the whole data set, which leads to high computational complexity.
Distinguishing small-scale traveling ionospheric disturbances from equatorial plasma bubbles with a clustering algorithm
0254-6124/2020/40(6)-1014-10 Chin. J. Space Sci. WANG Ling, YIN Fan. Distinguish small-scale traveling ionospheric disturbances and equatorial plasma bubbles by clustering algorithm (in Chinese). Chin. J. Space Sci., 2020, 40(6): 1014-1023. DOI: 10.11728/cjss2020.06.1014
Distinguishing small-scale traveling ionospheric disturbances from equatorial plasma bubbles with a clustering algorithm*
* Jointly supported by the National Key R&D Program of China (2018YFC1503501, 2018YFC1503501-01), a key project of the National Natural Science Foundation of China (41431073), and a general project of the National Natural Science Foundation of China (41474157). Received 2020-01-13; revised 2020-04-17. E-mail: ****************.cn. Corresponding author YIN Fan, E-mail: **************.cn
WANG Ling, YIN Fan (School of Electronic Information, Wuhan University, Wuhan 430072)
Abstract: Using the 50 Hz high-rate magnetic field data of the Swarm satellites from 1 January 2015 to 31 December 2019, small-scale traveling ionospheric disturbance (SSTID) events between 45°N and 45°S magnetic latitude are detected from threshold-exceeding disturbances perpendicular to the main magnetic field. To avoid confusion, typical equatorial plasma bubble (EPB) events can be excluded by checking, again with a threshold, whether a disturbance also occurs parallel to the main field; for weak EPB events, however, this threshold test fails. To avoid contamination by weak EPB events, a density-based clustering algorithm is used to further pick out EPB events, exploiting the different density distributions of SSTID and EPB events in different parameter spaces. The results show that the clustering algorithm effectively separates the EPB events from the SSTID events and produces a clear boundary between the SSTID cluster and the EPB cluster. The weak EPB events identified by the clustering algorithm are mainly distributed between 15°N and 15°S magnetic latitude, at geographic longitudes 20°-60°W, in the months from October to March, with a high occurrence rate between 20:00 MLT and 24:00 MLT, and they depend on solar activity, which confirms previous studies.
Key words: Clustering algorithm, Ionosphere, Small-scale traveling ionospheric disturbance, Equatorial plasma bubble
Chinese Library Classification: P352
Distinguish Small-scale Traveling Ionospheric Disturbances and Equatorial Plasma Bubbles by Clustering Algorithm. WANG Ling, YIN Fan (School of Electronic Information, Wuhan University, Wuhan 430072).
Abstract: By analyzing the 50 Hz high-frequency magnetic field data measured by Swarm from January 2015 to December 2019, and using magnetic disturbances above a threshold in the direction perpendicular to the main field, Small-Scale Traveling Ionospheric Disturbances (SSTID) within ±45° magnetic latitude have been detected. When detecting SSTID, Equatorial Plasma Bubbles (EPB) can be excluded by determining whether there is a magnetic disturbance above the threshold in the direction parallel to the mean ambient field. However, when the disturbance is weak, the algorithm cannot completely exclude it. According to the different density distributions of SSTID and EPB in parameter space, the EPB can be distinguished by a density-based clustering algorithm. The results demonstrate that the clustering algorithm is very effective at separating EPB and SSTID in different two-dimensional parameter spaces and brings out a clear boundary between the high-density and low-density regions. The EPB identified by the clustering algorithm show a strong dependence on solar activity, and their statistical features are consistent with previous studies.
Key words: Clustering algorithm, Ionosphere, Small-Scale Traveling Ionospheric Disturbance (SSTID), Equatorial Plasma Bubble (EPB)
0 Introduction
Because of plasma instabilities, energy input from above and from below disturbs the ionosphere, which is therefore not always in a steady, uniformly stratified state. Small-scale traveling ionospheric disturbances (SSTID), with wavelengths below 10 km, are a disturbance phenomenon that frequently occurs at middle and low latitudes; they are regarded as an evolution of the night-time medium-scale traveling ionospheric disturbances (MSTID), whose wavelengths are of the order of a few hundred kilometres, and the statistical characteristics of the two are essentially consistent. MSTID often occur in the night-time mid-latitude ionosphere during solar-quiet periods; they are produced by the coupling between the Perkins instability of the ionospheric F layer and the sporadic-E-layer instability connected with it [3-7]. Similar to the mid-latitude medium-scale magnetic field fluctuations (MMF) caused by MSTID events, SSTID events at middle and low latitudes also cause small-scale mid-latitude magnetic fluctuations (SSMMF) associated with field-aligned currents. Usually neither MMF nor SSMMF in the topside ionosphere is accompanied by electron density perturbations, and the magnetic disturbance exists only in the direction perpendicular to the main field [1, 8]. Yin et al. [1] found that the statistical characteristics of SSMMF are consistent with those of night-time MMF [8, 9]: both are mainly distributed at magnetic latitudes of roughly 25°-45° in the northern and southern hemispheres, while their distribution near the magnetic equator (between 10°N and 10°S magnetic latitude) and in the South Atlantic Anomaly (SAA) region is extremely sparse. In terms of seasonal dependence, both are densest around the June solstice, next densest around the December solstice, and the equinoxes are the low season. In magnetic local time, SSMMF are mainly distributed at 08:00-12:00 MLT and 17:00-23:00 MLT, and overall SSMMF show no dependence on solar activity.
Another ionospheric disturbance phenomenon is the plasma density irregularity structure produced by the Rayleigh-Taylor instability in the low-latitude equatorial region, usually called equatorial spread F (ESF) or equatorial plasma bubbles (EPB) [10-12]. The electron density inside an EPB is far lower than that of the background ionosphere; the depletion can exceed 90%.
Lühr et al. [13-17], Park et al., Xiong et al., Stolle et al., and Yokoyama et al. studied CHAMP satellite observations and found that the pressure-gradient-driven currents and the field-aligned currents produced by EPB cause magnetic field disturbances parallel and perpendicular to the main field, respectively. EPB mainly appear at geographic longitudes 20°-60°W (the SAA region), are densest in the months from October to March and very sparse around the June solstice, occur mainly between 20:00 MLT and 24:00 MLT, and depend on the level of solar activity [14-18].
SSTID events at middle and low latitudes and EPB events both cause magnetic disturbances, and the disturbance signals captured by high-precision spaceborne magnetometers have been used to study both phenomena thoroughly. When studying these two events of different mechanisms, however, they must be distinguished from each other, otherwise they are confused. The key to distinguishing them is that the magnetic disturbance caused by an SSTID event exists in the direction perpendicular to the main field and is quiet in the parallel direction, whereas the magnetic disturbance produced by an equatorial plasma bubble is also clearly present parallel to the main field. Park et al., Xiong et al., Stolle et al., and Lühr et al. determined EPB events by analyzing the disturbances parallel to the main field in CHAMP magnetic data. Yin et al. [1] and Park et al., analyzing CHAMP and Swarm magnetic data, determined MSTID and SSTID events through the disturbances perpendicular to the main field and avoided contamination by EPB events by excluding parallel magnetic disturbances. In general, the nature of an event can be determined from whether the time series in the different directions contains disturbances exceeding a set threshold. However, because of the complexity of EPB magnetic disturbances, Stolle et al. used manual interpretation to verify EPB events, which is clearly unsuitable for the very large number of SSTID events. Although Yin et al. [1] excluded EPB events by threshold tests when detecting SSTID events, the resulting SSTID event set, when examined in two-dimensional parameter spaces such as geographic longitude, local time, and magnetic latitude, still shows obvious EPB distribution features; that is, the SSTID set is not pure and is contaminated by EPB events that were not completely excluded.
Analysis shows that excluding EPB contamination only by testing the magnetic disturbance parallel to the main field is flawed. An atypical EPB event whose parallel disturbance lasts a relatively short time and has a relatively small amplitude (below the threshold but not noise) will not be judged an EPB and will be misclassified as an SSTID event. Lowering the threshold used to exclude EPB events would cause missed detections because of noise and would discard SSTID events that should have been detected, whereas raising the threshold would misreport more contaminating EPB events as SSTID and make the contamination worse. In addition, because the plasma density data have a sampling rate of 2 Hz, far below the 50 Hz magnetic data, and because both the magnetic and the electron density disturbances of atypical EPB events are weak, it is also not appropriate to exclude atypical EPB events by inspecting disturbances in the plasma density data. Nevertheless, the statistical properties of atypical and typical EPB events are consistent. In the statistics of Yin et al. [1], the contaminating EPB events are clearly concentrated geographically in the SAA region near the magnetic equator and in local time around 22:00 MLT, consistent with existing observations. Whether the contaminating EPB events are typical or not, their statistical characteristics in parameter space clearly differ from those of SSTID events. It is therefore worth asking whether the different density distributions of the two kinds of events in different parameter spaces can be exploited to further extract and remove the unexcluded EPB events from the SSTID set; such processing would not only guarantee the purity of the SSTID event set but would also allow the characteristics of the unexcluded EPB events to be studied in depth. The density-based DBSCAN (Density-Based Spatial Clustering of Applications with Noise) [19] clustering algorithm can make up for the deficiency of threshold-based event determination and identify the atypical contaminating EPB events within the SSTID set.
In this paper, the high-precision magnetic data of the three Swarm satellites from 1 January 2015 to 31 December 2019 are used to detect SSTID events between 45°N and 45°S magnetic latitude. An algorithm that excludes EPB events by testing whether the parallel disturbance exceeds a set threshold is applied first, and the density-based DBSCAN clustering algorithm is then applied to the result to further pick out the atypical contaminating EPB events. Finally, the geographic longitude, magnetic local time, seasonal, and solar-activity dependence of the EPB events derived by the clustering algorithm is analyzed and verified.
1 Data sources and processing methods
1.1 Data sources
On 22 November 2013 the European Space Agency (ESA) launched the Swarm Earth-observation mission, successfully placing three satellites (Swarm-Alpha, Swarm-Bravo, and Swarm-Charlie) into near-polar low Earth orbits. The main goal of Swarm is to measure the most complete and most accurate geomagnetic field data to date. After launch and orbit adjustment, the final orbital configuration of the three satellites was completed on 17 April 2014. The two lower satellites (Swarm-Alpha and Swarm-Charlie) fly side by side at an initial altitude of 462 km in orbits inclined at 87.35°; they are separated by only about 75 km in the north-south direction and by about 1.5° of geographic longitude in the east-west direction. Swarm-Bravo flies alone at an altitude of 520 km with an orbital inclination of 87.75°. This paper uses the 50 Hz high-rate magnetic field data measured by the vector field magnetometers and the 2 Hz plasma density data measured by the Langmuir probes on the three Swarm satellites.
1.2 SSTID event detection method
The 50 Hz magnetic data measured by Swarm are given in the North-East-Center (NEC) geographic coordinate system, whereas the magnetic disturbances caused by SSTID events appear more readily in the direction perpendicular to the strong background field. The magnetic data therefore need to be transformed from the NEC system to the mean-field-aligned (MFA) coordinate system. In the MFA system the components B_x, B_y, B_z form three orthogonal magnetic field components, with B_z along the mean field direction and B_y perpendicular to the magnetic meridian and pointing eastward.
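The density-based separation described above can be prototyped in a few lines with scikit-learn's DBSCAN. This is only an illustrative sketch: the choice of the two-dimensional parameter space and the eps/min_samples values are assumptions, not the settings used in the paper.

import numpy as np
from sklearn.cluster import DBSCAN

def split_by_density(features, eps=0.5, min_samples=20):
    """Cluster events in a 2-D parameter space, e.g. (magnetic latitude, magnetic local time).

    Points in dense regions receive cluster labels >= 0, while isolated points get the label -1,
    so the two event populations can be separated by their density distributions.
    """
    features = np.asarray(features, dtype=float)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)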
Preparation and Study of the Crystallization Behavior of the Fe87Zr7B5-x Nanocrystalline Soft-Magnetic Alloys
Journal of Energy and Power Engineering 14 (2020) 100-110doi: 10.17265/1934-8975/2020.03.004Preparation and Study of the Crystallization Behavior of the Fe87Zr7B5-x Nanocristalline Soft-Magnetic AlloysNagantié Kone1 and Tu Guo Hua21. Malian Agency of Radiation Protection (AMARAP), Building A3, Complexe of Ex-CRES, PO Box 1872, Hill of Badalabougou, Bamako, Mali2. Nanging Normal University, Nanging, People’s Republic of ChinaAbstract: A series of materials of composition Fe87Zr7B6, Fe87Zr7B5Ag1, Fe87Zr7B5Cu1 have been prepared by the melt spinning method for different cooling speed to get completely or partly amorphous ribbons. It is proved that the crystallization process is composed of 2 steps. The first step is the precipitation of &-Fe only and the second step is the phase separation of &-Fe, Fe2Zr and Fe2B. All our own made materials have been used in the library monitoring system. Most of them showed a capability of triggling the alarm of the system. The triggling sensitivity at different positions and different sample geometry were investigated and the physical mechanisms were analyzed.Key words: Fe-based, amorphous, alloys, crystallization, nanocrystalline, magnetic, quenching.1. IntroductionFe-based amorphous alloys show a great interest for their soft magnetic properties, i.e. with their high magnetic flux density [1, 2]; or from a theoretical view, a magnetic material is to be said soft if it shows: •Low effective magnetic anisotropy;•Small magnetostriction, when it is used in AC (alternating current) field, it needs low electrical conductivity and small domain width to reduce the magnetic losses. The magnetostriction always happens in all magnetic materials.•The magnetostriction, in the case of saturation magnerization is characterized by a coefficient noted ʎ which is a characteristic of a magnetic material. The magnetocrystalline anisotropy is characterized by a coefficient K which shows the different energy of different directions of magnetization with respect to the crystal orientation.•It is the mechanical deformation along the substance (dimensions changes) with the changing ofCorresponding author: Nagantié Kone, Dr., master of conference, research field: radiation physics. the magnetical situation, the magnetostriction free energy is nothing but the interaction between mechanical stress and magnetostriction.•The magnetic domain is a local region in the material where all the local magnetic moments are aligned in the same direction because of the exchange interaction between spins [1, 2]. The domain structure of a material plays an important role in the magnetic behavior of the material.•In 1960, a new method of producing amorphous materials was reported by Graham et al. [5]. From that time, most of the research work and industrial interest have been concentrated on this method. It consists of producing amorphous (short range atomic order) alloy ribbons by rapidly quenching liquid melt on to a rotating wheel.•During the rapid cooling, the atoms do not have enough time to arrange themselves before solidifying and results in random arrangement. Therefore, better soft magnetic properties related to the very small magneto-cristalline anisotropy result from the absence of atomic long range order. In the case of domain wallmotion, due to structural uniformity, high flux densityPreparation and Study of the Crystallization Behaviour of the Fe87Zr7B5-xNanocristalline Soft-magnetic Alloys101is also obtained. 
So, during the later years, many research efforts are made to use Fe-based soft magnetic alloys for high frequency applications such as high frequency transformer and choke cores. Since the report of Yoshizawa et al. in 1988 [3], particular attention has been accorded to the Fe-based amorphous nanocrystalline alloys to minimize magnetic core losses annealing procedure.•In fact, a (Fe-based) nanocrystalline alloy is crystallized often from an amorphous matrix, containing many methods such as single roller spinning combined with after annealing or by consolidation of mechanical alloyed (using ball mill) powder by hot pressor. The magnetic properties of such materials produced, depend on many factors like the grain size, the Fe-B-M (M = Mo, Nb, Zr…) alloys with nanocrystalline structure showing a large interest. •Many reports have been published about improving Fe-based nanocrystalline structure showing a large interest.•Many reports have been published about improving Fe-based nanocrystalline amorphous materials. In a general way, they can be summarized as follows.1.1 Aim•Fabrication of amorphous soft magnetic materials using spinning procedure combined with after annealing;•Investigation of the effect of cooling speed after the end of annealing;•Annealing temperature and time effect.1.2 Experimental ProcedureThe magnetic properties have been measured: •Ms: by vibrating sample magnetometer µi, µmax, µe: by a BH loop tracer;•Hc: by impedance analyzer;•ʎ : by capacitance bridge or strain guage;•ρ: by DC four-point probe method;•Bs: by SQUID (super conducting quantum interference device);•The structure of sample: by super X-ray diffractometre using CuK & radiation and TEM;•The surface structure: by SEM and AFM;•The mean grain size and the volume fraction of residual amorphous phase evaluated: from the peak width of the X-ray diffraction spectrum and 57Fe Mossbauer study respectively;•the density: by Archimedian method;•the mechanical and thermomechanical analysis: by TMA.1.3 ResultsDuring these studies, it has been found that:•Very good soft magnetic properties can be obtained for grain size less than 15 nm, amorphous phase less than 35%, the zero crystalline anisotropy does not correspond necessarily to the highest permeability [4];•A decrease in the ribbon thickness which is related to higher quenching speed was very effective in improving the permeability as well as the core loss in high frequency range [5].•To reach good results, many ways are used to treat the raw material as well as in the spinning and annealing processes [6, 7].It has been reported that high cooling speed after the end of annealing is very effective to lead to zero crystalline anisotropy, minimize amorphous phase and achieve small grain size. Hence, the initial permeability µI and the saturation magnetic flux density Bs are improved and the magnetostriction coefficient ʎ lowered [8].But the most impressive improvement of grain size has been related to adding small amount of impurity (Cu, Ag ..., with only 1 at.% in the best case) [6, 8]. In the Fe-Zr-B-M (M = Ag, Cu...), the effect of the impurity has been explained as Cu, Zr, Ag rich regions around the b-c-c Fe cannot grow in the region. 
So, the grain size becomes very small and the alloy will show excellent properties [8].Preparation and Study of the Crystallization Behaviour of the Fe 87Zr 7B 5-xNanocristalline Soft-magnetic Alloys102Through the analysis of several reports mentioned above, it seems that the best composition in Fe-B-M alloys is between 92% and 93% of Fe-B with at least 80% of Fe and 7% to 8% additive elements M. This may be due to the fact that for more than 75% of the Fe there will be no completely amorphous phase.2. Experimental Procedures2.1 Preparation of Three Alloys: Fe 87Zr 7B 6, Fe 87Zr 7B 5Ag 1, Fe 87Zr 7B 5Cu 12.1.1 Description and Principle (Fig. 1)The samples can be melt either by an arc furnace or by a copper “boat” tube. We will be using this later. It is composed of a quartz tube of about 50 cm long and 35 mm of diameter containing a copper tube water cooled and carrying the “boats” in which samples are melted. A high frequency current was produced by radio frequency magnetic field in the winding coil. The raw materials placed in this field will induce a high frequency eddy current because of the low resistivity of the material which will create a high power. The power will generate a heat to melt the samples in the “boat”. Several melting will give homogenous alloy ingot. The boat tube presents more advantages than the arc furnace. With the former, there is no too high temperature and no spread of the sample will occur during the heating process. Also, because of the limited volume of the quartz tube, there will be much less oxidation for the sample and the composition of thesample will be accurately maintained.Our samples have been melted between 2 kV and 3 kV of the plate voltage under a pure Argon pressure of 0.6 atm.Three types of ingot of composition Fe 87Zr 7B 6, Fe 87Zr 7B 5Ag 1, Fe 87Zr 7B 5Cu 1 with respective masses of 1.5 g, 3.0 g and 15.0 g have been prepared by three different ways.The original materials were: Fe, Zr, B, Ag and Cu with respective purities of: 99.9%, 99.5%, 99.5%, 99.99% and 99.999%.(1) Direct combination of the elements Fe, B, Zr: In this case we met some difficulty in getting a good ingot because the B has quite high melting point and takes a long time to dissolve into the alloy completely. During this long period, the Zr has more chance to be oxidized. (2) Precombination of Fe and B to form Fe 80B 20 by using this way, melting became easier then Fe 87Zr 7B 6, Fe 87Zr 7B 5Cu 1 were prepared without difficulty. But a serious problem of evaporation occurred during the Ag melting and Ag seemed not well dissolved. To solve this problem, we tried another way of preparation. (3) Precombining Ag and Zr to form AgZr alloy No evaporation occurred during the melting of AgZr. But, when it came to combining AgZr with Fe 80B 20, additive iron and Zirconium, problem of silver evaporation still remained, although the evaporation is not serious, it does affect the precise composition of the final ingot.Fig. 1Copper boat.Preparation and Study of the Crystallization Behaviour of the Fe87Zr7B5-xNanocristalline Soft-magnetic Alloys1032.2 The SpinningDescription and principle (Fig. 2): The spinning chamber is a cylindrical iron tube of 30 cm height and 33 cm diameter containing a copper wheel of 120 mm in diameter droved by an electrical motor of variable speed. The heat generator is also a radio frequency furnace.About 3 to 5 g of the raw material broken in suitable sizes is melted in a quartz tube with an orifice of 0.6 mm in diameter and known ejection angle. 
The molten solution was pouched in jet on the copper wheel (good thermal conducting) rotating at a defined speed, the ribbon produced collected in the collecting tube. Fourteen samples have been prepared in the following conditions:•The chamber pressure: 0.7 atm;•Ejection gas pressure: 1.1 atm;•RF furnace PC = 2 amp, PV = 3 kV;•The copper wheel speed varied from 25 m/s to 45 m/s. 2.3 AnnealingDescription (Fig. 3): A very efficient annealing furnace was made with a ceramic tube carrying heater wire with thermal isolation layer of glass wool. But ends of the ceramic tube were closed with heat choke made of refractory material and carrying small openings for the thermocouple and the evacuating sample tube. A thin wall stainless steel tube was prepared to keep vacuum around the sample to be annealed. The temperature control was assumed by an automatic temperature controller using a chromel-alumel thermocouple.Such system has shown a very reliability since very light oxidation has been observed on the samples and the temperature fluctuation was very close to zero.The three samples of composition Fe87Zr7B6,Fe87Zr7B5Ag1, Fe87Zr7B5Cu1 spun at a copper wheel speed of 45 m/s have been successively annealed at 200 °C, 400 °C and 500 °C during a time period of 1 h, 50 min, 40 min and 30 min.Fig. 2 Spinning chamber.Preparation and Study of the Crystallization Behaviour of the Fe87Zr7B5-xNanocristalline Soft-magnetic Alloys104Fig. 3 Annealing furnace.The samples have been annealed under a vacuum of 104 mmHg produced by a diffusion pump.2.4 MeasurementA new device based on the automatic display of hysteresis loop has been built. It is using a single strip ribbon instead of toroid. With such device, the measurement becomes easier and rapid. The details of the measuring device as well as the measures will be described elsewhere.3. Experimental Results and Discussions3.1 Samples Preparation and Crystallization3.1.1 Sample PreparationTheir densities have been measured by the Archimedian method and the cross section deduced from the density which was in average 7.4 g/cm3. The permeability and coercive field of the samples have been measured in a quenched state. Three samples of composition respectively Fe87Zr7B6, Fe87Zr7B5Ag1, Fe87Zr7B5Cu1 annealed at 200 °C (1 h), 300 °C (40 min) and 500 °C (30 min) have been measured.The Fe87Zr7B5Ag1 reached a maximum permeability at 300 °C but did not show the drastic increase as in the reference paper [5]. The two others reached their maximum permeability at 400 °C. Before annealing the samples, the X-ray analysis showed an amorphous state as can be seen in Fig. 4. From the DTA (differential temperature analysis) data (Figs. 10 and 11), it appeared that:(a) All the samples were either completely amorphous (ideally amorphous for those spun at 45 m/s) or partly amorphous (lower spun speed).(b) The crystallization behavior in our case was different from that of the reference paper report [3]. All the samples showed two steps of crystallization as shown by their DTA curve in Fig. 10 except that spun at 25 m/s for which the DTA curve is shown in Fig. 11.3.1.2 Crystallization Behaviour of the SamplesThe two steps of crystallization can be clearly explained by the X-ray data in Figs. 5 and 7 where we can see the gradual passage from partly amorphous Fig.5 to totally crystallized Fig. 7 with very sharp peaks, through the intermediate state Fig. 6.Fig. 
5 shows the XRD (X-ray diffraction) pattern of Fe87Zr7B6 ribbons annealed at 500 °C when the first crystallization peak rises and the ribbon still keeps very ductile and cannot be powdered, the X-ray pattern reflects more about the surface situation. We can see apart from &-Fe peaks of Fe3O4 and ZrO.Fig. 6 shows the pattern of the same ribbon heated in DSC system just after the first crystallization peak is over than rapidly cooled down. The sample became brittle and ready to be powdered. Here, we can see more perfect &-Fe lattice, oxidation products and only a trace of ZrO.Fig. 7 shows the same sample heated at 710 °C which is after the complete crystallization. Here, wecan see the sharpest peak with whole set of &-Fe peaksPreparation and Study of the Crystallization Behaviour of the Fe 87Zr 7B 5-xNanocristalline Soft-magnetic Alloys105as well as Fe 3O 4, ZrO, ZrFe 2 and Fe 2B which perfectly agree with the known phase diagram of Fe-Zr and Fe-B binary system. From these analyses we can conclude that the material crystallized firstly through the precipitation of &-Fe, then with the increasing density of Zr and B in amorphous matrix, finally crystallizedcompletely through a phase separation. This crystallization will give us some hint in controlling the microstructure of the material through annealing.In Figs. 8 and 9, we have also plotted the onset, the peak and the crystallization enthalpy versus the quenching rate.Fig. 4 X-ray diffraction pattern.Fig. 5 X-ray diffraction curve (after 500 °C ).Preparation and Study of the Crystallization Behaviour of the Fe 87Zr 7B 5-xNanocristalline Soft-magnetic Alloys106Fig. 6 X-ray diffraction curve (after 610 °C ).Fig. 7 X-ray diffraction curve (after 710 °C).Preparation and Study of the Crystallization Behaviour of the Fe87Zr7B5-xNanocristalline Soft-magnetic Alloys107 Fig. 8 DTA.Fig. 9 ΔH versus Cu wheel speed.The feature is as follows:•ΔH curves:The larger ΔH corresponds to higher quenching rate.A latent heat means a larger difference between the amorphous structure and the crystalline structure, which means the more complete amorphousness. Increasing the quenching rate, the degree of randomness increases correspondingly. The curve also reflects the sample with lower rate being only partly amorphous.•The DTA peak and onset in Fig. 8 show:The higher crystallization temperature corresponds to the higher quenching rate or more stability of the amorphous state because more complete amorphous structure will be more difficult to form crystalline nuclei within the amorphous matrix.The materials with smaller quenching rate are not completely amorphous, so, the crystalline nuclei already exists in the sample, only the crystalline nuclei growth can cause the crystallization of these samples as108Preparation and Study of the Crystallization Behaviour of the Fe87Zr7B5-xNanocristalline Soft-magnetic Alloys ArrayFig. 10 DTA curve.Preparation and Study of the Crystallization Behaviour of the Fe87Zr7B5-xNanocristalline Soft-magnetic Alloys109 Fig. 11 DTA curve.we know nucleation needs more energy than simple growth.As far as we know for various ferromagnetic amorphous materials, this series of material has quite high crystallization temperature and hence better stability in the practical purpose that is one of the advantages of these materials.4. 
Conclusion(1) Our materials can be made perfectly amorphous or partly amorphous by changing the quenching speed.(2) Almost all the samples will be crystallized through two steps:•the first step is the precipitation of &-Fe;•the second step is the phase separation of &-Fe, Fe2Zr and Fe2B.References[1]Choi, J. S., et al. 1995.“Nanoscale Size Effect ofMagnetic Nanocrystals and Their Utilization for Cancer Diagnosis via Magnetic Resonance Imagin.” J. of KoreanMagn. Soc. 5 (5): 47.[2]Freeman, M. R., and Smyth, J. F. 1996. “PicosecondTime-resolved Magnetization Dynamics of Thin-Film Heads. ” Journal of Applied Physics 79 (8): 5898-900. [3]Yoshizawa, Y. A., Oguma, S., and Yamauchi, K. 1988.“New Fe-based Soft Magnetic Alloys Composed of Ultrafine Grain Structure.” Journal of Applied Physics 64(10): 6044-6.[4]Chaudhari, P., and Turnbull, D. 1978. “Structure andProperties of Metallic Glasses.” Science199 (4324):Preparation and Study of the Crystallization Behaviour of the Fe87Zr7B5-xNanocristalline Soft-magnetic Alloys11011-21.[5]Graham, C. D., et al. 1995. “HTTLPR-environmentInterplay and Its Effects on Neural Reactivity in Adolescents.” J. of Korean Magn. Soc. 5 (5): 579.[6]Hermando, A., et al. 1995. J. of Korean Magn. Soc. 5 (5). [7]Chikazumi, S., and Graham, C. D. 2009. Physics ofMagnetism. Oxford: Oxford University Press.[8]Kim, B. G., et al. 1995. “Structure and MagneticProperties of Nanocrystalline Soft Ferromagnets.” Journalof Applied Physics 77 (10): 5298.。
Chapter 5: Lossless (distortion-free) coding
Theorem 5.1 (Kraft Inequality) A necessary and sufficient condition for the existence of a binary prefix code whose code words have lengths n1, n2, ..., nL is Σ_{i=1..L} 2^(-n_i) ≤ 1.
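The inequality is easy to check mechanically; a small Python sketch (not part of the original notes, with an illustrative example):

def kraft_sum(lengths, r=2):
    """Sum of r**(-n_i); an r-ary prefix code with these lengths exists iff the sum is <= 1."""
    return sum(r ** (-n) for n in lengths)

# lengths (1, 2, 3, 3) give a Kraft sum of exactly 1, so a binary prefix code exists
assert abs(kraft_sum([1, 2, 3, 3]) - 1.0) < 1e-12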
code)
Kraft theorem
Variable-length coding theorem (Shannon's first theorem)
Kraft theorem. Question: find a real-time, uniquely decodable code. Method: study the code partition condition of the
li
X ∈ {x1, x2, ..., xr}
How can the source be coded losslessly? Assuming that the statistical characteristics of the source can be ignored, the code must satisfy the following condition.
dimension
(3) Node ↔ part of a code word
(4) Terminal node ↔ end of a code word
(5) Number of branches ↔ code length
(6) Non-full branch tree ↔ variable-length code
(7) Full branch tree ↔ fixed-length code
Real-time uniquely decodable codes. Introduction: the concept of a "code tree". Conclusion: present the condition that a real-time
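The real-time (instantaneous) decodability condition can be stated operationally: no codeword may be a prefix of another. A small illustrative check, not part of the original notes:

def is_prefix_free(codewords):
    """True iff no codeword is a prefix of another, i.e. the code is instantaneously decodable."""
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True

print(is_prefix_free(["0", "10", "110", "111"]))   # True: these fill a binary code tree
print(is_prefix_free(["0", "01", "11"]))           # False: "0" is a prefix of "01"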
A new HEVC intra mode for screen content coding
A new HEVC intra mode for screen content coding. CHEN Xianyi; ZHAO Liping; LIN Tao. [Abstract] Because traditional coding methods perform poorly on screen content, and considering that screen images contain a large amount of non-continuous-tone content, this paper proposes a new intra coding mode on top of HEVC (High Efficiency Video Coding), called Intra String Copy (ISC).
The basic idea is to introduce a dictionary coding tool at the HEVC coding unit (CU) level: during encoding, the pixels of the current CU are string-searched and matched within a dictionary window of a certain length using a hash table; during decoding, according to the distance and length of each matched pixel string, the pixels at the corresponding positions in the reconstruction buffer are copied to rebuild the current CU.
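The decoder side of such a dictionary tool behaves like an LZ77 copy. The sketch below is a generic illustration of decoding (distance, length) string matches from a reconstruction buffer; it is not the ISC syntax itself and it omits the directly coded pixels used for unmatched positions.

def string_copy_decode(matches, recon):
    """Append pixels to `recon` by copying `length` samples starting `distance` back.

    Overlapping copies are performed sample by sample, so a short window can
    replicate long runs, exactly as in dictionary (LZ-style) coding.
    """
    for distance, length in matches:
        for _ in range(length):
            recon.append(recon[len(recon) - distance])
    return recon

# toy example: window "ABAB" plus one match of length 4 at distance 2 -> "ABABABAB"
print("".join(string_copy_decode([(2, 4)], list("ABAB"))))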
Experimental results show that, with only a small increase in coding complexity, for typical screen content test sequences the lossy coding mode saves 15.1%, 12.0%, and 8.3% of the bit rate compared with HEVC under the All Intra (AI), Random Access (RA), and Low-delay B (LB) configurations respectively, while the lossless coding mode saves 23.3%, 14.9%, and 11.6% respectively.
Because of the poor effect of the traditional coding methods on screen content coding, and considering that screen content is rich in non-continuous-tone material, a new intra coding mode based on High Efficiency Video Coding (HEVC), called Intra String Copy (ISC), is proposed. The basic idea is to adopt a dictionary coding tool at the HEVC Coding Unit (CU) level. When encoding, the current CU pixels are searched and matched in a dictionary window of a certain length by using a hash table. When decoding, according to the pixel-string matching distances and lengths, the current CU pixels are restored by copying the pixels at the corresponding positions in the reconstruction cache. Experimental results show that, with little increase in coding complexity over HEVC, for typical screen content test sequences ISC achieves lossy coding bit-rate savings of 15.1%, 12.0%, and 8.3% for the All Intra (AI), Random Access (RA), and Low-delay B (LB) configurations, respectively, and lossless coding bit-rate savings of 23.3%, 14.9%, and 11.6% for the same configurations.
[Journal] Journal of Electronics & Information Technology. [Year (Volume), Issue] 2015(000)011. [Pages] 6 pages (P2685-2690). [Keywords] high efficiency video coding; screen content coding; dictionary coding; hash table. [Authors] CHEN Xianyi; ZHAO Liping; LIN Tao. [Affiliations] Institute of VLSI, Tongji University, Shanghai 200092; College of Mathematics, Physics and Information Engineering, Jiaxing University, Jiaxing 314000. [Language] Chinese. [CLC] TN919.8.
With the development of cloud computing, mobile cloud computing, remote desktop, and wireless display technology, how to display screen content with high quality at low bit rates on computer screens, mobile phone screens, TV screens, and other clients has attracted the attention of academia and industry.
Research on a lossless compression coding algorithm based on the HEVC framework
Research on a lossless compression coding algorithm based on the HEVC framework. LIU Tiehua. [Abstract] To further improve the efficiency of lossless compression, a lossless compression coding algorithm based on the High Efficiency Video Coding (HEVC) framework is proposed. The framework is effective not only for video compression but also for all-intra video sequences and still images, and it is used here to compress images. The algorithm improves prediction accuracy by combining adaptive differential pulse code modulation (DPCM) with error compensation, thereby reducing the bit rate. Experimental results show that the method saves more than 10% of the bit rate compared with HEVC's original lossless compression mode and also compresses images with complex textures well. [Journal] Software. [Year (Volume), Issue] 2013(034)002. [Pages] 4 pages (P69-72). [Keywords] multimedia coding; high efficiency video coding; lossless compression; differential coding; error compensation. [Author] LIU Tiehua. [Affiliation] College of Computer Science, Beijing University of Technology, Beijing 100124. [Language] Chinese. [CLC] TP37.
0 Introduction
In the field of video and images, compression techniques are generally divided into lossy compression and lossless compression.
For images in some applications, such as on the Internet, lossy compression achieves better efficiency.
Lossless compression also has its application areas, such as medical imaging, satellite remote sensing, and fingerprint recognition.
Over the past several years, many lossless image compression schemes have been proposed.
Wu proposed the CALIC algorithm based on adaptive gradient prediction and statistical modeling [1]; JPEG-LS, the lossless image coding standard, is based on the LOCO-I algorithm [2]; the JPEG 2000 lossless mode is based on integer wavelet transforms and interpolation [3-4]; paper [5] proposed a lossless coding mode based on H.264/AVC; and paper [6] performs lossless compression with an edge-detection-based method.
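For reference, the prediction step at the heart of LOCO-I/JPEG-LS mentioned above is the median edge detector (MED); a minimal sketch, with the conventional neighbour names (a = left, b = above, c = above-left):

def med_predict(a, b, c):
    """LOCO-I / JPEG-LS median edge detector prediction for the current pixel."""
    if c >= max(a, b):
        return min(a, b)        # edge detected above or to the left
    if c <= min(a, b):
        return max(a, b)
    return a + b - c            # smooth region: planar prediction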
The forthcoming HEVC standard also contains a lossless coding mode. Because transform and quantization introduce errors, the transform and quantization modules must be switched off in this mode so that no distortion is produced in decoding, while entropy coding uses the same modules as in lossy coding.
However, HEVC's original scheme selects the prediction mode block by block, and all pixels in a block are predicted from neighbouring blocks, which limits prediction accuracy.
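The contrast with block-based prediction can be seen in a small sketch of pixel-wise horizontal DPCM, where every pixel is predicted from its immediate neighbour. This illustrates only the general idea, not the paper's adaptive DPCM or its error compensation.

import numpy as np

def horizontal_dpcm(block):
    """Residuals of pixel-wise horizontal prediction (lossless case, no quantization)."""
    residual = block.astype(np.int32).copy()
    residual[:, 1:] -= block[:, :-1].astype(np.int32)
    return residual

def inverse_horizontal_dpcm(residual):
    """Undo the prediction with a cumulative sum along each row."""
    return np.cumsum(residual, axis=1)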
Application of VTK in the experimental teaching of medical image processing
Application of VTK in the experimental teaching of medical image processing. ZHANG Junlan; HU Hengwu; LI Min. [Abstract] With the rapid development of high technology in recent years, the demand for medical image processing has grown steadily, highlighting its importance in medical systems. To improve experimental teaching in medical colleges and raise its quality, the object-oriented visualization toolkit VTK is applied to the experimental teaching of medical image processing, so that students can learn the basic principles and methods of medical image processing intuitively and receive software-process training on medical image processing software throughout the course. [Journal] Intelligent Computer and Applications. [Year (Volume), Issue] 2012(002)002. [Pages] 3 pages (P62-64). [Keywords] medical image processing; VTK; experimental teaching. [Authors] ZHANG Junlan; HU Hengwu; LI Min. [Affiliations] School of Information Engineering, Guangdong Medical College, Dongguan 523808, Guangdong; Network Information Center, University of South China, Hengyang 421000, Hunan. [Language] Chinese. [CLC] TP311.
0 Introduction
Medical colleges have set up medicine-oriented computer or medical information engineering programmes closely tied to their medical character, and these programmes offer a course in medical image processing.
This course teaches how to process medical images with a computer and also serves as an introduction to the currently very popular field of medical visualization.
The course mainly explains the basic principles and methods of medical image processing and analysis; through programming practice and the construction of image processing application systems, students carry out various kinds of medical image information processing, including enhancement, registration, segmentation, 3D reconstruction, and display of medical images [1].
The DICOM standard for medical image information exchange [2] removed the obstacles that different imaging devices posed to image conversion, and the rapid development of digital medicine has made images, especially 3D images, ever more widely used in healthcare.
Images are, however, often disturbed by various noise sources during acquisition and transmission; how to process CT, MRI, B-mode ultrasound, and other medical images effectively, so as to help physicians make more accurate diagnoses and even perform pre-operative surgical simulation, has become an important and pressing problem.
In view of this, and given the characteristics of VTK, this paper proposes using VTK to strengthen the experimental teaching of medical image processing for computer-related majors in medical colleges, so as to improve students' understanding of the course content, stimulate their enthusiasm and spirit of inquiry, and develop their hands-on ability.
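A teaching example of the kind the article has in mind can be as short as the following VTK (Python) sketch, which reads a DICOM series and volume-renders it; the directory name and the opacity points are placeholders, and a real exercise would add colour transfer functions and interaction.

import vtk

reader = vtk.vtkDICOMImageReader()
reader.SetDirectoryName("dicom_series/")            # hypothetical path to a CT/MRI series

mapper = vtk.vtkFixedPointVolumeRayCastMapper()
mapper.SetInputConnection(reader.GetOutputPort())

opacity = vtk.vtkPiecewiseFunction()                # crude opacity ramp, for illustration only
opacity.AddPoint(0, 0.0)
opacity.AddPoint(500, 0.2)
prop = vtk.vtkVolumeProperty()
prop.SetScalarOpacity(opacity)

volume = vtk.vtkVolume()
volume.SetMapper(mapper)
volume.SetProperty(prop)

renderer = vtk.vtkRenderer()
renderer.AddVolume(volume)
window = vtk.vtkRenderWindow()
window.AddRenderer(renderer)
interactor = vtk.vtkRenderWindowInteractor()
interactor.SetRenderWindow(window)
window.Render()
interactor.Start()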
Optimization and improvement of human action recognition algorithms based on the Log-Euclidean bag-of-words model and Stein-kernel sparse coding
Human action recognition is an important research direction in computer vision and pattern recognition, with wide application value in video surveillance, intelligent transportation, medical diagnosis, and other fields.
In this field, traditional methods mainly combine feature extraction with a classifier, and the bag-of-words model and kernel sparse coding are two classic feature extraction approaches.
This article optimizes and improves action recognition algorithms based on the Log-Euclidean bag-of-words model and on Stein-kernel sparse coding, with the aim of raising their accuracy and robustness.
I. Optimization of the Log-Euclidean bag-of-words model
First, the Log-Euclidean bag-of-words model is introduced.
The Log-Euclidean bag-of-words model is a way of representing the bag-of-words model in a vector space; it builds feature vectors by computing the document frequency of words in a corpus.
The similarity between feature vectors is then measured with the log-Euclidean distance.
The Log-Euclidean bag-of-words model is widely used in natural language processing and image processing.
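The article states the distance loosely for feature vectors; in the Log-Euclidean framework used for covariance (symmetric positive-definite, SPD) descriptors in action recognition, the distance between two SPD matrices is the Frobenius norm of the difference of their matrix logarithms. A minimal sketch (illustrative, not the authors' code):

import numpy as np
from scipy.linalg import logm

def log_euclidean_distance(X, Y):
    """d(X, Y) = || logm(X) - logm(Y) ||_F for SPD matrices X and Y (logm is real for SPD input)."""
    return np.linalg.norm(logm(X) - logm(Y), ord='fro')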
The traditional Log-Euclidean bag-of-words model suffers from the curse of dimensionality and feature redundancy, which lowers its performance in action recognition tasks.
1. Feature selection: use a suitable feature selection method to remove redundant and noisy features and extract more discriminative ones.
2. Feature dimensionality reduction: use methods such as principal component analysis (PCA) to reduce the dimensionality of the feature space and improve the computational efficiency and generalization ability of the algorithm (see the sketch after this list).
3. Parameter tuning: tune the parameters of the Log-Euclidean bag-of-words model, searching for the optimal combination with cross-validation and similar methods to improve model performance.
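For item 2 of the list above, a minimal scikit-learn sketch; keeping 95% of the variance is an arbitrary illustrative choice, and in practice the target dimensionality would be cross-validated as item 3 suggests.

from sklearn.decomposition import PCA

def reduce_features(X, n_components=0.95):
    """Project bag-of-words descriptors onto the principal components that retain
    the requested fraction of variance (or a fixed integer number of components)."""
    pca = PCA(n_components=n_components)
    return pca.fit_transform(X), pca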
II. Improvements based on Stein-kernel sparse coding
Next, the Stein-kernel sparse coding method is introduced.
Stein-kernel sparse coding is a kernel-based sparse representation learning method that can extract high-dimensional nonlinear features effectively and is quite robust.
In human action recognition, Stein-kernel sparse coding can effectively capture the spatio-temporal information between action sequences, improving the accuracy and robustness of the algorithm.
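The kernel in question is usually built from the Stein (Jensen-Bregman LogDet) divergence between SPD matrices; the sketch below shows that divergence and the exponential kernel derived from it (illustrative only; the kernel is positive definite only for particular values of theta).

import numpy as np

def stein_divergence(X, Y):
    """S(X, Y) = log det((X + Y) / 2) - 0.5 * (log det X + log det Y) for SPD X, Y."""
    _, ld_mid = np.linalg.slogdet(0.5 * (X + Y))
    _, ld_x = np.linalg.slogdet(X)
    _, ld_y = np.linalg.slogdet(Y)
    return ld_mid - 0.5 * (ld_x + ld_y)

def stein_kernel(X, Y, theta=1.0):
    """Exponential kernel built from the Stein divergence."""
    return np.exp(-theta * stein_divergence(X, Y))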
Traditional Stein-kernel sparse coding still has problems in practice, such as high computational complexity and difficult parameter selection.
These problems can be addressed in the following ways. 1. Efficient optimization algorithms: design efficient optimization algorithms to reduce the computational complexity and increase the speed and efficiency of the method.
Low-Complexity, Near-Lossless Coding of Depth Maps from Kinect-Like Depth Cameras
Sanjeev Mehrotra, Zhengyou Zhang, Qin Cai, Cha Zhang, Philip A. Chou
Microsoft Research, Redmond, WA, USA
{sanjeevm, zhang, qincai, chazhang, pachou}@
Abstract—Depth cameras are rapidly gaining interest in the market, as depth plus RGB is being used for a variety of applications ranging from foreground/background segmentation, face tracking, and activity detection to free-viewpoint video rendering. In this paper, we present a low-complexity, near-lossless codec for coding depth maps. The coding requires no buffering of video frames, is table-less, can encode or decode a frame in close to 5 ms with little code optimization, and provides between 7:1 and 16:1 compression for near-lossless coding of 16-bit depth maps generated by the Kinect camera.
I. INTRODUCTION
Over the past few years, depth cameras have been rapidly gaining popularity, with devices such as Kinect becoming popular in the consumer space as well as the research space. Depth information of a scene can be used for a variety of purposes, on its own or in conjunction with texture data from an RGB camera. It can be used to aid foreground/background segmentation and to perform activity detection, face tracking, pose tracking, skeletal tracking, and free-viewpoint video rendering [1], [2], [3], needed for immersive conferencing and entertainment scenarios.
Depth cameras primarily operate using one of two technologies: one computes the time of flight of emitted light, and the other observes the pattern that appears when a known infrared pattern is projected onto the scene. The second technology requires an infrared projector to project a known pattern onto the scene and an infrared camera to read the projected pattern. It is known that with the second technology, such as that used in the Microsoft Kinect sensors, the accuracy of the camera decreases with the inverse of the depth. For example, the error in the actual depth when reading a depth value which is twice as far as another will be two to four times as large. This can be exploited when coding depth maps to lower the bitrate while maintaining the full fidelity of the sensor. Stereovision systems, usually using two or three video cameras [4], also produce depth maps whose accuracy decreases with the inverse of the depth. Therefore, the compression technique proposed in this paper can also be applied to depth maps from stereovision systems.
In addition, depth maps typically contain only a few distinct values in a typical scene. For example, in a typical scene with a background and a single foreground object, the background has no variation in depth; in fact the variation is much less than that found in the texture map of the background. Even the foreground has less variation in depth from one pixel to the next, as a greater degree of continuity naturally exists in depth than in texture. For example, if we look at one's clothing, the clothing itself may have significant variation in texture due to its design, but the depth will still be relatively continuous since it is still physically the same object.
This fact can be used to predict the depth of a given pixel from neighboring pixels. After this prediction, simple run-length/level techniques can be used to perform entropy coding, allowing for high compression ratios.
The coding of depth maps is a relatively recent topic of interest to MPEG [5]. In this paper, we do not attempt to come up with a relatively complicated lossy compression scheme, but rather present a low-complexity, near-lossless compression scheme for depth maps which gives a high compression ratio, between 7:1 (2.3 bpp) and 16:1 (1 bpp) depending on the content. This coding can be used as a front end to other compression schemes, or on its own for compressing depth maps. For example, it can be implemented directly on the camera and used for sending the depth map to the computer. This bitrate reduction can allow multiple depth cameras (a depth camera array) to be put onto a single USB interface, or allow higher-resolution depth cameras to be used on a single USB interface.
Alternate methods for depth map encoding have been proposed [6], [7], [8]. Several methods use triangular mesh representations for the depth map. Extracting the mesh from the raw depth map takes additional complexity, which we avoid by coding the depth map directly in its pixel representation. In addition, even a mesh representation does not perform better than JPEG 2000 encoding [8], whereas our method does perform better for lossless coding. Furthermore, we concentrate on lossless to near-lossless encoding as opposed to lossy encoding.
II. DEPTH MAP CODING
An example depth map of a scene captured by the depth camera on Microsoft's Kinect is shown in Fig. 1. Regardless of the texture, we can see that the depth map captures the outlines of the objects in the scene. It can also be seen that there are only a few distinct values in the depth map, and thus we can infer that high compression ratios should be possible.
Fig. 1. An example depth map from Set 1.
In this paper, we present a low-
complexity, near-lossless compression scheme which consists of three components, described in greater detail below.
1) Inverse coding of the depth map: as the depth map accuracy from the sensor decreases with distance, we can quantize depth values which are further out more coarsely.
2) Prediction: since the depth map is relatively continuous from pixel to pixel, simple prediction schemes can be applied to remove redundancy.
3) Adaptive Run-Length/Golomb-Rice (RLGR) coding [9]: after prediction, there will typically be long runs of zeros, so we can use adaptive RLGR as an adaptive, low-complexity, table-less entropy code.
A. Inverse Coding of Depth Map
It is well known that depth cameras such as the Kinect have an accuracy which is at least inversely proportional to the actual depth — Kinect's depth accuracy is actually proportional to the inverse of the squared depth. Let Z be the depth value produced by the camera. We define Z_min to be the minimum depth that can be produced by the sensor and Z_max to be the maximum depth. We define Z_0 to be the depth at which the accuracy is 1 unit. This value can be found from the manufacturer's sensor specifications, or measured by placing objects at known depths and repeatedly measuring the depth values with the camera. Therefore, at distances less than Z_0 the depth map coding does not fully encode the sensor capabilities, and at distances greater than Z_0 the sensor accuracy decreases with depth. For example, at Z = 2Z_0 the sensor accuracy is 2 units rather than 1 (if inversely proportional), or 4 units (if inversely proportional to the square), and therefore we can quantize the values 2Z_0 - 1, 2Z_0, and 2Z_0 + 1 all to 2Z_0 and still encode the full fidelity of the sensor.
As an alternative to non-uniform quantization, we can use something similar to that proposed in [1] and simply code the inverse of the depth (call it D),
    D = a/Z - a/Z_max,    (1)
where a is a constant. Z = 0 can be used as a special value to indicate missing depth values and can be coded as D = 0. Decoding simply inverts Eqn. (1), giving
    Z = a / (D + a/Z_max).
If a < Z_min * Z_max, then the dynamic range of the coefficients after applying the inverse mapping is reduced, and thus even in the absence of other entropy coding there is a bit rate reduction of log2(Z_min * Z_max / a) bits per pixel.
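As a concrete illustration of the inverse-depth mapping of Sec. II-A, here is a minimal Python sketch using the Kinect parameters quoted later in the paper (Z_min = 300 mm, Z_0 = 750 mm, Z_max = 10000 mm) and the near-lossless choice a = Z_0(Z_0 + 1); the function names and rounding details are illustrative, not the authors' implementation.

import numpy as np

Z_MIN, Z0, Z_MAX = 300.0, 750.0, 10000.0     # mm, values quoted in Sec. III
a = Z0 * (Z0 + 1.0)                          # near-lossless choice of the constant a

def depth_to_inverse(Z):
    """Map raw depth Z (mm) to the inverse-depth symbol D; D = 0 is reserved for missing depth."""
    Z = np.asarray(Z, dtype=np.float64)
    safe = np.where(Z > 0, Z, 1.0)                       # avoid dividing by the missing-value code
    D = np.rint(a / safe - a / Z_MAX).astype(np.int32)
    return np.where(Z > 0, D, 0)                         # assumes valid depths are strictly below Z_MAX

def inverse_to_depth(D):
    """Invert the mapping; the residual error stays within the sensor accuracy for Z >= Z0."""
    D = np.asarray(D, dtype=np.float64)
    Z = np.rint(a / (D + a / Z_MAX)).astype(np.int32)
    return np.where(D > 0, Z, 0)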
C. Adaptive RLGR Coding
We use the adaptive Run-Length/Golomb-Rice (RLGR) code from [9] to code the resulting prediction residuals C_n. Since this entropy coder uses backward adaptation for its parameters, no additional side information needs to be sent, and no tables are needed to perform the encoding or decoding.
Although the original depth map Z and the inverse depth map D consist of positive values, the predicted values C_n can be negative or positive. If we assume that C_n is distributed according to a Laplacian distribution, we can apply the interleave mapping to get a source with the exponential distribution needed by the Golomb-Rice code:
    F_n = 2*C_n         if C_n >= 0,
    F_n = -2*C_n - 1    if C_n < 0.    (8)
After this, the codec operates in either the "no-run" mode or the "run" mode. There are two parameters in the codec which are adapted, k and k_R. In the no-run mode (k = 0), each symbol F_n is coded with a Golomb-Rice code, written GR(F_n, k_R). In the run mode (k > 0), a run of m = 2^k zeros is coded with a single 0, and a run of m < 2^k zeros (terminated by a nonzero symbol) is coded with a 1, followed by the k-bit binary representation of m, followed by the level GR(F_n, k_R). The Golomb-Rice code is simply
    GR(F_n, k_R) = [1 1 ... 1] [b_(k_R-1) b_(k_R-2) ... b_0],    (9)
                   p-bit prefix      k_R-bit suffix
where there are p = floor(F_n / 2^(k_R)) ones in the prefix and b_(k_R-1) ... b_0 are the k_R least significant bits of F_n. Instead of keeping k and k_R fixed, they are adapted using backward adaptation similar to that described in [9]. Although the adaptive RLGR coding on its own is table-less, we have optimized the decoding using a trivial amount of memory, 3 KB. With minimal optimizations, we can easily encode a 640x480, 16-bit-per-pixel frame in close to 5 ms on a single 3 GHz core.
D. Lossy Coding
The proposed coding scheme is numerically lossless if only the prediction and adaptive RLGR entropy coding are performed. It is also practically lossless, because of the sensor accuracy, if the inverse depth coding is done with a = Z_0(Z_0 + 1). To introduce a small amount of loss, we can choose one of three methods:
1) a < Z_0(Z_0 + 1) can be chosen. This introduces loss in that the sensor capabilities are not fully encoded.
2) C_n can be mildly quantized. In this case the coefficient that gets coded becomes C_n = Q(D_n - D'_(n-1)), where Q(.) is the quantization operation and D'_(n-1) is the previous decoded pixel. The decoded pixel is then found as D'_n = IQ(C_n) + D'_(n-1), where IQ(.) is the inverse quantization operation.
3) The level in the adaptive RLGR coding can be quantized.
In this paper, we explore lossy coding using the first of these three options, that is, choosing a < Z_0(Z_0 + 1). Lossy coding can also be done with more complex methods, and different schemes may be used depending on the actual application of the depth map. For example, if the coded depth map is being used to reconstruct different views of a video scene, then the criterion to be optimized is the actual video reconstruction and not the accuracy of the coded depth map.
III. RESULTS
We present two sets of results: one shows the computational complexity and compression efficiency of the proposed depth map coding with an implementation done with minimal optimizations; the other compares the reconstruction of video from alternate viewpoints when using the texture map and a coded depth map from a given viewpoint.
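A minimal sketch of the two building blocks of Sec. II-C, the interleave mapping of Eqn. (8) and the Golomb-Rice codeword of Eqn. (9); a terminating 0 after the unary prefix is assumed here so that the prefix is decodable, and the backward parameter adaptation of [9] is not shown.

def interleave(c):
    """Map a signed residual C_n to the non-negative symbol F_n (Eqn. 8)."""
    return 2 * c if c >= 0 else -2 * c - 1

def golomb_rice(u, k_r):
    """GR(u, k_r): floor(u / 2**k_r) ones, a terminating 0, then the k_r low bits of u (Eqn. 9)."""
    p = u >> k_r
    suffix = format(u & ((1 << k_r) - 1), "0{}b".format(k_r)) if k_r else ""
    return "1" * p + "0" + suffix

print(interleave(-3), golomb_rice(interleave(-3), 2))    # prints: 5 1001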
The depth maps are coded using near-lossless and slightly lossy compression with the proposed method.
A. Coding Efficiency
We compress two sets of captured depth maps from the Kinect camera. These two sets are taken from multiple video sequences, totalling close to 165 seconds of video and depth maps captured at 30 fps, 640x480, and 16 bits per pixel (bpp). The first set is a more complicated depth map (Fig. 1 is a depth map from this set) with multiple depth values; the second is a relatively simple depth map consisting of only a single background and a foreground object (Fig. 5(b)).
The coding is done using the following five methods. For the Microsoft Kinect sensors, we measure and use the following parameters for the inverse coding presented in Sec. II-A: Z_min = 300 mm, Z_0 = 750 mm, and Z_max = 10000 mm.
1) Without inverse coding: this method uses just the entropy coding (prediction plus adaptive RLGR coding) and produces numerically lossless results.
2) With inverse coding (lossless): this method uses the inverse coding with Z_0 = 750 mm. This value of Z_0 is the measured depth at which the sensor accuracy is 1 unit; thus it produces results which fully encode the sensor capabilities.
3) With inverse coding (lossy): we reduce Z_0 to 600 mm. This induces mild compression, with a quantization factor of 1.25.
4) With inverse coding (lossy): we further reduce Z_0 to 300 mm. This results in stronger compression, as the depth values are quantized by a factor of 2.5.
5) With JPEG 2000 [10] lossless-mode encoding. We use this state-of-the-art lossless image codec to compare the efficiency of our compression scheme.
The coding efficiency results for these two sets of depth maps when using methods 1 to 4 are summarized in Table I. In Table I we list the 10th-percentile, 50th-percentile (median), and 90th-percentile results for the compression ratio per frame, encode time per frame, and decode time per frame.
We do not show results for the mean compression ratio as they
TABLE I. Summary of compression ratio and CPU utilization results both without and with inverse coding. The table shows the 10th-percentile, 50th-percentile (median), and 90th-percentile values for compression ratio (CR), encode time ("Enc") per frame, and decode time ("Dec") per frame. Without inverse coding is numerically lossless; Z_0 = 750 mm is nearly lossless because of the sensor accuracy; Z_0 = 600 mm and Z_0 = 300 mm are lossy due to additional quantization of depth values by factors of 1.25 and 2.50, respectively.
Fig. 3. PDF of encode time per frame for the first data set. (a) Numerically lossless coding (without inverse coding); (b) lossless coding of sensor capabilities (Z_0 = 750 mm); (c) lossy coding with additional quantization factor of 1.25 (Z_0 = 600 mm); (d) lossy coding with additional quantization factor of 2.5 (Z_0 = 300 mm).
Fig. 4. PDF of decode time per frame for the first data set. (a) Numerically lossless coding; (b) lossless coding of sensor capabilities (Z_0 = 750 mm); (c) lossy coding with Z_0 = 600 mm; (d) lossy coding with Z_0 = 300 mm.
takes place, but this can easily be reduced by using a lookup table to perform the division. Other optimizations can easily reduce the computational cost further, so that the codec could even be implemented directly on the camera itself.
C. Viewpoint Reconstruction Using the Coded Depth Map
We also use the coded depth map and texture map to reconstruct alternate viewpoints, to measure the effect of coding on the reconstruction. Let viewpoint A be the viewpoint from which we capture the video and depth used to reconstruct an alternate viewpoint B. As an example, the texture map and depth map from viewpoint A are shown in Fig. 5(a) and (b). The reconstructed video from viewpoint B using the uncoded depth map and the coded depth map (with Z_0 = 750 mm) is shown in Fig. 5(d) and (f), respectively. The coded depth map is shown in Fig. 5(e). For comparison, the actual video captured from viewpoint B is shown in Fig. 5(c). The video used in this experiment consists of 47 frames.
To measure the accuracy of the coded depth maps, we reconstruct video from five alternate viewpoints using the texture and depth map from viewpoint A. At the same time, we also capture video directly from these five alternate viewpoints. The cameras at these five alternate viewpoints (viewpoint B) are calibrated to the camera at viewpoint A, i.e., we know the translation and rotation. We measure the PSNR between the captured video and the reconstructed video. These results are shown in Fig. 6(a) for the four encoding methods explained before — numerically lossless, sensor-wise lossless (Z_0 = 750 mm), and lossy with Z_0 = 600 mm and 300 mm. The PSNR results for the five alternate viewpoints (shown for the luminance component (Y) only) are all shown together in the figure (each viewpoint consists of 47 frames). The best viewpoint reconstruction gives a PSNR of about 21 dB and the worst about 14.5 dB. However, we note that regardless of which method is used to encode the depth map, the PSNR results are all very similar. Thus, even mildly lossy compression (with Z_0 = 300 mm) gives a
These results are shown in Fig. 6(a) for the four different encoding methods described above: numerically lossless, sensor-wise lossless (Z0 = 750 mm), and lossy with Z0 = 600 mm and Z0 = 300 mm. The five alternate-viewpoint PSNR results (shown for the luminance component (Y) only) are plotted together in the figure; each viewpoint spans 47 frames. The best viewpoint reconstruction has a PSNR of about 21 dB and the worst about 14.5 dB. However, we note that regardless of which method is used to encode the depth map, the PSNR results are all very similar. Thus, even mildly lossy compression (with Z0 = 300 mm) gives a bitrate reduction of 15 to 30 times while producing no additional artifacts in the reconstruction when compared to the actual video. That is, most of the additional errors introduced by the mild quantization of the depth map are masked by the errors introduced by reconstruction inaccuracies when comparing against the actual captured video. Only for the fifth viewpoint do we see additional errors introduced by quantization of the depth map, of about 0.5 dB in PSNR.

In Fig. 6(b), we show the PSNR between the numerically lossless video reconstruction and the reconstructions obtained with inverse coding for the various Z0 values. This result shows the reconstruction errors introduced by the depth-map quantization when comparing against the reconstruction obtained using an unquantized depth map, as opposed to the actual captured video as in Fig. 6(a). We see that Z0 = 750 mm produces a reconstruction which is very close to that obtained when using the numerically lossless depth map (PSNR > 40 dB). Even the Z0 = 300 mm encoding produces good results, with PSNR close to 32 dB.

Fig. 5. Images from Set 2. (a) The original texture map from viewpoint A. (b) The original depth map from viewpoint A. (c) The original video captured from viewpoint B. (d) The reconstructed video from viewpoint B using the texture map and the uncoded (lossless) depth map from viewpoint A. (e) The coded depth map (Z0 = 750 mm) from viewpoint A. (f) The reconstructed video from viewpoint B using the texture map and the coded depth map from viewpoint A.

Fig. 6. PSNR results. (a) PSNR between the captured video and the video reconstructed using the texture map plus depth map. The depth map is coded in four different ways: "LL" stands for numerically lossless (without inverse coding), Z0 = 750 mm is lossless encoding of the sensor capabilities, and Z0 = 600 mm and Z0 = 300 mm are lossy encodings. (b) PSNR between the video reconstruction using the original (uncoded) depth map and the video reconstructions using the coded depth map for various Z0 values. The five viewpoints are shown back to back for the 47-frame sequence (frames 1-47 represent the first viewpoint, frames 48-94 the second viewpoint, and so on).

IV. CONCLUSION

In this paper, we have presented a low-complexity, high-compression-efficiency, near-lossless coding scheme for depth maps. The coding takes advantage of the sensor accuracy and of the continuity in the depth map. The encoding consists of three components: inverse depth coding, prediction, and adaptive RLGR coding. The coder provides a high compression ratio even for a near-lossless representation of the depth map. In addition, reconstruction of alternative viewpoints from the coded depth map does not result in additional artifacts. It can be implemented either as a front end to more complicated lossy compression algorithms or used on its own, allowing multiple high-definition depth camera arrays to be connected on a single USB interface.

REFERENCES
[1] K. Müller, P. Merkle, and T. Wiegand, "3-D video representation using depth maps," Proceedings of the IEEE, vol. 99, no. 4, pp. 643-656, Apr. 2011.
[2] B.-B. Chai, S. Sethuraman, and H. Sawhney, "A depth map representation for real-time transmission and view-based rendering of a dynamic 3D scene," in Proc. First International Symposium on 3D Data Processing Visualization and Transmission, 2002, pp. 107-114.
[3] A. Smolic, K. Mueller, P. Merkle, P. Kauff, and T. Wiegand, "An overview of available and emerging 3D video formats and depth enhanced stereo as efficient generic solution," in Proc. Picture Coding Symposium, May 2009, pp. 1-4.
[4] O. Faugeras, Three-Dimensional Computer Vision. MIT Press, 1993.
[5] Overview of 3D Video Coding, ISO/IEC JTC1/SC29/WG11, Doc. N9784, Apr. 2008.
[6] S.-Y. Kim and Y.-S. Ho, "Mesh-based depth coding for 3D video using hierarchical decomposition of depth maps," in Proc. IEEE International Conference on Image Processing (ICIP), vol. 5, Oct. 2007.
[7] K.-J. Oh, A. Vetro, and Y.-S. Ho, "Depth coding using a boundary reconstruction filter for 3-D video systems," IEEE Trans. Circuits and Systems for Video Technology, vol. 21, Mar. 2011.
[8] B.-B. Chai, S. Sethuraman, H. S. Sawhney, and P. Hatrack, "Depth map compression for real-time view-based rendering," Pattern Recognition Letters, Elsevier, pp. 755-766, May 2004.
[9] H. S. Malvar, "Adaptive Run-Length/Golomb-Rice encoding of quantized generalized Gaussian sources with unknown statistics," in Proc. Data Compression Conference, 2006, pp. 23-32.
[10] A. Skodras, C. Christopoulos, and T. Ebrahimi, "The JPEG 2000 still image compression standard," IEEE Signal Processing Magazine, vol. 18, no. 5, pp. 36-58, Sep. 2001.