Fast RLS Algorithms Running On Roughly Quantized Signals

ON THE COMPUTATIONAL COMPLEXITY OF ALGORITHMS

BY J. HARTMANIS AND R. E. STEARNS

I. Introduction. In his celebrated paper [1], A. M. Turing investigated the computability of sequences (functions) by mechanical procedures and showed that the set of sequences can be partitioned into computable and noncomputable sequences. One finds, however, that some computable sequences are very easy to compute whereas other computable sequences seem to have an inherent complexity that makes them difficult to compute.

In this paper, we investigate a scheme of classifying sequences according to how hard they are to compute. This scheme puts a rich structure on the computable sequences and a variety of theorems are established. Furthermore, this scheme can be generalized to classify numbers, functions, or recognition problems according to their computational complexity.

The computational complexity of a sequence is to be measured by how fast a multitape Turing machine can print out the terms of the sequence. This particular abstract model of a computing device is chosen because much of the work in this area is stimulated by the rapidly growing importance of computation through the use of digital computers, and all digital computers in a slightly idealized form belong to the class of multitape Turing machines. More specifically, if T(n) is a computable, monotone increasing function of positive integers into positive integers and if α is a (binary) sequence, then we say that α is in complexity class S_T, or that α is T-computable, if and only if there is a multitape Turing machine 𝒯 such that 𝒯 computes the nth term of α within T(n) operations. Each set S_T is recursively enumerable and so no class S_T contains all computable sequences. On the other hand, every computable α is contained in some complexity class S_T. Thus a hierarchy of complexity classes is assured. Furthermore, the classes are independent of time scale or of the speed of the components from which the machines could be built, as there is a "speed-up" theorem which states that S_T = S_{kT} for positive numbers k.

As corollaries to the speed-up theorem, there are several limit conditions which establish containment between two complexity classes. This is contrasted later with the theorem which gives a limit condition for noncontainment. One form of this result states that if (with minor restrictions)

    inf_{n→∞} [T(n)]² / U(n) = 0,

then S_U properly contains S_T. The intersection of two classes is again a class. The general containment problem, however, is recursively unsolvable.

One section is devoted to an investigation as to how a change in the abstract machine model might affect the complexity classes. Some of these are related by a "square law," including the one-tape-multitape relationship: that is, if α is T-computable by a multitape Turing machine, then it is T²-computable by a single-tape Turing machine. It is gratifying, however, that some of the more obvious variations do not change the classes.

The complexity of rational, algebraic, and transcendental numbers is studied in another section. There seems to be a good agreement with our intuitive notions, but there are several questions still to be settled.

There is a section in which generalizations to recognition problems and functions are discussed.
This section also provides the first explicit "impossibility" proof, by describing a language whose "words" cannot be recognized in real time [T(n) = n]. The final section is devoted to open questions and problem areas. It is our conviction that numbers and functions have an intrinsic computational nature according to which they can be classified, as shown in this paper, and that there is a good opportunity here for further research.

For background information about Turing machines, computability, and related topics, the reader should consult [2]. "Real-time" computations (i.e., T(n) = n) were first defined and studied in [3]. Other ways of classifying the complexity of a computation have been studied in [4] and [5], where the complexity is defined in terms of the amount of tape used.

II. Time limited computations. In this section, we define our version of a multitape Turing machine, define our complexity classes with respect to this type of machine, and then work out some fundamental properties of these classes.

First, we give an English description of our machine (Figure 1), since one must have a firm picture of the device in order to follow our paper. We imagine a computing device that has a finite automaton as a control unit. Attached to this control unit is a fixed number of tapes which are linear, unbounded at both ends, and ruled into an infinite sequence of squares. The control unit has one reading head assigned to each tape, and each head rests on a single square of the assigned tape. There are a finite number of distinct symbols which can appear on the tape squares. Each combination of symbols under the reading heads together with the state of the control unit determines a unique machine operation. A machine operation consists of overprinting a symbol on each tape square under the heads, shifting the tapes independently either one square left, one square right, or no squares, and then changing the state of the control unit. The machine is then ready to perform its next operation as determined by the tapes and control state. The machine operation is our basic unit of time. One tape is singled out and called the output tape. The motion of this tape is restricted to one-way movement: it moves either one or no squares right. What is printed on the output tape and moved from under the head is therefore irrevocable, and is divorced from further calculations.

Figure 1. An n-tape Turing machine

As Turing defined his machine, it had one tape, and if someone put k successive ones on the tape and started the machine, it would print some f(k) ones on the tape and stop. Our machine is expected to print successively f(1), f(2), ... on its output tape. Turing showed that such innovations as adding tapes or tape symbols do not increase the set of functions that can be computed by machines. Since the techniques for establishing such equivalences are common knowledge, we take it as obvious that the functions computable by Turing's model are the same as those computable by our version of a Turing machine. The reason we have chosen this particular model is that it closely resembles the operation of a present-day computer; and, being interested in how fast a machine can compute, the extra tapes make a difference.

To clear up any misconceptions about our model, we now give a formal definition.
Definition 1. An n-tape Turing machine, 𝒯, is a set of (3n + 4)-tuples,

    {(q_i; S_{i1}, S_{i2}, ..., S_{in}; S_{j0}, S_{j1}, ..., S_{jn}; m_0, m_1, ..., m_n; q_f)},

where each component can take on a finite set of values, and such that for every possible combination of the first n + 1 entries, there exists a unique (3n + 4)-tuple in this set. The first entry, q_i, designates the present state; the next n entries, S_{i1}, ..., S_{in}, designate the present symbols scanned on tapes T_1, ..., T_n, respectively; the next n + 1 symbols, S_{j0}, ..., S_{jn}, designate the new symbols to be printed on tapes T_0, ..., T_n, respectively; the next n + 1 entries describe the tape motions (left, right, no move) of the n + 1 tapes, with the restriction m_0 ≠ left; and the last entry gives the new internal state. Tape T_0 is called the output tape. One tuple with S_{ij} = blank symbol for 1 ≤ j ≤ n is designated as the starting tuple.

Note that we are not counting the output tape when we figure n. Thus a zero-tape machine is a finite automaton whose outputs are written on a tape. We assume without loss of generality that our machine starts with blank tapes.

For brevity and clarity, our proofs will usually appeal to the English description and will technically be only sketches of proofs. Indeed, we will not even give a formal definition of a machine operation. A formal definition of this concept can be found in [2].

For the sake of simplicity, we shall talk about binary sequences, the generalization being obvious. We use the notation α = a_1 a_2 ....

Definition 2. Let T(n) be a computable function from integers into integers such that T(n) ≤ T(n + 1) and, for some integer k, T(n) ≥ n/k for all n. Then we shall say that the sequence α is T-computable if and only if there exists a multitape Turing machine 𝒯 which prints the first n digits of the sequence α on its output tape in no more than T(n) operations, n = 1, 2, ..., allowing for the possibility of printing a bounded number of digits on one square. The class of all T-computable binary sequences shall be denoted by S_T, and we shall refer to T(n) as a time-function. S_T will be called a complexity class.

When several symbols are printed on one square, we regard them as components of a single symbol. Since these are bounded, we are dealing with a finite set of output symbols. As long as the output comes pouring out of the machine in a readily understood form, we do not regard it as unnatural that the output not be strictly binary. Furthermore, we shall see in Corollaries 2.5, 2.7, and 2.8 that if we insist that T(n) ≥ n and that only (single) binary outputs be used, then the theory would be within an ε of the theory we are adopting.

The reason for the condition T(n) ≥ n/k is that we do not wish to regard the empty set as a complexity class. For if α is in S_T and 𝒯 is the machine which prints it, there is a bound k on the number of digits per square of output tape, and 𝒯 can print at most kn_0 digits in n_0 operations. By assumption, T(kn_0) ≥ n_0 or (substituting n = kn_0) T(n) ≥ n/k. On the other hand, T(n) ≥ n/k implies that the sequence of all zeros is in S_T, because we can print k zeros in each operation, and thus S_T is not void.

Next we shall derive some fundamental properties of our classes.

Theorem 1. The set of all T-computable binary sequences, S_T, is recursively enumerable.

Proof. By methods similar to the enumeration of all Turing machines [2] one can first enumerate all multitape Turing machines which print binary sequences.
This is just a matter of enumerating all the sets satisfying Definition 1 with the added requirement that S_{j0} is always a finite sequence of binary digits (regarded as one symbol). Let such an enumeration be 𝒯_1, 𝒯_2, .... Because T(n) is computable, it is possible to systematically modify each 𝒯_i to a machine 𝒯_i′ with the following properties: As long as 𝒯_i prints its nth digit within T(n) operations (and this can be verified by first computing T(n) and then looking at the first T(n) operations of 𝒯_i), then the nth digit of 𝒯_i′ will be the nth output of 𝒯_i. If 𝒯_i should ever fail to print the nth digit after T(n) operations, then 𝒯_i′ will print out a zero for each successive operation. Thus we can derive a new enumeration 𝒯_1′, 𝒯_2′, .... If 𝒯_i operates within time T(n), then 𝒯_i and 𝒯_i′ compute the same T-computable sequence α_i. Otherwise, 𝒯_i′ computes an ultimately constant sequence α_i, and this can be printed, k bits at a time [where T(n) ≥ n/k], by a zero-tape machine. In either case, α_i is T-computable and we conclude that {α_i} = S_T.

Corollary 1.1. There does not exist a time-function T such that S_T is the set of all computable binary sequences.

Proof. Since S_T is recursively enumerable, we can design a machine 𝒯 which, in order to compute its ith output, computes the ith bit of sequence α_i and prints out its complement. Clearly 𝒯 produces a sequence α different from all α_i in S_T.

Corollary 1.2. For any time-function T, there exists a time-function U such that S_T is strictly contained in S_U. Therefore, there are infinitely long chains

    S_{T_1} ⊊ S_{T_2} ⊊ ...

of distinct complexity classes.

Proof. Let 𝒯 compute a sequence α not in S_T (Corollary 1.1). Let V(n) equal the number of operations required by 𝒯 to compute the nth digit of α. Clearly V is computable and α ∈ S_V. Let

    U(n) = max [T(n), V(n)];

then U(n) is a time-function and clearly S_U ⊇ S_T. Since α is in S_U and α is not in S_T, we have S_T ⊊ S_U.

Corollary 1.3. The set of all complexity classes is countable.

Proof. The set of recursively enumerable sets is countable.

Our next theorem asserts that linear changes in a time-function do not change the complexity class. If r is a real number, we write [r] to represent the smallest integer m such that m ≥ r.

Theorem 2. If the sequence α is T-computable and k is a computable, positive real number, then α is [kT]-computable; that is,

    S_T = S_{[kT]}.

Proof. We shall show that the theorem is true for k = 1/2, and it will then be true for k = 1/2^m by induction, and hence for all other computable k since, given k, k ≥ 1/2^m for some m. (Note that if k is computable, then [kT] is a computable function satisfying Definition 2.)

Let 𝒯 be a machine which computes α in time T. If the control state, the tape symbols read, and the tape symbols adjacent to those read are all known, then the state and tape changes resulting from the next two operations of 𝒯 are determined and can therefore be computed in a single operation. If we can devise a scheme so that this information is always available to a machine 𝒯′, then 𝒯′ can perform in one operation what 𝒯 does in two operations. We shall next show how, by combining pairs of tape symbols into single symbols and adding extra memory to the control, we can make this information available.

In Figure 2(a), we show a typical tape of 𝒯 with its head on the square marked 0. In Figure 2(b), we show the two ways we store this information in 𝒯′.
Each square of the 𝒯′-tape contains the information in two squares of the 𝒯-tape. Two of the 𝒯-tape symbols are stored internally in 𝒯′, and 𝒯′ must also remember which piece of information is being read by 𝒯. In our figures, this is indicated by an arrow pointed to the storage spot. In two operations of 𝒯, the heads must move to one of the five squares labeled 2, 1, 0, −1, or −2. The corresponding next position of the 𝒯′-tape is indicated in Figures 2(c)-(g). It is easily verified that in each case, 𝒯′ can print or store the necessary changes. In the event that the present symbol read by 𝒯 is stored on the right in 𝒯′, as in Figure 2(f), the analogous changes are made. Thus we know that 𝒯′ can do in one operation what 𝒯 does in two, and the theorem is proved.

Corollary 2.1. If U and T are time-functions such that

    inf_{n→∞} T(n)/U(n) > 0,

then S_U ⊆ S_T.

Proof. Because the limit is greater than zero, kT(n) ≥ U(n) for some k > 0, and thus S_U ⊆ S_{[kT]} = S_T.

Corollary 2.2. If U and T are time-functions such that

    sup_{n→∞} T(n)/U(n) < ∞,

then S_U ⊇ S_T.

Proof. This is the reciprocal of Corollary 2.1.

Figure 2. (a) Tape of 𝒯 with head on 0. (b) Corresponding configurations of 𝒯′. (c) 𝒯′ if 𝒯 moves two left. (d) 𝒯′ if 𝒯 moves to −1. (e) 𝒯′ if 𝒯 moves to 0. (f) 𝒯′ if 𝒯 moves to 1. (g) 𝒯′ if 𝒯 moves two right

Corollary 2.3. If U and T are time-functions such that

    0 < lim_{n→∞} T(n)/U(n) < ∞,

then S_U = S_T.

Proof. This follows from Corollaries 2.1 and 2.2.

Corollary 2.4. If T(n) is a time-function, then S_n ⊆ S_T. Therefore, T(n) = n is the most severe time restriction.

Proof. Because T is a time-function, T(n) ≥ n/k for some positive k by Definition 2; hence

    inf_{n→∞} T(n)/n ≥ 1/k > 0,

and S_n ⊆ S_T by Corollary 2.1.

Corollary 2.5. For any time-function T, S_T = S_U where U(n) = max [T(n), n]. Therefore, any complexity class may be defined by a function U(n) ≥ n.

Proof. Clearly inf (T/U) ≥ min (1, 1/k) and sup (T/U) ≤ 1.

Corollary 2.6. If T is a time-function satisfying

    T(n) ≥ n and inf_{n→∞} T(n)/n > 1,

then for any α in S_T, there is a multitape Turing machine 𝒯 with a binary (i.e., two-symbol) output which prints the nth digit of α in T(n) or fewer operations.

Proof. The inf condition implies that, for some rational ε > 0 and integer N, (1 − ε)T(n) ≥ n, or T(n) ≥ εT(n) + n, for all n > N. By the theorem, there is a machine 𝒯′ which prints α in time [εT(n)]. 𝒯′ can be modified to a machine 𝒯″ which behaves like 𝒯′ except that it suspends its calculation while it prints the output one digit per square. Obviously, 𝒯″ computes within time [εT(n)] + n (which is less than T(n) for n > N). 𝒯″ can be modified to the desired machine 𝒯 by adding enough memory to the control of 𝒯″ to print out the nth digit of α on the nth operation for n ≤ N.

Corollary 2.7. If T(n) ≥ n and α ∈ S_T, then for any ε > 0, there exists a binary-output multitape Turing machine 𝒯 which prints out the nth digit of α in [(1 + ε)T(n)] or fewer operations.

Proof. Observe that

    inf_{n→∞} [(1 + ε)T(n)]/n ≥ 1 + ε > 1,

and apply Corollary 2.6.

Corollary 2.8. If T(n) ≥ n is a time-function and α ∈ S_T, then for any real numbers r and ε, r > ε > 0, there is a binary-output multitape Turing machine 𝒯 which, if run at one operation per r − ε seconds, prints out the nth digit of α within rT(n) seconds. If α ∉ S_T, there are no such r and ε.
Thus, when considering time-functions greater than or equal to n, the slightest increase in operation speed wipes out the distinction between binary and nonbinary output machines.

Proof. This is a consequence of the theorem and Corollary 2.7.

Theorem 3. If T_1 and T_2 are time-functions, then T(n) = min [T_1(n), T_2(n)] is a time-function and S_{T_1} ∩ S_{T_2} = S_T.

Proof. T is obviously a time-function. If 𝒯_1 is a machine that computes α in time T_1, and 𝒯_2 computes α in time T_2, then it is an easy matter to construct a third device 𝒯 incorporating both 𝒯_1 and 𝒯_2 which computes α both ways simultaneously and prints the nth digit of α as soon as it is computed by either 𝒯_1 or 𝒯_2. Clearly this machine operates in

    T(n) = min [T_1(n), T_2(n)].

Theorem 4. If sequences α and β differ in at most a finite number of places, then for any time-function T, α ∈ S_T if and only if β ∈ S_T.

Proof. Let 𝒯 print α in time T. Then by adding some finite memory to the control unit of 𝒯, we can obviously build a machine 𝒯′ which computes β in time T.

Theorem 5. Given a time-function T, there is no decision procedure to decide whether a sequence α is in S_T.

Proof. Let 𝒯 be any Turing machine in the classical sense and let 𝒯_1 be a multitape Turing machine which prints a sequence β not in S_T. Such a 𝒯_1 exists by Theorem 1. Let 𝒯_2 be a multitape Turing machine which prints a zero for each operation 𝒯 makes before stopping. If 𝒯 should stop after k operations, then 𝒯_2 prints the kth and all subsequent output digits of 𝒯_1. Let α be the sequence printed by 𝒯_2. Because of Theorem 4, α ∈ S_T if and only if 𝒯 does not stop. Therefore, a decision procedure for α ∈ S_T would solve the stopping problem, which is known to be unsolvable (see [2]).

Corollary 5.1. There is no decision procedure to determine if S_U = S_T or S_U ⊂ S_T for arbitrary time-functions U and T.

Proof. Similar methods to those used in the previous proof link this with the stopping problem.

It should be pointed out that these unsolvability aspects are not peculiar to our classification scheme but hold for any nontrivial classification satisfying Theorem 4.

III. Other devices. The purpose of this section is to compare the speed of our multitape Turing machine with the speed of other variants of a Turing machine. Most important is the first result, because it has an application in a later section.

Theorem 6. If the sequence α is T-computable by a multitape Turing machine 𝒯, then α is T²-computable by a one-tape Turing machine 𝒯_1.

Proof. Assume that an n-tape Turing machine 𝒯 is given. We shall now describe a one-tape Turing machine 𝒯_1 that simulates 𝒯, and show that if 𝒯 is a T-computer, then 𝒯_1 is at most a T²-computer.

The 𝒯 computation is simulated on 𝒯_1 as follows: On the tape of 𝒯_1 will be stored in n consecutive squares the n symbols read by 𝒯 on its n tapes. The symbols on the squares to the right of those symbols which are read by 𝒯 on its n tapes are stored in the next section to the right on the 𝒯_1 tape, etc., as indicated in Figure 3, where the corresponding position places are shown.

Figure 3. (a) The n tapes of 𝒯.
(b) The tape of 𝒯_1.

The machine 𝒯_1 operates as follows: Internally is stored the behavioral description of the machine 𝒯, so that after scanning the n squares of the 0 block, 𝒯_1 determines to what new state 𝒯 will go, what new symbols will be printed by it on its n tapes, and in which direction each of these tapes will be shifted. First, 𝒯_1 prints the new symbols in the corresponding entries of the 0 block. Then it shifts the tape to the right until the end of printed symbols is reached. (We can print a special symbol indicating the end of printed symbols.) Now the machine shifts the tape back, erases all those entries in each block of n squares which correspond to tapes of 𝒯 which are shifted to the left, and prints them in the corresponding places in the next block. Thus all those entries whose corresponding 𝒯 tapes are shifted left are moved one block to the left. At the other end of the tape, the process is reversed, and returning on the tape 𝒯_1 transfers all those entries whose corresponding 𝒯 tapes are shifted to the right one block to the right on the 𝒯_1 tape. When the machine 𝒯_1 reaches the rightmost printed symbol on its tape, it returns to the specially marked 0 block, which now contains the n symbols which are read by 𝒯 on its next operation, and 𝒯_1 has completed the simulation of one operation of 𝒯. It can be seen that the number of operations of 𝒯_1 is proportional to s, the number of symbols printed on the tape of 𝒯_1. This number increases at most by 2(n + 1) squares during each operation of 𝒯. Thus, after T(k) operations of the machine 𝒯, the one-tape machine 𝒯_1 will perform at most

    T_1(k) = C_0 + Σ_{i=1}^{T(k)} C_1 i

operations, where C_0 and C_1 are constants. But then

    T_1(k) ≤ C_2 Σ_{i=1}^{T(k)} i ≤ C [T(k)]².

Since C is a constant, using Theorem 2, we conclude that there exists a one-tape machine printing its kth output symbol in less than T(k)² tape shifts, as was to be shown.

Corollary 6.1. The best computation time improvement that can be gained in going from n-tape machines to (n + 1)-tape machines is the square root of the computation time.

Next we investigate what happens if we allow the possibility of having several heads on each tape, with some appropriate rule to prevent two heads from occupying the same square and giving conflicting instructions. We call such a device a multihead Turing machine. Our next result states that the use of such a model would not change the complexity classes.

Theorem 7. Let α be computable by a multihead Turing machine 𝒯 which prints the nth digit in T(n) or less operations, where T is a time-function; then α is in S_T.

Proof. We shall show it for a one-tape two-head machine, the other cases following by induction. Our object is to build a multitape machine 𝒯′ which computes α within time 4T, which will establish our result by Theorem 2. The one tape of 𝒯 will be replaced by three tapes in 𝒯′. Tape a contains the left-hand information from 𝒯, tape b contains the right-hand information of 𝒯, and tape c keeps count, two at a time, of the number of tape squares of 𝒯 which are stored on both tapes a and b. A check mark is always on some square of tape a to indicate the rightmost square not stored on tape b, and tape b has a check to indicate the leftmost square not stored on tape a.

When all the information between the heads is on both tapes a and b, then we have a "clean" position, as shown in Figure 4(a).
Figure 4. (a) 𝒯′ in clean position. (b) 𝒯′ in dirty position

As 𝒯 operates, tape a performs like the left head of 𝒯, tape b behaves like the right head, and tape c reduces the count each time a check mark is moved. Head a must carry the check right whenever it moves right from a checked square, since the new symbol it prints will not be stored on tape b; and similarly head b moves its check left. After some m operations of 𝒯′ corresponding to m operations of 𝒯, a "dirty" position such as Figure 4(b) is reached, where there is no overlapping information. The information (if any) between the heads of 𝒯 must be on only one tape of 𝒯′, say tape b as in Figure 4(b). Head b then moves to the check mark, the between-head information is copied over onto tape a, and head a moves back into position. A clean position has been achieved, and 𝒯′ is ready to resume imitating 𝒯. The time lost is 3l, where l is the distance between the heads. But l ≤ m, since head b has moved l squares from the check mark it left. Therefore 4m is enough time to imitate m operations of 𝒯 and restore a clean position. Thus α ∈ S_{4T} = S_T, as was to be shown.

This theorem suggests that our model can tolerate some large deviations without changing the complexity classes. The same techniques can be applied to other changes in the model. For example, consider multitape Turing machines which have a fixed number of special tape symbols such that each symbol can appear in at most one square at any given time and such that the reading head can be shifted in one operation to the place where the special symbol is printed, no matter how far away it is on the tape. Turing machines with such "jump instructions" are similarly shown to leave the classes unchanged.

Changes in the structure of the tape tend to lead to "square laws." For example, consider the following:

Definition 3. A two-dimensional tape is an unbounded plane which is subdivided into squares by equidistant sets of vertical and horizontal lines, as shown in Figure 5. The reading head of the Turing machine with this two-dimensional tape can move either one square up or down, or one square left or right, on each operation. This definition extends naturally to higher-dimensional tapes.
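Definition 2 and the speed-up theorem (Theorem 2) can be made concrete with a toy simulation. The Java sketch below is ours, not from the paper (all class and method names are invented): one routine prints the alternating sequence 0101... one digit per operation, so its nth digit appears within T(n) = n operations, while a second routine emits the digits in pairs, which Definition 2 permits since a bounded number of digits may share one output square; it needs only about n/2 operations, a miniature of S_T = S_{[T/2]}.

    import java.util.ArrayList;
    import java.util.List;

    /** Toy illustration of T-computability (Definition 2) and linear speed-up (Theorem 2). */
    public class SpeedUpSketch {

        /** Prints the first n digits of 0101..., one digit per machine operation. */
        static long plainMachine(int n, List<Integer> out) {
            long ops = 0;
            int state = 0;                     // finite control remembers the next bit
            while (out.size() < n) {
                ops++;                         // one machine operation = one time unit
                out.add(state);                // write one digit on the output tape
                state = 1 - state;
            }
            return ops;                        // n operations: T(n) = n, real time
        }

        /** Same sequence, two digits per operation (two digits share one square). */
        static long spedUpMachine(int n, List<Integer> out) {
            long ops = 0;
            while (out.size() < n) {
                ops++;                         // one operation emits the pair "01"
                out.add(0);
                if (out.size() < n) out.add(1);
            }
            return ops;                        // ceil(n/2) operations
        }

        public static void main(String[] args) {
            List<Integer> a = new ArrayList<>(), b = new ArrayList<>();
            long t1 = plainMachine(13, a);
            long t2 = spedUpMachine(13, b);
            // same sequence, half the operations: prints "true 13 7"
            System.out.println(a.equals(b) + " " + t1 + " " + t2);
        }
    }

The content of Theorem 2 is, of course, that this recoding can be done uniformly for any machine, by packing pairs of tape squares into single symbols and buffering two symbols in the control; the sketch shows only the operation-count bookkeeping.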

Advanced Circuit Simulation Software User Guide

.SNNOISE
Runs periodic AC noise analysis on nonautonomous circuits in a large-signal periodic steady state.
.SNNOISE output insrc frequency_sweep [N1, +/-1]
+ [LISTFREQ=(freq1 [freq2 ... freqN]|none|all)] [LISTCOUNT=num]
+ [LISTFLOOR=val] [LISTSOURCES=on|off]

.HBAC / .SNAC
Runs periodic AC analysis on circuits operating in a large-signal periodic steady state.
.HBAC frequency_sweep
.SNAC frequency_sweep

.HBXF / .SNXF
Calculates the transfer function from the given source in the circuit to the designated output.
.HBXF out_var frequency_sweep
.SNXF out_var frequency_sweep

.PTDNOISE
Calculates the noise spectrum and total noise at a point in time.
.PTDNOISE output TIME=[val|meas|sweep]
+ [TDELTA=time_delta] frequency_sweep
+ [LISTFREQ=(freq1 [freq2 ... freqN]|none|all)] [LISTCOUNT=num]
+ [LISTFLOOR=val] [LISTSOURCES=on|off]

RF Options
SIM_ACCURACY=x  Sets and modifies the size of the time steps. The higher the value, the greater the accuracy; the lower the value, the faster the simulation runtime. Default is 1.
TRANFORHB=n  1 forces HB analysis to recognize specific V/I sources; 0 (default) ignores transient descriptions of V/I sources.
HBCONTINUE=n  Specifies whether to use the sweep solution from the previous simulation as the initial guess for the present simulation. 0 restarts each simulation in a sweep from the DC solution; 1 (default) uses the previous sweep solution as the initial guess.
HBSOLVER=n  Specifies a preconditioner for solving nonlinear circuits. 0 invokes the direct solver; 1 (default) invokes the matrix-free Krylov solver; 2 invokes the two-level hybrid time-frequency domain solver.
SNACCURACY=n  Sets and modifies the size of the time steps. The higher the value, the greater the accuracy; the lower the value, the faster the simulation runtime. Default is 10.
SAVESNINIT="filename"  Saves the operating point at the end of SN initialization.
LOADSNINIT="filename"  Loads the operating point saved at the end of SN initialization.

Output Commands
.BIASCHK  .MEASURE  .PRINT  .PROBE
For details about all commands and options, see the HSPICE® Reference Manual: Commands and Control Options.

Signal Integrity Commands

.LIN
Calculates linear transfer and noise parameters for a general multi-port network.
.LIN [sparcalc[=1|0]] [modelname=modelname] [filename=filename]
+ [format=selem|citi|touchstone|touchstone2] [noisecalc[=1|0]]
+ [gdcalc[=1|0]] [dataformat=ri|ma|db]
+ [listfreq=(freq1 [freq2 ... freqN]|none|all)] [listcount=num]
+ [listfloor=val] [listsources=1|0|yes|no]

.STATEYE
Performs statistical eye diagram analysis.
.STATEYE T=time_interval Trf=rise_fall_time [Tr=rise_time]
+ [Tf=fall_time] Incident_port=idx1[, idx2, ... idxN]
+ Probe_port=idx1[, idx2, ... idxN] [Tran_init=n_periods]
+ [V_low=val] [V_high=val] [TD_In=val] [TD_PROBE=val]
+ [T_resolution=n] [V_resolution=n] [VD_range=val]
+ [EDGE=1|2|4|8] [MAX_PATTERN=n] [PATTERN_REPEAT=n]
+ [SAVE_TR=ascii] [LOAD_TR=ascii] [SAVE_DIR=string]
+ [IGNORE_Bits=n] [Tran_Bit_Seg=n]
+ [MODE=EDGE|CONV|TRAN] [XTALK_TYPE=SYNC|ASYNC|DDP|NO|ONLY]
+ [Unfold_Length=n] [TXJITTER_MODE=1|2]

RF Analysis Commands
.ACPHASENOISE
Helps interpret signal and noise quantities as phase variables for accumulated jitter in closed-loop PLL analysis.
.ACPHASENOISE output input [interval] carrier=freq
+ [listfreq=(freq1 [freq2 ... freqN]|none|all)] [listcount=num]
+ [listfloor=val] [listsources=1|0]

.HB
Runs periodic steady-state analysis with the single- and multitone Harmonic Balance algorithm.
.HB TONES=F1[,F2,...,FN] [SUBHARMS=SH] [NHARMS=H1[,H2,...,HN]]
+ [INTMODMAX=n] [SWEEP parameter_sweep]

.SN
Runs periodic steady-state analysis using the Shooting Newton algorithm.
.SN TRES=Tr PERIOD=T [TRINIT=Ti] [MAXTRINITCYCLES=integer]
+ [SWEEP parameter_sweep] [NUMPEROUT=val]
.SN TONE=F1 [TRINIT=Ti] NHARMS=N [MAXTRINITCYCLES=integer]
+ [NUMPEROUT=val] [SWEEP parameter_sweep]

.HBOSC / .SNOSC
Performs analysis on autonomous oscillator circuits.
.HBOSC TONE=F1 NHARMS=H1
+ PROBENODE=N1,N2,VP [FSPTS=NUM,MIN,MAX]
+ [SWEEP parameter_sweep] [SUBHARMS=I] [STABILITY=-2|-1|0|1|2]
.SNOSC TONE=F1 NHARMS=H1 [TRINIT=Ti]
+ [OSCTONE=N] [MAXTRINITCYCLES=N]
+ [SWEEP parameter_sweep]

.PHASENOISE
Interprets signal and noise quantities as phase variables for accumulated jitter in closed-loop PLL analysis.
.PHASENOISE output frequency_sweep [method=0|1|2]
+ [listfreq=(freq1 [freq2 ... freqN]|none|all)] [listcount=num]
+ [listfloor=val] [listsources=1|0] [carrierindex=int]

.HBNOISE
Performs cyclo-stationary noise analysis on circuits in a large-signal periodic steady state.
.HBNOISE output insrc parameter_sweep [N1, N2, ..., NK, +/-1]
+ [LISTFREQ=(freq1 [freq2 ... freqN]|none|all)] [LISTCOUNT=num]
+ [LISTFLOOR=val] [LISTSOURCES=on|off]

.NOISE
Runs noise analysis in the frequency domain.
.NOISE v(out) vin [interval] [listckt[=1|0]]
+ [listfreq=(freq1 [freq2 ... freqN]|none|all)] [listcount=num]
+ [listfloor=val] [listsources=1|0|yes|no] [listtype=1|0]

.ALTER
Reruns a simulation using different parameters and data from a specified sequence or block. The .ALTER block can contain element commands and .AC, .ALIAS, .DATA, .DC, .DEL LIB, .HDL, .IC (initial condition), .INCLUDE, .LIB, .MODEL, .NODESET, .OP, .OPTION, .PARAM, .TEMP, .TF, .TRAN, and .VARIATION commands.
.ALTER title_string

.DC
Performs DC analyses.
Single / Double Sweep
.DC var1 START=start1 STOP=stop1 STEP=incr1
Parameterized Sweep
.DC var1 start1 stop1 incr1 [SWEEP var2 type np start2 stop2]
.DC var1 START=[par_expr1] STOP=[par_expr2] STEP=[par_expr3]
Data-Driven Sweep
.DC var1 type np start1 stop1 [SWEEP DATA=datanm(Nums)]
.DC DATA=datanm [SWEEP var2 start2 stop2 incr2]
.DC DATA=datanm(Nums)
Monte Carlo Analysis
.DC var1 start1 stop1 incr1 [SWEEP MONTE=MCcommand]
.DC MONTE=MCcommand

.OP
Calculates the operating point of the circuit.
.OP format_time format_time ... [interpolation]

.PARAM
Defines parameters. Parameters are names that have associated numeric values or functions.
.PARAM ParamName = RealNumber | 'AlgebraicExpression'
| DistributionFunction(Arguments) | str('string')
| OPTxxx(initial_guess, low_limit, upper_limit)
Monte Carlo Analysis
.PARAM mcVar = UNIF(nominal_val, rel_variation [, multiplier])
| AUNIF(nominal_val, abs_variation [, multiplier])
| GAUSS(nominal_val, rel_variation, num_sigmas [, multiplier])
| AGAUSS(nominal_val, abs_variation, num_sigmas [, multiplier])
| LIMIT(nominal_val, abs_variation)

.STORE
Starts creation of checkpoint files describing a running process during transient analysis.
.STORE [file=checkpoint_file] [time=time1]
+ [repeat=checkpoint_interval]

.TEMP
Performs temperature analysis at specified temperatures.
.TEMP t1 [t2 t3 ...]

.TRAN
Performs a transient analysis.
Single-Point Analysis
.TRAN tstep1 tstop1 [START=val] [UIC]
Multipoint Analysis
.TRAN tstep1 tstop1 [tstep2 tstop2 ... tstepN tstopN]
+ RUNLVL=(time1 runlvl1 time2 runlvl2 ... timeN runlvlN)
+ [START=val] [UIC] [SWEEP var type np pstart pstop]
Monte Carlo Analysis
.TRAN tstep1 tstop1 [tstep2 tstop2 ... tstepN tstopN]
+ [START=val] [UIC] [SWEEP MONTE=MCcommand]

Invoking HSPICE
Simulation Mode
hspice [-i] input_file [-o [output_file]] [-hpp] [-mt #num] [-gz] [-d] [-case]
[-hdl filename] [-hdlpath pathname] [-vamodel name]
Distributed-Processing Mode
hspice [-i] input_file [-o [output_file]] -dp [#num]
[-dpconfig [dp_configuration_file]] [-dplocation [NFS|TMP]] [-merge]
Measurement Mode
hspice -meas measure_file -i wavefile -o [output_file]
Help Mode
hspice [-h] [-doc] [-help] [-v]

Argument Descriptions
-i input_file  Specifies the input netlist file name.
-o output_file  Name of the output file. HSPICE appends the extension .lis.
-hpp  Invokes HSPICE Precision Parallel.
-mt #num  Invokes multithreading and specifies the number of processors. Works best when -hpp is used.
-gz  Generates compressed output of analysis results for these output types: .tr#, .ac#, .sw#, .ma#, .mt#, .ms#, .mc#, and .print*.
-d  (UNIX) Displays the content of .st0 files on screen while running HSPICE.
-case  Enables case sensitivity.
-hdl filename  Specifies a Verilog-A file.
-hdlpath pathname  Specifies the search path for Verilog-A files.
-vamodel name  Specifies the cell name for Verilog-A definitions.
-dp #num -dpconfig dpconfig_file -dplocation [NFS|TMP]  Invokes distributed processing and specifies the number of processes, the configuration file for DP, and the location of the output files.
-merge  Merges the output files in distributed-processing mode.
-meas measure_file  Calculates new measurements from a previous simulation.
-h  Outputs the command line help message.
-doc  Opens the PDF documentation set for HSPICE (requires Adobe Acrobat Reader or another PDF document reader).
-help  Invokes the online help system (requires a Web browser).
-v  Outputs HSPICE version information.

HSPICE is fully integrated with the Synopsys® Custom Compiler™ Simulation and Analysis Environment (SAE). See the Custom Compiler™ Simulation and Analysis Environment User Guide. To use the HSPICE integration with the Cadence® Virtuoso® Analog Design Environment, go to /$INSTALLDIR/interface/ and follow the README instructions.

Analysis Commands

.AC
Performs AC analyses.
Single / Double Sweep
.AC type np fstart fstop
.AC type np fstart fstop [SWEEP var [START=]start [STOP=]stop [STEP=]incr]
.AC type np fstart fstop [SWEEP var type np start stop]
Sweep Using Parameters
.AC type np fstart fstop [SWEEP DATA=datanm(Nums)]
.AC DATA=datanm
.AC DATA=datanm [SWEEP var [START=]start [STOP=]stop [STEP=]incr]
.AC DATA=datanm [SWEEP var type np start stop]
Monte Carlo Analysis
.AC type np fstart fstop [SWEEP MONTE=MCcommand]

.LSTB
Invokes loop stability analysis.
.LSTB [lstbname] mode=[single|diff|comm]
+ vsource=[vlstb|vlstbp,vlstbn]

Data-Driven Sweep
.TRAN DATA=datanm
.TRAN DATA=datanm [SWEEP var type np pstart pstop]
.TRAN tstep1 tstop1 [tstep2 tstop2 ... tstepN tstopN]
+ [START=val] [UIC] [SWEEP DATA=datanm(Nums)]
Time Window-based Speed/Accuracy Tuning by RUNLVL
.TRAN tstep tstop [RUNLVL=(time1 runlvl1 ... timeN runlvlN)]
Circuit Block-based Speed/Accuracy Tuning by RUNLVL
.TRAN tstep tstop
+ [INST=inst_exp1 RUNLVL=(time11 runlvl11 ... time1N runlvl1N)]
+ [SUBCKT=subckt_exp2 RUNLVL=(time21 runlvl21 ... time2N runlvl2N)]
Time Window-based Temperature Setting
.TRAN tstep tstop [tempvec=(t1 Temp1 t2 Temp2 t3 Temp3 ...)
+ [tempstep=val]]

.TRANNOISE
Activates transient noise analysis to compute the additional noise variables over a standard .TRAN analysis.
.TRANNOISE output [METHOD=MC] [SEED=val] [SAMPLES=val] [START=x]
+ [AUTOCORRELATION=0|1|off|on] [FMIN=val] [FMAX=val] [SCALE=val]
+ [PHASENOISE=0|1|2] [JITTER=0|1|2] [REF=srcName] [PSD=0|1]

HSPICE Options
.OPTION opt1 [opt2 opt3 ...]
opt1 opt2 ...  Specify input control options.

General Options
ALTCC=n  Enables reading the input netlist once for multiple .ALTER statements. Default is 0.
LIS_NEW=x  Enables streamlining improvements to the *.lis file. Default is 0.
SCALE=x  Sets the element scaling factor. Default is 1.
POSTTOP=n  Outputs instances up to n levels deep. Default is 0.
POSTLVL=n  Limits data written to the waveform file to the level of nodes specified by n.
POST=n  Saves results for viewing by an interactive waveform viewer. Default is 0.
PROBE=n  Limits post-analysis output to only variables specified in .PROBE and .PRINT statements. Default is 0.

RC Reduction Options
SIM_LA=name  Starts linear matrix (RC) reduction using the PACT, PI, or LNE algorithm. Default is off.

Transient Options
AUTOSTOP=n  Stops transient analysis after calculating all TRIG-TARG, FIND-WHEN, and FROM-TO measure functions. Default is 0.
METHOD=name  Sets the numerical integration method for a transient analysis to GEAR, TRAP (default), or BDF.
RUNLVL=n  Controls the speed and accuracy trade-off, where n can be 1 through 6. The higher the value, the greater the accuracy; the lower the value, the faster the simulation runtime. Default is 3.

Variability and Monte Carlo Analysis
.AC  .DC  .TRAN  .MEASURE  .MODEL  .PARAM

.ACMATCH
Calculates the effects of variations on the AC transfer function, with one or more outputs.
.ACMatch Vm(n1) Vp(n1) Vr(n1) Vi(n1) Vm(n1,n2) Im(Vmeas)

.DCMATCH
Calculates the effects of variations on the DC operating point, with one or more outputs.
.DCMatch V(n1) V(n1,n2) I(Vmeas)
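As a usage illustration, here is a minimal deck (ours, not from the card; the node names and component values are invented) that exercises several of the commands summarized above: a .TRAN analysis with a RUNLVL option, an .AC sweep, and a .MEASURE rise-time extraction.

    * RC low-pass driven by a pulse: illustrative HSPICE deck
    V1 in 0 DC 0 AC 1 PULSE(0 1 0 1n 1n 1u 2u)
    R1 in out 1k
    C1 out 0 1n
    .OPTION POST=1 RUNLVL=3
    .TRAN 1n 4u
    .AC DEC 10 1k 100meg
    .MEASURE TRAN trise TRIG V(out) VAL=0.1 RISE=1
    +                   TARG V(out) VAL=0.9 RISE=1
    .PRINT AC VDB(out)
    .END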

Principles of the Fast Marching Algorithm

The fast marching algorithm (FMA) is a numerical technique for solving the Eikonal equation, which describes the propagation of wavefronts. The algorithm is widely used in fields such as computer graphics, medical imaging, and computational physics.

The basic principle of the fast marching algorithm is to iteratively update the travel time (or distance) from a given starting point to all other points in the computational domain. This is done by considering the local characteristics of the wavefront and updating the travel time based on the minimum arrival time from neighboring points.

The algorithm starts by initializing the travel time at the starting point to zero and setting the travel time at all other points to infinity. It then iteratively updates the travel time at each grid point based on its neighbors; tentative values only ever decrease, while the accepted arrival times grow monotonically as the wavefront propagates outward.

At each iteration, the algorithm selects the grid point with the minimum tentative travel time among the points that have not yet been accepted. It then updates the travel times of that point's neighbors from the local wavefront characteristics. This process is repeated until the travel times at all points have been computed.

One of the key advantages of the fast marching algorithm is its computational efficiency. By exploiting the causality of the Eikonal equation (each point's arrival time depends only on neighbors with smaller times), the algorithm computes all travel times in a single ordered pass over the grid, making it suitable for real-time or interactive applications.

In conclusion, the fast marching algorithm is a powerful numerical technique for solving the Eikonal equation and computing wavefront propagation. Its efficiency and versatility make it a valuable tool in a wide range of applications involving wave propagation phenomena.
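The following Java sketch illustrates the method on a uniform 2D grid with unit speed, i.e., the Eikonal equation |grad T| = 1 (our own illustration, not a reference implementation; all names are invented). Accepted points, narrow-band candidates in a min-heap, and far points at infinity give the three-set structure described above.

    import java.util.PriorityQueue;

    /** Fast marching on a uniform 2D grid with unit speed (illustrative sketch). */
    public class FastMarching {

        static final double INF = Double.POSITIVE_INFINITY;

        /** Returns arrival times T from a single source cell (si, sj); h is the grid spacing. */
        static double[][] march(int rows, int cols, int si, int sj, double h) {
            double[][] T = new double[rows][cols];
            boolean[][] accepted = new boolean[rows][cols];
            for (double[] row : T) java.util.Arrays.fill(row, INF);

            // Min-heap of candidate cells keyed by tentative arrival time: {t, i, j}.
            PriorityQueue<double[]> heap =
                new PriorityQueue<>((a, b) -> Double.compare(a[0], b[0]));
            T[si][sj] = 0.0;
            heap.add(new double[] {0.0, si, sj});

            int[][] nbrs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
            while (!heap.isEmpty()) {
                double[] top = heap.poll();
                int i = (int) top[1], j = (int) top[2];
                if (accepted[i][j]) continue;          // stale heap entry
                accepted[i][j] = true;                 // smallest tentative time is final
                for (int[] d : nbrs) {
                    int ni = i + d[0], nj = j + d[1];
                    if (ni < 0 || ni >= rows || nj < 0 || nj >= cols || accepted[ni][nj]) continue;
                    double t = update(T, ni, nj, h);
                    if (t < T[ni][nj]) {               // tentative values only decrease
                        T[ni][nj] = t;
                        heap.add(new double[] {t, ni, nj});
                    }
                }
            }
            return T;
        }

        /** Upwind update: solve (t-a)^2 + (t-b)^2 = h^2 with a, b the smallest
         *  neighbor times along x and y; fall back to the one-sided solution. */
        static double update(double[][] T, int i, int j, double h) {
            int rows = T.length, cols = T[0].length;
            double a = Math.min(i > 0 ? T[i-1][j] : INF, i < rows-1 ? T[i+1][j] : INF);
            double b = Math.min(j > 0 ? T[i][j-1] : INF, j < cols-1 ? T[i][j+1] : INF);
            double lo = Math.min(a, b), hi = Math.max(a, b);
            if (hi - lo >= h) return lo + h;           // causal one-sided solution
            return 0.5 * (lo + hi + Math.sqrt(2*h*h - (hi-lo)*(hi-lo)));
        }

        public static void main(String[] args) {
            double[][] T = march(5, 5, 2, 2, 1.0);
            for (double[] row : T) System.out.println(java.util.Arrays.toString(row));
        }
    }

The heap pop order guarantees that each point is finalized exactly once, which is the source of the method's efficiency; with a binary heap the total cost is O(N log N) for N grid points.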

Numerical Linear Algebra

... letters (and occasionally lower case letters) will denote scalars. R will denote the set of real numbers ...

...tions to the algorithm, it can be made to work quite well. We understand these algorithmic transformations most completely in the case of simple algorithms like Cholesky, on simple ...
Copyright (C) 1991, 1992, 1993, 1994, 1995 by the Computational Science Education Project
This electronic book is copyrighted, and protected by the copyright laws of the United States. This (and all associated documents in the system) must contain the above copyright notice. If this electronic book is used anywhere other than the project's original system, CSEP must be notified in writing (email is acceptable) and the copyright notice must remain intact.

Synopsys OptoDesigner 2020.09 Installation Guide

Contents

Preface
1. Scanning best practices
3. Troubleshooting scanning issues
    Accidental full scan proliferation by a build server farm
        Solution
    Accidental full scan proliferation by folder paths which include build or commit ID
        Solution

Zebra Technologies DS8108 Digital Scanner Product Reference Guide

Chapter 1: Getting Started
    Introduction
    Interfaces
    Unpacking
    Setting Up the Digital Scanner
        Installing the Interface Cable
        Removing the Interface Cable
        Connecting Power (if required)
        Configuring the Digital Scanner

Declaration of Authorship

Efficient Hardware Architectures for Modular Multiplication

by
David Narh Amanor

A Thesis submitted to
The University of Applied Sciences Offenburg, Germany
in partial fulfillment of the requirements for the
Degree of Master of Science
in
Communication and Media Engineering

February, 2005

Approved:
Prof. Dr. Angelika Erhardt, Thesis Supervisor
Prof. Dr. Christof Paar, Thesis Supervisor

Declaration of Authorship

"I declare in lieu of an oath that the Master thesis submitted has been produced by me without illegal help from other persons. I state that all passages which have been taken out of publications of all means or unpublished material either whole or in part, in words or ideas, have been marked as quotations in the relevant passage. I also confirm that the quotes included show the extent of the original quotes and are marked as such. I know that a false declaration will have legal consequences."

David Narh Amanor
February, 2005

Preface

This thesis describes the research which I conducted while completing my graduate work at the University of Applied Sciences Offenburg, Germany. The work produced scalable hardware implementations of existing and newly proposed algorithms for performing modular multiplication.

The work presented can be instrumental in generating interest in the hardware implementation of emerging algorithms for doing faster modular multiplication, and can also be used in future research projects at the University of Applied Sciences Offenburg, Germany, and elsewhere. Of particular interest is the integration of the new architectures into existing public-key cryptosystems such as RSA, DSA, and ECC to speed up the arithmetic.

I wish to thank the following people for their unselfish support throughout the entire duration of this thesis. I would like to thank my external advisor Prof. Christof Paar for providing me with all the tools and materials needed to conduct this research. I am particularly grateful to Dipl.-Ing. Jan Pelzl, who worked with me closely, and whose constant encouragement and advice gave me the energy to overcome several problems I encountered while working on this thesis. I wish to express my deepest gratitude to my supervisor Prof. Angelika Erhardt for being in constant touch with me and for all the help and advice she gave throughout all stages of the thesis. If it was not for Prof. Erhardt, I would not have had the opportunity of doing this thesis work, and I would have missed out on a very rewarding experience. I am also grateful to Dipl.-Ing. Viktor Buminov and Prof. Manfred Schimmler, whose newly proposed algorithms and corresponding architectures form the basis of my thesis work and provide the necessary theoretical material for understanding the algorithms presented in this thesis.

Finally, I would like to thank my brother, Mr. Samuel Kwesi Amanor, my friend and Pastor, Josiah Kwofie, Mr. Samuel Siaw Nartey, and Mr. Csaba Karasz for their diverse support which enabled me to undertake my thesis work in Bochum.

Abstract

Modular multiplication is a core operation in many public-key cryptosystems, e.g., RSA, Diffie-Hellman key agreement (DH), ElGamal, and ECC. The Montgomery multiplication algorithm [2] is considered to be the fastest algorithm to compute X*Y mod M in computers when the values of X, Y, and M are large.

Recently, two new algorithms for modular multiplication and their corresponding architectures were proposed in [1].
These algorithms are optimizations of the Montgomery multiplication algorithm [2] and the interleaved modular multiplication algorithm [3].

In this thesis, software (Java) and hardware (VHDL) implementations of the existing and newly proposed algorithms and their corresponding architectures for performing modular multiplication have been done. In summary, three different multipliers for 32, 64, 128, 256, 512, and 1024 bits were implemented, simulated, and synthesized for a Xilinx FPGA. The implementations are scalable to any precision of the input variables X, Y, and M.

This thesis also evaluated the performance of the multipliers in [1] by a thorough comparison of the architectures on the basis of the area-time product. It finally shows that the newly optimized algorithms and their corresponding architectures in [1] require minimum hardware resources and offer faster speed of computation compared to multipliers with the original Montgomery algorithm.

Table of Contents

1 Introduction
  1.1 Motivation
  1.2 Thesis Outline
2 Existing Architectures for Modular Multiplication
  2.1 Carry Save Adders and Redundant Representation
  2.2 Complexity Model
  2.3 Montgomery Multiplication Algorithm
  2.4 Interleaved Modular Multiplication
3 New Architectures for Modular Multiplication
  3.1 Faster Montgomery Algorithm
  3.2 Optimized Interleaved Algorithm
4 Software Implementation
  4.1 Implementational Issues
  4.2 Java Implementation of the Algorithms
    4.2.1 Imported Libraries
    4.2.2 Implementation Details of the Algorithms
    4.2.3 1024 Bits Test of the Implemented Algorithms
5 Hardware Implementation
  5.1 Modeling Technique
  5.2 Structural Elements of Multipliers
    5.2.1 Carry Save Adder
    5.2.2 Lookup Table
    5.2.3 Register
    5.2.4 One-Bit Shifter
  5.3 VHDL Implementational Issues
  5.4 Simulation of Architectures
  5.5 Synthesis
6 Results and Analysis of the Architectures
  6.1 Design Statistics
  6.2 Area Analysis
  6.3 Timing Analysis
  6.4 Area-Time (AT) Analysis
  6.5 RSA Encryption Time
7 Discussion
  7.1 Summary and Conclusions
  7.2 Further Research
    7.2.1 RAM of FPGA
    7.2.2 Word Wise Multiplication
References

List of Figures

2.3 Architecture of the loop of Algorithm 1b [1]
3.1 Architecture of Algorithm 3 [1]
3.2 Inner loop of modular multiplication using carry save addition [1]
3.2 Modular multiplication with one carry save adder [1]
4.2.2 Path through the loop of Algorithm 3
4.2.3 A 1024 bit test of Algorithm 1b
4.2.3 A 1024 bit test of Algorithm 3
4.2.3 A 1024 bit test of Algorithm 5
5.2 Block diagram showing components that were implemented for the Faster Montgomery architecture
5.2.1 VHDL implementation of carry save adder
5.2.2 VHDL implementation of lookup table
5.2.3 VHDL implementation of register
5.2.4 Implementation of 'Shift Right' unit
5.3 32 bit blocks of registers for storing input data bits
5.4 State diagram of implemented multipliers
6.2 Percentage of configurable logic blocks occupied
6.2 CLB slices versus bitlength for the Fast Montgomery multiplier
6.3 Minimum clock periods for all implementations
6.3 Absolute times for all implementations
6.4 Area-time product analysis

List of Tables

6.1 Percentage of configurable logic block slices (out of 19200) occupied depending on bitlength
6.1 Number of gates
6.1 Minimum period and maximum frequency
6.1 Number of DFFs or latches
6.1 Number of function generators
6.1 Number of MUX CARRYs
6.1 Total equivalent gate count for design
6.3 Absolute time (ns) for all implementations
6.4 Area-time product values
6.5 Time (ns) for 1024 bit RSA encryption

Chapter 1
Introduction

1.1 Motivation

The rising growth of data communication and electronic transactions over the internet has made security the most important issue over the network. To provide modern security features, public-key cryptosystems are used. The widely used algorithms for public-key cryptosystems are RSA, Diffie-Hellman key agreement (DH), the digital signature algorithm (DSA), and systems based on elliptic curve cryptography (ECC). All these algorithms have one thing in common: they operate on very large numbers (e.g. 160 to 2048 bits). Long word lengths are necessary to provide a sufficient amount of security, but they also account for the computational cost of these algorithms.

By far the most popular public-key scheme in use today is RSA [9]. The core operation for data encryption processing in RSA is modular exponentiation, which is done by a series of modular multiplications (i.e., X*Y mod M). This accounts for most of the complexity in terms of time and resources needed. Unfortunately, the large word length (e.g. 1024 or 2048 bits) makes the RSA system slow and difficult to implement. This gives reason to search for dedicated hardware solutions which compute the modular multiplications efficiently with minimum resources.

The Montgomery multiplication algorithm [2] is considered to be the fastest algorithm to compute X*Y mod M in computers when the values of X, Y, and M are large. Another efficient algorithm for modular multiplication is the interleaved modular multiplication algorithm [4].

In this thesis, two new algorithms for modular multiplication and their corresponding architectures, which were proposed in [1], are implemented. These algorithms are optimisations of Montgomery multiplication and interleaved modular multiplication. They are optimised with respect to area and time complexity. In both algorithms the product of two n-bit integers X and Y modulo M is computed by n iterations of a simple loop, where each loop consists of one single carry save addition, a comparison of constants, and a table lookup. These new algorithms have been proved in [1] to speed up the modular multiplication operation by at least a factor of two in comparison with all previously known methods.

The main advantages offered by these new algorithms are:
• faster computation time, and
• relatively small area requirements and resources for the implementation of their architectures in hardware, compared to the Montgomery multiplication algorithm presented in [1, Algorithms 1a and 1b].

1.2 Thesis Outline

Chapter 2 provides an overview of the existing algorithms and their corresponding architectures for performing modular multiplication. The necessary background knowledge required for understanding the algorithms, architectures, and concepts presented in the subsequent chapters is also explained. This chapter also discusses the complexity model which was used to compare the existing architectures with the newly proposed ones.

In Chapter 3, a description of the new algorithms for modular multiplication and their corresponding architectures is presented. The modifications that were applied to the existing algorithms to produce the new optimized versions are also explained in this chapter.

Chapter 4 covers issues on the software implementation of the algorithms presented in Chapters 2 and 3.
The special classes in Java which were used in the implementation of the algorithms are mentioned. The testing of the new optimized algorithms presented in Chapter 3 using randomly generated input variables is also discussed.

The hardware modeling technique which was used in the implementation of the multipliers is explained in Chapter 5. In this chapter, the design capture of the architectures in VHDL is presented and the simulations of the VHDL implementations are also discussed. This chapter also discusses the target technology device and the synthesis results. The state machine of the implemented multipliers is also presented in this chapter.

In Chapter 6, an analysis and comparison of the implemented multipliers is given. The vital design statistics which were generated after place and route are tabulated and graphically represented in this chapter. Of prime importance in this chapter is the area-time (AT) analysis of the multipliers, which is the complexity metric used for the comparison.

Chapter 7 concludes the thesis by setting out the facts and figures of the performance of the implemented multipliers. This chapter also itemizes a list of recommendations for further research.

Chapter 2
Existing Architectures for Modular Multiplication

2.1 Carry Save Adders and Redundant Representation

The core operation of most algorithms for modular multiplication is addition. There are several different methods for addition in hardware: carry ripple addition, carry select addition, carry look-ahead addition, and others [8]. The disadvantage of these methods is the carry propagation, which is directly proportional to the length of the operands. This is not a big problem for operands of size 32 or 64 bits, but the typical operand size in cryptographic applications ranges from 160 to 2048 bits. The resulting delay has a significant influence on the time complexity of these adders.

The carry save adder seems to be the most cost-effective adder for our application. Carry save addition is a method for addition without carry propagation. It is simply a parallel ensemble of n full adders without any horizontal connection. Its function is to add three n-bit integers X, Y, and Z to produce two integers C and S as results such that

    C + S = X + Y + Z,

where C represents the carry and S the sum. The ith bit s_i of the sum S and the (i+1)st bit c_{i+1} of the carry C are calculated using the boolean equations

    s_i = x_i ⊕ y_i ⊕ z_i,
    c_{i+1} = x_i y_i ∨ x_i z_i ∨ y_i z_i,
    c_0 = 0.

When carry save adders are used in an algorithm, one uses a notation of the form

    (S, C) := X + Y + Z

to indicate that two results are produced by the addition. The results are now represented in two binary words, an n-bit word S and an (n+1)-bit word C. Of course, this representation is redundant in the sense that we can represent one value in several different ways. This redundant representation has the advantage that the arithmetic operations are fast, because there is no carry propagation. On the other hand, it brings to the fore one basic disadvantage of the carry save adder:

• It does not solve our problem of adding two integers to produce a single result. Rather, it adds three integers and produces two such that the sum of these two is equal to that of the three inputs. This method may not be suitable for applications which only require normal addition.
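To make the boolean equations above concrete, here is a small Java sketch of bitwise carry save addition (our illustration; the thesis implements this element in VHDL, and the class and method names here are invented):

    /** Carry save addition: three n-bit words in, a (sum, carry) pair out. */
    public class CarrySaveAdder {

        /** Returns {S, C} with S + C == x + y + z; no carry ripples between bit positions. */
        static long[] csa(long x, long y, long z) {
            long s = x ^ y ^ z;                            // s_i = x_i XOR y_i XOR z_i
            long c = ((x & y) | (x & z) | (y & z)) << 1;   // c_{i+1} = majority(x_i, y_i, z_i), c_0 = 0
            return new long[] { s, c };
        }

        public static void main(String[] args) {
            long[] sc = csa(23, 42, 99);
            System.out.println((sc[0] + sc[1]) == (23 + 42 + 99));   // prints: true
        }
    }

In hardware, each bit position is an independent full adder, so the delay is one full-adder delay regardless of the word length; this is exactly the property that the complexity model of Section 2.2 charges as a single time unit.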
This method may not be suitable for applications which only require the normal addition.2.2 Complexity ModelFor comparison of different algorithms we need a complexity model that allows fora realistic evaluation of time and area requirements of the considered methods. In[1], the delay of a full adder (1 time unit) is taken as a reference for the time requirement and quantifies the delay of an access to a lookup table with the same time delay of 1 time unit. The area estimation is based on empirical studies in full-custom and semi-custom layouts for adders and storage elements: The area for 1 bit in a lookup table corresponds to 1 area unit. A register cell requires 4 area units per bit and a full adder requires 8 area units. These values provide a powerful and realistic model for evaluation of area and time for most algorithms for modular multiplication.In this thesis, the percentage of configurable logic block slices occupied and the absolute time for computation are used to evaluate the algorithms. Other hardware resources such as total number of gates and number of flip-flops or latches required were also documented to provide a more practical and realistic evaluation of the algorithms in [1].2.3 Montgomery Multiplication AlgorithmThe Montgomery algorithm [1, Algorithm 1a] computes P = (X*Y* (2n)-1) mod M. The idea of Montgomery [2] is to keep the lengths of the intermediate resultsExisting Architectures for Modular Multiplication14smaller than n +1 bits. This is achieved by interleaving the computations and additions of new partial products with divisions by 2; each of them reduces the bit-length of the intermediate result by one.For a detailed treatment of the Montgomery algorithm, the reader is referred to [2] and [1].The key concepts of the Montgomery algorithm [1, Algorithm 1b] are the following:• Adding a multiple of M to the intermediate result does not change the valueof the final result; because the result is computed modulo M . M is an odd number.• After each addition in the inner loop the least significant bit (LSB) of theintermediate result is inspected. If it is 1, i.e., the intermediate result is odd, we add M to make it even. This even number can be divided by 2 without remainder. This division by 2 reduces the intermediate result to n +1 bits again.• After n steps these divisions add up to one division by 2n .The Montgomery algorithm is very easy to implement since it operates least significant bit first and does not require any comparisons. A modification of Algorithm 1a with carry save adders is given in [1, Algorithm 1b]:Algorithm 1a: Montgomery multiplication [1]P-M;:M) then P ) if (P (; }P div ) P :(*M; p P ) P :(*Y; x P ) P :() {n; i ; i ) for (i (;) P :(;: LSB of P p bit of X;: i x X;in bits of n: number M ) ) (X*Y(Output: P MX, Y Y, M with Inputs: X,i th i -n =≥=+=+=++<===<≤625430201 mod 20001Existing Architectures for Modular Multiplication15Algorithm 1b: Fast Montgomery multiplication [1]P-M;:M) then P ) if (P (C;S ) P :(;} C div ; C :S div ) S :(*M; s C S :) S,C (*Y; x C S :) S,C () {n; i ; i ) for (i (; ; C : ) S :(;: LSB of S s bit of X;: i x X;of bits in n: number M ) ) (X*Y(Output: P M X, Y Y, M with Inputs: X,i th i -n =≥+===++=++=++<====<≤762254302001mod 20001In this algorithm the delay of one pass through the loop is reduced from O (n ) to O (1). 
This remarkable improvement of the propagation delay inside the loop of Algorithm 1b is due to the use of carry save adders to implement step (3) and (4) in Algorithm 1a.Step (3) and (4) in Algorithm 1b represent carry save adders. S and C denote the sum and carry of the three input operands respectively.Of course, the additions in step (6) and (7) are conventional additions. But since they are performed only once while the additions in the loop are performed n times this is subdominant with respect to the time complexity.Figure 1 shows the architecture for the implementation of the loop of Algorithm 1b. The layout comprises of two carry save adders (CSA) and registers for storing the intermediate results of the sum and carry. The carry save adders are the dominant occupiers of area in hardware especially for very large values of n (e.g. n 1024).In Chapter 3, we shall see the changes that were made in [1] to reduce the number of carry save adders in Figure1 from 2 to 1, thereby saving considerable hardware space. However, these changes also brought about other area consuming blocks such as lookup tables for storing precomputed values before the start of the loop.Existing Architectures for Modular Multiplication 16Fig. 1: Architecture of the loop of algorithm 1b [1].There are various modifications to the Montgomery algorithm in [5], [6] and [7]. All these algorithms aimed at decreasing the operating time for faster system performance and reducing the chip area for practical hardware implementation. 2.4 Interleaved Modular MultiplicationAnother well known algorithm for modular multiplication is the interleaved modular multiplication. The details of the method are sketched in [3, 4]. The idea is to interleave multiplication and reduction such that the intermediate results are kept as short as possible.As shown in [1, Algorithm 2], the computation of P requires n steps and at each step we perform the following operations:Existing Architectures for Modular Multiplication17• A left shift: 2*P• A partial product computation: x i * Y• An addition: 2*P+ x i * Y •At most 2 subtractions:If (P M) Then P := P – M; If (P M) Then P := P – M;The partial product computation and left shift operations are easily performed by using an array of AND gates and wiring respectively. The difficult task is the addition operation, which must be performed fast. This was done using carry save adders in [1, Algorithm 4], introducing only O (1) delay per step.Algorithm 2: Standard interleaved modulo multiplication [1]P-M; }:M) then P ) if (P (P-M; :M) then P ) if (P (I;P ) P :(*Y; x ) I :(*P; ) P :() {i ; i ; n ) for (i (;) P :( bit of X;: i x X;of bits in n: number M X*Y Output: P M X, Y Y, M with Inputs: X,i th i =≥=≥+===−−≥−===<≤765423 0 1201mod 0The main advantages of Algorithm 2 compared to the separated multiplication and division are the following:• Only one loop is required for the whole operation.• The intermediate results are never any longer than n +2 bits (thus reducingthe area for registers and full adders).But there are some disadvantages as well:Existing Architectures for Modular Multiplication 18 •The algorithm requires three additions with carry propagation in steps (5),(6) and (7).•In order to perform the comparisons in steps (4) and (5), the preceding additions have to be completed. 
This is important for the latency because the operands are large and, therefore, the carry propagation has a significant influence on the latency.•The comparison in step (6) and (7) also requires the inspection of the full bit lengths of the operands in the worst case. In contrast to addition, the comparison is performed MSB first. Therefore, these two operations cannot be pipelined without delay.Many researchers have tried to address these problems, but the only solution with a constant delay in the loop is the one of [8], which has an AT- complexity of 156n2.In [1], a different approach is presented which reduces the AT-complexity for modular multiplication considerably. In Chapter 3, this new optimized algorithm is presented and discussed.Chapter 3New Architectures for Modular Multiplication The detailed treatment of the new algorithms and their corresponding architectures presented in this chapter can be found in [1]. In this chapter, a summary of these algorithms and architectures is given. They have been designed to meet the core requirements of most modern devices: small chip area and low power consumption.3.1 Faster Montgomery AlgorithmIn Figure 1, the layout for the implementation of the loop of Algorithm 1b consists of two carry save adders. For large wordsizes (e.g. n = 1024 or higher), this would require considerable hardware resources to implement the architecture of Algorithm 1b. The motivation behind this optimized algorithm is that of reducing the chip area for practical hardware implementation of Algorithm 1b. This is possible if we can precompute the four possible values to be added to the intermediate result within the loop of Algorithm 1b, thereby reducing the number of carry save adders from 2 to 1. There are four possible scenarios:•if the sum of the old values of S and C is an even number, and if the actual bit x i of X is 0, then we add 0 before we perform the reduction of S and C by division by 2.•if the sum of the old values of S and C is an odd number, and if the actual bit x i of X is 0, then we must add M to make the intermediate result even.Afterwards, we divide S and C by 2.•if the sum of the old values of S and C is an even number, and if the actual bit x i of X is 1, but the increment x i *Y is even, too, then we do not need to add M to make the intermediate result even. Thus, in the loop we add Y before we perform the reduction of S and C by division by 2. The same action is necessary if the sum of S and C is odd, and if the actual bit x i of X is 1 and Y is odd as well. In this case, S+C+Y is an even number, too.New Architectures for Modular Multiplication20• if the sum of the old values of S and C is odd, the actual bit x i of X is 1, butthe increment x i *Y is even, then we must add Y and M to make the intermediate result even. Thus, in the loop we add Y +M before we perform the reduction of S and C by division by 2.The same action is necessary if the sum of S and C is even, and the actual bit x i of X is 1, and Y is odd. In this case, S +C +Y +M is an even number, too.The computation of Y +M can be done prior to the loop. This saves one of the two additions which are replaced by the choice of the right operand to be added to the old values of S and C . Algorithm 3 is a modification of Montgomery’s method which takes advantage of this idea.The advantage of Algorithm 3 in comparison to Algorithm 1 can be seen in the implementation of the loop of Algorithm 3 in Figure 2. 
The possible values of I are stored in a lookup-table, which is addressed by the actual values of x i , y 0, s 0 and c 0. The operations in the loop are now reduced to one table lookup and one carry save addition. Both these activities can be performed concurrently. Note that the shift right operations that implement the division by 2 can be done by routing.Algorithm 3: Faster Montgomery multiplication [1]P-M;:M) then P ) if (P (C;S ) P :(;} C div ; C :S div ) S :(I;C S :) S,C ( R;) then I :) and x y c ((s ) if ( Y;) then I :) and x y c (not(s ) if ( M;) then I :x ) and not c ((s ) if (; ) then I :x ) and not c ((s ) if () {n; i ; i ) for (i (; ; C : ) S :(M; of Y uted value R: precomp ;: LSB of Y , y : LSB of C , c : LSB of S s bit of X;: i x X;of bits in n: number M ) ) (X*Y(Output: P M X, Y Y, M with Inputs: X,i i i i th i -n =≥+===++==⊕⊕=⊕⊕=≠==++<===+=<≤10922876540302001mod 2000000000000001New Architectures for Modular Multiplication 21Fig. 2: Architecture of Algorithm 3 [1]In [1], the proof of Algorithm 3 is presented and the assumptions which were made in arriving at an Area-Time (AT) complexity of 96n2 are shown.3.2 Optimized Interleaved AlgorithmThe new algorithm [1, Algorithm 4] is an optimisation of the interleaved modular multiplication [1, Algorithm 2]. In [1], four details of Algorithm 2 were modified in order to overcome the problems mentioned in Chapter 2:•The intermediate results are no longer compared to M (as in steps (6) and(7) of Algorithm 2). Rather, a comparison to k*2n(k=0... 6) is performedwhich can be done in constant time. This comparison is done implicitly in the mod-operation in step (13) of Algorithm 4.New Architectures for Modular Multiplication22• Subtractions in steps (6), (7) of Algorithm 2 are replaced by one subtractionof k *2n which can be done in constant time by bit masking. • Next, the value of k *2n mod M is added in order to generate the correctintermediate result (step (12) of Algorithm 4).• Finally, carry save adders are used to perform the additions inside the loop,thereby reducing the latency to a constant. The intermediate results are in redundant form, coded in two words S and C instead of generated one word P .These changes made by the authors in [1] led to Algorithm 4, which looks more complicated than Algorithm 2. Its main advantage is the fact that all the computations in the loop can be performed in constant time. Hence, the time complexity of the whole algorithm is reduced to O(n ), provided the values of k *2n mod M are precomputed before execution of the loop.Algorithm 4: Modular multiplication using carry save addition [1]M;C) (S ) P :(M;})*C *C S *S () A :( A);CSA(S, C,) :) (S,C ( I); CSA(S, C,C) :) (S,(*Y;x ) I :(*A;) A :(*C;) C :(*S;) S :(; C ) C :(; S ) S :() {; i ; i n ) for (i (; ; A : ; C :) S :( bit of X;: i x X;of bits in n: number M X*Y Output: P MX, Y Y, M with Inputs: X,n n n n n i n n th i mod 12mod 2221110982726252mod 42mod 30120001mod 011+=+++=========−−≥−=====<≤++New Architectures for Modular Multiplication 23Fig. 3: Inner loop of modular multiplication using carry save addition [1]In [1], the authors specified some modifications that can be applied to Algorithm 2 in order simplify and significantly speed up the operations inside the loop. 
The mathematical proof which confirms the correctness of the Algorithm 4 can be referred to in [1].The architecture for the implementation of the loop of Algorithm 4 can be seen in the hardware layout in Figure 3.In [1], the authors showed how to reduce both area and time by further exploiting precalculation of values in a lookup-table and thus saving one carry save adder. The basic idea is:。

Bi2Se3未考虑vdw的错误汇总

Bi2Se3未考虑vdw的错误汇总

在没有考虑vdw作用之前,算Bi2Se3材料soc中出现的错误汇总V ASP自旋轨道耦合计算错误汇总静态计算时,报错:VERY BAD NEWS! Internal内部error in subroutine子程序IBZKPT:Reciprocal倒数的lattice and k-lattice belong to different class of lattices. Often results are still useful (48)INCAR参数设置:对策:根据所用集群,修改INCAR中NPAR。

将NPAR=4变成NPAR=1,已解决!错误:sub space matrix类错误报错:静态和能带计算中出现警告:W ARNING: Sub-Space-Matrix is not hermitian共轭in DA V结构优化出现错误:WARNING: Sub-Space-Matrix is not hermitian in DA V 4 -4.681828688433112E-002对策:通过将默认AMIX=0.4,修改成AMIX=0.2(或0.3),问题得以解决。

以下是类似的错误:WARNING: Sub-Space-Matrix is not hermitian in rmm -3.00000000000000RMM: 22 -0.167633596124E+02 -0.57393E+00 -0.44312E-01 1326 0.221E+00BRMIX:very serious problems the old and the new charge density differ old charge density: 28.00003 new 28.06093 0.111E+00错误:WARNING: Sub-Space-Matrix is not hermitian in rmm -42.5000000000000ERROR FEXCP: supplied Exchange-correletion table is too small, maximal index : 4794错误:结构优化Bi2Te3时,log文件:WARNING in EDDIAG: sub space matrix is not hermitian 1 -0.199E+01RMM: 200 0.179366581305E+01 -0.10588E-01 -0.14220E+00 718 0.261E-01BRMIX: very serious problems the old and the new charge density differ old charge density: 56.00230 new 124.70394 66 F= 0.17936658E+01 E0= 0.18295246E+01 d E =0.557217E-02curvature: 0.00 expect dE= 0.000E+00 dE for cont linesearch 0.000E+00ZBRENT: fatal error in bracketingplease rerun with smaller EDIFF, or copy CONTCAR to POSCAR and continue但是,将CONTCAR拷贝成POSCAR,接着算静态没有报错,这样算出来的结果有问题吗?对策1:用这个CONTCAR拷贝成POSCAR重新做一次结构优化,看是否达到优化精度!对策2:用这个CONTCAR拷贝成POSCAR,并且修改EDIFF(目前参数EDIFF=1E-6),默认为10-4错误:WARNING: Sub-Space-Matrix is not hermitian in DA V 1 -7.626640664998020E-003网上参考解决方案:对策1:减小POTIM: IBRION=0,标准分子动力学模拟。

纹理物体缺陷的视觉检测算法研究--优秀毕业论文

纹理物体缺陷的视觉检测算法研究--优秀毕业论文

摘 要
在竞争激烈的工业自动化生产过程中,机器视觉对产品质量的把关起着举足 轻重的作用,机器视觉在缺陷检测技术方面的应用也逐渐普遍起来。与常规的检 测技术相比,自动化的视觉检测系统更加经济、快捷、高效与 安全。纹理物体在 工业生产中广泛存在,像用于半导体装配和封装底板和发光二极管,现代 化电子 系统中的印制电路板,以及纺织行业中的布匹和织物等都可认为是含有纹理特征 的物体。本论文主要致力于纹理物体的缺陷检测技术研究,为纹理物体的自动化 检测提供高效而可靠的检测算法。 纹理是描述图像内容的重要特征,纹理分析也已经被成功的应用与纹理分割 和纹理分类当中。本研究提出了一种基于纹理分析技术和参考比较方式的缺陷检 测算法。这种算法能容忍物体变形引起的图像配准误差,对纹理的影响也具有鲁 棒性。本算法旨在为检测出的缺陷区域提供丰富而重要的物理意义,如缺陷区域 的大小、形状、亮度对比度及空间分布等。同时,在参考图像可行的情况下,本 算法可用于同质纹理物体和非同质纹理物体的检测,对非纹理物体 的检测也可取 得不错的效果。 在整个检测过程中,我们采用了可调控金字塔的纹理分析和重构技术。与传 统的小波纹理分析技术不同,我们在小波域中加入处理物体变形和纹理影响的容 忍度控制算法,来实现容忍物体变形和对纹理影响鲁棒的目的。最后可调控金字 塔的重构保证了缺陷区域物理意义恢复的准确性。实验阶段,我们检测了一系列 具有实际应用价值的图像。实验结果表明 本文提出的纹理物体缺陷检测算法具有 高效性和易于实现性。 关键字: 缺陷检测;纹理;物体变形;可调控金字塔;重构
Keywords: defect detection, texture, object distortion, steerable pyramid, reconstruction
II

量子计算密码攻击进展

量子计算密码攻击进展

第43卷第9期计算机学报Vol. 43 No. 9量子计算密码攻击进展王潮1) ,2), 3)姚皓南1),2)王宝楠1),2),5)胡风1),2)张焕国6)纪祥敏4),6)1)(特种光纤与光接入网重点实验室, 特种光纤与先进通信国际合作联合实验室上海大学上海 200444)2)(密码科学技术国家重点实验室北京 100878)3)(鹏城实验室量子计算中心广东深圳 518000)4)(福建农林大学计算机与信息学院福州 350002)5)(上海电力大学计算机科学与技术学院上海 200090)6)(武汉大学国家网络安全学院武汉 430072)摘 要通用量子计算机器件进展缓慢,对实用化1024-bit的RSA密码破译尚不能构成威胁,现代密码依旧是安全的. 量子计算密码攻击需要探索新的途径:一是,量子计算能否协助/加速传统密码攻击模式,拓展已有量子计算的攻击能力;二是,需要寻找Shor算法之外的量子计算算法探索密码攻击. 对已有的各类量子计算整数分解算法进行综述,分析量子计算密码攻击时面对的挑战,以及扩展至更大规模整数分解存在的问题. 结合Shor算法改进过程,分析Shor算法对现代加密体系造成实质性威胁前遇到的困难并给出Shor破译2048位RSA需要的资源. 分析基于D-Wave量子退火原理的RSA破译,这是一种新的量子计算公钥密码攻击算法,与Shor算法原理上有本质性不同. 将破译RSA问题转换为组合优化问题,利用量子退火算法独特的量子隧穿效应跳出局部最优解逼近全局最优解,和经典算法相比有指数级加速的潜力. 进一步阐述Grover量子搜索算法应用于椭圆曲线侧信道攻击,拓展其攻击能力. 探讨量子人工智能算法对NTRU等后量子密码攻击的可能性.关键词量子计算;量子退火;量子计算密码;量子攻击中图法分类号TP309 DOI号 10.11897/SP.J.1016.2020.01691Progress in Quantum Computing Cryptography AttacksWANG Chao1),2),3) YAO Hao-Nan1),2) WANG Bao-Nan1),2),5) HU Feng1),2)ZHANG Huan-Guo6) JI Xiang-Min4),6)1)(Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Joint International Research Laboratory of Specialty Fiber Optics and AdvancedCommunication, Shanghai University, Shanghai 200444)2)( State Key Laboratory of Cryptology, Beijing 100878)3)( Center for Quantum Computing, Peng Cheng Laboratory, Guangdong Shenzhen 518000)4)(College of Computer Information Science, Fujian Agriculture and Forestry University, Fuzhou 350002)5)(College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090)6)(School of Cyber Science and Engineering, Wuhan University, Wuhan 430072)Abstract Due to the limitations of hardware, the development of universal quantum computer devices isslow. At present, the maximum integer factorization by general Shor’s algorithm is 85 (using the characteristics of Fermat numbers to factor the integer 85 with 8 qubits), which is not a threat to thepractical 1024-bit RSA by Shor’s algorithm. Since the universal quantum computer cannot be practical in a收稿日期:2019-08-27;在线出版日期:2020-02-07. 本课题得到“国防创新特区项目”、国家自然科学基金项目(61572304,61272096)、国家自然科学基金重点项目(61332019)、密码科学技术国家重点实验室开放课题基金资助. 王潮,博士,教授,中国计算机学会(CCF)会员(E200008909S),主要研究领域为人工智能、网络信息安全、量子密码学等.E-mail:****************.cn.姚皓南,硕士研究生,主要研究领域为信息安全、量子密码学. 王宝楠,博士研究生,主要研究领域为信息安全、量子密码学. 胡风,博士研究生,主要研究领域为信息安全、量子密码学. 张焕国,博士,教授,主要研究领域为密码学、密码协议、可信计算等. 纪祥敏(通信作者),博士研究生,副教授,中国计算机学会(CCF)会员(30663M),主要研究领域为信息安全、密码学、可信计算.E-mail:***************1692 计算机学报 2020年short time, the modern cryptography is still secure enough now. Quantum computing cryptography attack needs to explore new ways to enhance its quantum attacking ability: Firstly, whether quantum computing can assist/accelerate traditional cryptography attack mode and expand its more powerful quantum attacking ability on the basis of the existing quantum computing. Secondly, it is necessary to find quantum computing algorithms other than Shor’s algorithm to explore quantum computing cryptographic attack. In this paper, various existing algorithms for integer factorization algorithms of quantum computing are studied and show optimistic potentials of quantum annealing algorithm and D-Wave quantum computer for deciphering the RSA cryptosystem. Such as Shor’s algorithm (factor up to 85) via different platforms (like Hua-Wei quantum computing platform), quantum adiabatic computation via NMR (291311), D-Wave (purchased by Lockheed Martin and Google etc., has been initially used for image processing, machine learning, combinatorial optimization, and software verification etc.) 
quantum computer (factored up to 376289), quantum computing software environment provided by D-Wave (factor the integer 1001677 with 87 qubits) to obtain a higher success rate and extend it to a larger factorization scale. Actually, D-Wave using quantum annealing may be closer to cracking practical RSA codes than a general-purpose quantum computer (IBM) using Shor’s algorithm. In addition, the model limitations and precision problems existing in the expansion of integer factorization to a larger scale are discussed. Majorities of scholars think Shor’s algorithm as the unique and powerful quantum algorithm for cryptanalysis of RSA. Therefore, the current state of post-quantum cryptography research exclusively referred to potential threatens of Shor’s algorithm. This paper analyzes the RSA deciphering method based on D-Wave quantum annealing principle, which is a new public key cryptography attack algorithm for quantum computing, and it is fundamentally different from Shor’s algorithm in principle. It is the second effective quantum attack method (RSA deciphering) in addition to Shor’s algorithm. Thus, the post-quantum cryptography research should further consider the potentials of D-Wave quantum computer for deciphering the RSA cryptosystem in future. Furthermore, Grover’s quantum searching algorithm is applied to the elliptic curve side channel attack to expand its attack capability. It is a new effective public key cryptosystem attack method, which is helpful to expand the attack of quantum computing on other public key cryptosystem constitutions. Finally, the possibility of quantum artificial intelligence algorithm attacking NTRU and other post-quantum cryptography is discussed. It is necessary to explore a new cryptographic scheme that can resist the attack of quantum computing, and combine evolutionary cryptography with quantum artificial intelligence, which is expected to be applied to the design and analysis of cryptography algorithms in the post-quantum cryptography.Keywords traditional cryptography; quantum computing; quantum annealing; quantum computing cryptography; quantum attack1 引言现在的量子计算机可以分成两类. 一是通用量子计算机[1,2],由于硬件平台发展缓慢,对现在实用化1024-bit RSA密码破译尚不能构成威胁,现代密码依旧安全. 二是D-Wave专用量子计算机,其商业化进展迅猛. 基于通用量子计算机的Shor算法受限于量子硬件发展缓慢,对现在广泛使用的RSA公钥密码体系没有实质上的威胁. 但是人们容易将硬件发展较快、无法运行Shor算法的专用量子计算机与通用量子计算机混淆,对现代密码体系的安全性做出错误的判断,错误地以为现代密码体系即将受到Shor算法的攻击进而不再安全[3].量子计算[4,5]将有助于推动密码攻击领域的诸多课题. 量子攻击提供了一种新的、不同于传统密码的计算模式,在密码设计和破译领域实现对传统密码的进一步拓展.2014年,Nature资深评论员Matthias Troyer在报道中指出,包括Shor算法在内的量子密码破译无法实用化[6]. 2018年,Google量子人工智能实验室9期王潮等:量子计算密码攻击进展 1693主任John Martinis在Science报道中指出通用量子实用化任重道远,认为破译公钥密码距离实用化“be years”,美国能源部DOE也赞同这个观点[7,8].2018年,荣获2012年物理诺贝尔奖的Serge Haroche教授在报告“The Nobel Prize Series India 2018”中指出,短期内量子计算机有望应用于量子模拟、量子通信、量子测试等研究领域.2019年《Nature》上发表了Google最新一代量子处理器Sycamore[9],包含53个量子比特. 对量子处理器的输出进行重复性采样,并与经典计算机模拟的结果进行比较. Sycamore完成同样的任务只需要200秒,而Google估计使用目前世界上最强大的超级计算机Summit需要1万年. 以此证明该量子处理器实现了量子优越性(Quantum Supremacy). IBM 提出使用二级存储[10]可以模拟54-bit量子计算机,并且通过优化将经典计算机执行任务的时间从1万年降低到2.55天. IBM研究中心主任Dario Gil表示,量子计算机不会凌驾于经典计算机之上,两种计算机会是协同工作的方式. 量子计算机对硬件的要求较高,而将其与经典计算机进行混合架构共同执行任务,可以在达到量子加速的同时降低对量子硬件的需求.尽管现在量子计算在一个任务上已经实现了量子优越性,但是由于量子纠错和容错量子计算技术远超当前技术水平[11-13],通用量子计算机进展缓慢,量子算法达到实用化阶段尚需时日,以破译RSA公钥密码为例,目前分解n位大整数需要2n位逻辑量子比特[14]. 基于Shor算法实现破译1024位RSA密码实际需2000多位通用逻辑量子比特,远非当下的通用量子计算机所能达到. 而专用量子计算机D-Wave硬件平台发展较快并且与洛克希德马丁、谷歌、美国国家航空航天局、美国国家实验室等多家机构进行合作,在量子计算机商用化的道路上处于领跑地位. 因此,有必要探索专用量子计算机(D-Wave)在密码设计与密码分析领域的潜力.从密码学家的角度来看[15],通用量子计算存在的两个问题可能构成了量子密码分析的主要障碍:增加容错量子位的数量. 第一个是所谓的量子比特的数量. 
技术的进步使可用量子位的数量经常翻倍,这与经典计算机的摩尔定律相似,但是通用量子计算机受硬件限制,发展难度较大. 第二个就是容错. 在量子计算中,错误通常是由于量子比特与其环境之间不受控制的相互作用而产生的. 2019年,最多量子比特的量子计算机拥有72个容错量子位,已经可以解决以前无法解决的组合、优化等问题,但目前还不能解决密码问题. 所以在通用量子计算机的进展缓慢、对实际运行的公钥密码不能构成安全威胁背景下,未来抗量子密码的研究有必要探索专用量子计算机(D-Wave)在密码设计与密码分析领域的潜力.本篇综述主要归纳与总结传统密码与量子计算密码攻击的国内外研究进展,分析量子计算在公钥密码攻击和椭圆曲线侧信道攻击现状,并展望量子人工智能算法对NTRU等后量子密码攻击的可能性. 为量子计算在信息科学领域的工作提供思路. 根据《Nature》[6]和《Science》[7][8]报道,通用量子计算机进展缓慢,它的几个典型应用都无法成功,破译公钥密码距离实用化“be years”,因此,量子计算密码攻击需要探索新的途径:一是,量子计算能否加速传统密码攻击模式;二是,需要寻找Shor 算法之外的量子计算算法探索密码攻击.2 Shor算法对公钥密码的攻击三种公钥密码包括离散对数难题(Discrete Logarithm Problem, DLP),大整数分解难题(Inte-ger Factorization Problem, IFP)和椭圆曲线离散对数难题(Elliptic Curve Discrete Logarithm Problem, ECDLP). 对于IFP和ECDLP,传统算法中最有效的方法是1991年由Pollard等人提出的通用数域筛选算法(GNFS)[16]. 通用数域筛选法(GNFS)求解大数质因子和椭圆曲线离散对数的时间复杂度分别是1/32/3[(log)(log(log))](e)N NO[17]和(expO(())M P [18],这里的N是指要分解的大数,k就是我们要求的椭圆曲线的离散对数,P 是有限域的素域范围.本章节主要介绍Shor算法对RSA的攻击研究和Shor算法求解椭圆曲线离散对数的研究.2.1 Shor算法分解整数的研究公钥密码系统的安全性随着Shor算法的提出和量子计算的发展受到了威胁. 众所周知,RSA密码系统的安全性在于整数分解问题的难度. 它所依赖的数论问题不能在有效的多项式时间内求解. 破译RSA的核心问题即整数分解问题[19].Shor算法通过将因式分解问题简化为求阶的问题来发挥效用. 关于量子计算机的仿真[20]及Shor算法对整数N的分解的研究一直受到国内外学者的广泛关注.1996年,阿根廷科学家Cesar Miquel等人[21]分析了损耗和退相干对量子分解电路性能的影响,1694 计 算 机 学 报 2020年并且展示分解整数N=15,这项工作为实践中实施Shor 量子算法提供了很好的参考.2000年英国伦敦帝国学院布莱克特实验室的Parker 等人[22]给出了一个单一的纯量子比特与log2N 量子比特在任意混合态下的集合都足以有效地实现Shor 分解算法.2001年,IBM 研究实验室的Vandersypen 等人[23]使用室温液态核磁共振技术实现了整数N =15的Shor 算法演示性实验. 该实验主要目的是演示量子计算机的控制和建模,没有针对Shor 算法的扩展性进行研究,无法应用到更大的整数.2004年,美国赫尔辛基理工大学的Vartiainen [24]基于约瑟夫森电荷量子位寄存器实现整数N =21的Shor 算法实验. 由于实验对退相干时间有严格要求,通过使用特别设计的量子位门和数值优化的方法完成了物理实现,因此难以扩展到大规模整数分解.2007年,文献[25]基于Quantware 库使用30个量子比特完成整数N =943的分解,并研究残余耦合引起的缺陷对Shor 算的影响. 同年,基于光量子计算机的Shor 算法整数分解由Lu 等人[26]首次完成,通过操控四个光量子实现了N =15的整数分解,通过实验证明该平台可以执行Shor 算法,并完成整数分解.2011年,布里斯托大学的M.G.Thompson 等人使用可控相位门和哈达门展示执行Shor 算法的基本过程[27],并成功完成整数N =15的分解.2012年,Lucero 等人[28]基于约瑟夫森电荷量子电路成功使用3个量子比特分解整数N =15,和前文中Vartiainen 的实验一样,对退相干时间严格要求,物理实现的要求较高.2012年,Enrique Martin Lopez 等人实验实现了Shor 量子分解21的过程,通过使用一个迭代协议将量子比特重复利用,使得所需的所有的量子比特的数量是标准协议中要求的三分之一[29].2013年,佐治亚大学的Geller 等人[30]使用8量子比特完成整数N =51和N =85的分解,由于利用了费马数的特殊性质,不能作为通用方法.2013年,文献[31]提出量子电路执行整数分解任务时第二寄存器的优化方法. 通过寻找2阶元素来实现整数的分解. 因此第二寄存器的量子比特数可以大大减小,有效降低总量子比特数.2016年,Thomas 等人[32]提出基于Kitaev 的Shor 算法的实现. 通过有效地使用和控制七个量子位和四个“高速缓存量子位”分解整数15. 与传统算法相比,减少了近3/4的量子比特数.2017年,上海大学王宝楠等人[33]提出了针对RSA 的小Qubit 量子攻击算法设计,降低了算法的复杂度和成功率,提高了原算法中模幂计算的运算速率. 实验表明,该方法可以用11、10、9量子比特成功分解整数119的量子电路.2018年Google 公司Craig Gidney [14]提出引入1n -个辅助量子比特(Dirty Ancillae Qubits)将Shor 算法执行所需的量子比特数(Clean Qubits)从Zalka [34] 1.5(1)n O +缩减到了2n -,并且没有增加电路的渐进规模和深度. Dirty Qubits 以一种未知状态存在不需要精确的初始化,但在电路执行结束之前需转为已知状态. Clean Qubits 用于具体的电路构造,需要初始化为已知的计算基态,Shor 算法发展如表1. 表中()M n 是乘法的经典时间复杂度,其极限是(lg )(lg )2O n n n ⋅⋅[35]. ε是通用门的最大误差.表1 Shor 量子算法改进过程Year DepthGatesClean QubitsTotal QubitsShor [36]1994(())nM n Θ (())nM n Θ ()n Θ()n ΘBechman [37] 1996 3()n Θ 3()n Θ 51n + 51n + Veldral [38] 1996 3()n Θ3()n Θ43n +43n + Beauregard [39] 2003 31(lg n εΘ31(lg lg )n n εεΘ23n + 23n + Takahashi [40] 2006 31(lg n εΘ 31(lg lg )n n εεΘ 22n +22n +Zalka [41] 2006 31(lg n εΘ31(lg lg )n n εεΘ1.5(1)n O +1.5(1)n O +Häner [42] 2016 3()n Θ 3(lg )n n Θ 22n + 22n + Craig Gidney [14]20173()n Θ3(lg )n n Θ2n +21n +2019年,Google 公司Craig Gidney [43]假设物理门错误率为310-,surface code 周期1微秒反应时间10微秒,同时考虑到噪声的影响等条件,使用窗口算法进行优化. 评估破解2048位RSA 需要的物理9期王潮等:量子计算密码攻击进展 1695量子比特数为22325184,需要的时间为8小时,所需的物理量比特数以及量子比特精度远远超出当前硬件水平,对实际运行的公钥密码不能构成安全威胁.Microsoft和Google量子研究组的研究人员Matthias Troyer和John Martinis均表示由于通用量子计算机硬件的限制短期内无法实现Shor算法破译现在实际使用的RSA加密体系,寻找通用量子计算机的杀手级应用仍是一大挑战. 
因此,在量子计算攻击密码方面需要探索不同于Shor算法的量子计算密码破译之路.2.2 Shor算法求解椭圆曲线离散对数的研究目前,与Shor算法攻击RSA的研究相比,针对ECC的Shor算法的研究比较少. 原因大致有两个:一是因为ECC算法相对于RSA算法的数学理论较为复杂,在工程应用中实现较为困难;二是因为Shor算法本是设计用来解决大数分解和求解离散对数的,如果要想利用Shor算法求解椭圆曲线离散对数,理论上是不能直接完成的,而且因为椭圆曲线上的运算都是点的运算,很难进行量子电路的设计. 从而导致Shor算法求解椭圆曲线离散对数问题成为了一个科研上的难题.1994年,Shor[44]提出Shor算法,作为量子计算机最著名的应用之一,在大素数分解问题上比非量子计算算法有指数级别的优势,同时Shor算法还可以应用在离散对数问题上.1997年,里奇蒙大学的Jodie Eicher和Yaw Opoku[45]在理论上设计了使用Shor算法解决与离散对数问题类似的椭圆曲线问题的具体步骤. 证明在Shor算法的基础上进行修改可以解决椭圆曲线问题. 和Shor算法一样,想真正物理实现这种算法需要解决量子器件的诸多挑战.2003年,滑铁卢大学John Proos研究了基于Shor算法的椭圆曲线问题[46]. John Proos从Shor算法和数学分析进行研究:不同椭圆曲线在有限域下具有不同的性质,选择其中适当的一条特殊的椭圆曲线进行分析. 在Shor算法基础上针对模幂运算与量子傅里叶变换进行优化. 以此分析椭圆曲线问题上的优化方案和算法步骤. 根据计算结果,在Shor 算法的基础上求解ECDLP时,对于给定的n位整数,需要6n量子比特. 但是没有就实验模拟的过程进行研究. 同时该文献给出了另一种观点:求解椭圆曲线离散对数问题可以看作是求解二维的大数分解问题.2014年,美国微软研究院Martin Roetteler等人[47]给出了一个类似于Shor量子分解算法的概要量子电路,该量子电路设计了三个量子寄存器,由于椭圆曲线上点的表示及点的运算的复杂性,用量子电路来体现是极其困难的. 一般说的实现Shor算法的量子电路需要三个量子寄存器:第一个和第二个为控制量子寄存器,第三个为工作寄存器,事实上第三个量子寄存器只是一个代称,其可能需要多个量子寄存器来协同完成“工作寄存器”的功能. 即便可以,构造量子电路实现对ECC的攻击也将会是非常复杂的.2017年,美国微软研究院Martin Roetteler等人[48]对使用Shor算法求解椭圆曲线的离散对数问题时需要的量子资源进行了估算. 这些估算来自用于受控椭圆曲线点加法的Toffoli门网络的仿真,在量子计算软件工具套件|iLIQU〉的框架内实现. 确定可逆模块化运算的电路实现,包括模加法、模乘法和模逆运算,以及可逆椭圆曲线点加法. 得出结论:在n比特素数域上定义的椭圆曲线上的椭圆曲线离散对数,可以在量子计算机上用至多332448log()4090n n n+Toffoli门的量子电路计算,其量子比特数最多为292log()10n n++⎡⎤⎢⎥. 虽然提出通过量子电路来实现Shor算法解决ECDLP(Elliptic Curve Discrete Logarithm Problem),并分析了实现这些电路所需的资源,但没有通过实验完全证明.2018年,上海大学陈宇航等人[49]提出能够使用小量子比特数来破解椭圆曲线加密的Shor量子攻击方法,对当前安全曲线有较大威胁,它的通用性更强. 该方法的步骤为:选取一条二进制素域上的椭圆曲线,然后输出该椭圆曲线上的所有点;任意选取椭圆曲线上两点P和Q,满足P kQ=,k为离散对数,输出与椭圆曲线上各点(,)t tx y对应的x t P+ ty Q和x t P的点;构造以k为周期的周期函数;创建两个量子寄存器并设置其初始状态;对第一量子寄存器1ϕ执行Hadamard变换;将,U x a算符应用于第二量子寄存器2ϕ;对第一量子寄存器进行量子傅立叶逆变换:测量第一量子寄存器的本征态概率,求使其达到最大值的阶k;如果阶k是满足P kQ=,则攻击私钥为k.图1是基于Shor算法求解椭圆曲线离散对数k问题的流程图.目前,Shor算法对公钥密码的攻击还需要更深入的研究,通用量子计算机分解大数和求解ECC离散对数的能力还很有限. 当前的物理实现只能控制运行Shor算法的小规模的量子比特,尚不能对现今使用的1024位RSA和163位的ECC构成威胁. 如果真正部署Shor算法在多项式时间内攻击现在的加1696 计 算 机 学 报 2020年密算法,必须使用千位以上的通用量子计算机.据《Nature 》和《Science 》[6]-[8]等报道,破译现有的163位ECC 密码所需要的千位量子比特的通用量子计算机,在未来5到10年内仍难实现. 目前,在通用量子计算机的器件条件限制的情况下,对公钥密码ECC 的小Qubit 量子计算攻击问题仍没有得到较好解决. 未来,探索小比特破译椭圆曲线ECC 是一大挑战.图1 Shor 算法求解椭圆曲线离散对数k 的流程图3 基于量子绝热理论的整数分解方法目前,通过量子计算实现整数分解主要有两个研究方向:一种为上面介绍的Shor 算法的电路模型算法,另一种为绝热量子计算[50](Adiabatic Quantum Computation ,AQC). 量子绝热计算已经应用于诸多组合优化问题,比如旅行商、图着色、蛋白质折叠、整数分解等问题[51-54]. 此外,AQC 对由相位差、环境噪声和酉运算的不完善引起的误差具有更好的鲁棒性[55,56]. 因此,它很快发展成为量子计算中极具吸引力的一个领域.2001年,Burges [57]首次提出将整数分解问题转化为优化问题,为绝热量子计算应用到整数分解做出了基础性工作,并于2010年由Schaller 和Schutz-hold [58]改进该方法.3.1 基于绝热量子理论的NMR 的整数分解的研究现状2008年,彭新华等人[59]首次提出了基于绝热量子计算的因子分解算法,并成功地在NMR 量子处理器上实现了21的分解.2012年,徐南阳等人[60]在文献[59]的基础上提出了一个改进的绝热量子算法,并通过核磁共振量子处理器实现了对整数143的分解.2014年,Nikesh S. Dattani 等人[61]利用两个质因子的特殊性质实现4比特分解整数56153,但是该数两个因子的二进制形式仅有两位不同,因此该方法不具有通用性与可扩展性.2016年,Soham Pal 等人[62]提出经典和量子计算混合方案通过500 MHz 核磁共振谱仪(NMR)实现对整数551的分解.2017年,李兆凯,彭新华,杜江峰等人[63]使用核磁共振谱仪(Nuclear Magnetic Resonance ,NMR)在高于室温下使用3个量子比特实现整数291311的分解. 该方法是使用大数两个素因子的特殊性质实现的,不具有通用性和可扩展性.2017年,杜江峰等人在室温下的固态单自旋系统上完成绝热量子算法整数分解实验[64],系统将金刚石中的自旋作为量子处理器,通过分解整数N=35作为系统的基准测试,证明实验结果具有高保真度.基于绝热量子理论的NMR 的整数分解的研究由于量子比特数的限制,无法扩展到大规模整数分解的情况. 该研究仅可作为理论验证性的探索性的实验. 在通用量子计算机的进展缓慢和NMR 的量子比特数受限的情况下,D-Wave 量子计算机和基于D-Wave 量子退火原理的整数分解的研究突飞猛进. 3.2 基于D-Wave 量子退火原理的整数分解的研究使用量子计算方法攻击RSA 公钥体系问题上,Shor 算法作为量子计算机最著名的应用之一,学界普遍性认为在不考虑硬件平台限制的情况下,严重威胁现在的公钥密码体系,从而忽略其他量子计算攻击RSA 公钥体系的算法. 实际上学界众多学者均认为实际部署Shor 算法攻击现有的加密体系仍然遥遥无期.受限于相干时间、噪声、量子纠错等技术限制,近期难以研制出对现在使用的公钥密码具有威胁的通用量子计算机. 因此需要寻找不依赖通用量子计算机的量子算法攻击公钥密码. 实现原理不同的专用量子计算机D-Wave 可以执行与Shor 量子算法不同的绝热量子算法,对攻击公钥密码有重要的扩展9期 王 潮等:量子计算密码攻击进展 1697作用. 
D-Wave 量子计算机核心原理量子退火(Quan-tum Annealing ,QA)利用量子领域重要的物理性质量子隧穿效应,在组合优化容易陷入局部最优问题上比传统优化算法更具优势,指数级搜索问题中有望逼近甚至达到全局最优解. 这也是考虑将D-Wave 量子计算机用于密码设计及密码分析的基础.2018年ETSI 会议专家分析D-Wave 专用量子计算机在攻击加密体系受到忽视的原因. 因为D-Wave 最初商业化的应用主要是的应用包括Loc-kheed Martin 用于公司里飞机控制软件的测试,Google 将其用于图像识别问题等. 早期应用中不包括攻击加密体系,同时将整数分解问题转换为组合优化问题一直以来并没有被重点关注,从而忽视了利用D-Wave 的量子隧穿效应在组合优化问题上的独特优势攻击RSA 加密体系的应用. 因此,未来抗量子密码领域的研究还需要考虑基于量子退火原理的专用量子计算机攻击的威胁. 3.2.1 D-Wave 量子计算机的背景2011年5月,加拿大D-Wave 公司推出全球首款128个量子位的商用量子计算机D-Wave One 系统. 随后以1000万美元卖给著名的Lockheed Martin (洛克希德马丁)公司用于F35战机等先进武器的设计,它标志着量子计算机正式进入商用阶段[65].2012年,Geordie Rose 提出了D-Wave 量子计算机发展趋势图,预示D-Wave 量子计算机的量子比特规模大约每两年增加一倍,达到了经典计算机中摩尔定律的增长速度,如图2所示.图2 D-Wave 量子计算机发展路线(引自D-Wave 官网)D-Wave 量子计算机发展迅猛,完全不同于通用量子计算机的量子门电路构造思路,旨在多学科路线发展以及实用型商业化目标,目前正处于从纯科学向工程学的转型阶段. D-Wave 自2011年开始发布商用型D-Wave 量子计算机,相继与美国军火商Lockheed Martin 、Google 、美国Los Alamos 国家实验室(LANL)、美国橡树岭国家实验室(Oak RidgeNational Laboratory ,ORNL)等建立合作. 2019年德国Forschungszentrum Jülich 超级计算中心购买了D-Wave 公司最新的Advantage 量子计算机,配备超过5000个量子比特,是D-Wave 2000Q 的两倍以上. 符合Geordie Rose 提出的发展趋势,与经典计算机领域的摩尔定律提出的发展速度相当.3.2.2 基于D-Wave 量子退火原理的整数分解国内外研究现状量子退火算法最早是由 A.B.Finnila [66]提出来的,主要是用来解决多元函数的最小值问题. D-Wave 的量子退火算法旨在组合优化、机器学习[67]、采样等问题的研究,包括地球物理反演[68]、蛋白质折叠问题[53]、旅行商问题(Travelling salesman problem)[69]、图像着色问题(GCP)[52]、城市交通问题[70]、整数分解问题[71]、希格斯玻色子优化问题[72]、量子模拟问题[73-74]等.量子退火(Quantum Annealing ,QA)算法基本思想是利用量子涨落来构造优化算法,即量子隧穿效应(Quantum Tunneling Effect). 与传统经典计算机模拟热波动的模拟退火不同,量子退火算法独特的量子隧穿效应更容易跳出局部最优解,有望逼近全局最优解.如图3所示,模拟退火算法在陷入局部最优点P 后只能以“翻山越岭”的方式越过能量势垒到达全局最优点P ',而量子退火算法独特的量子隧穿效应不用暂时接受较差的当前解就可以直接从P 点穿透能量势垒到达P ',这是与经典模拟退火及其他众多计算搜索算法相比的一个独特优势.图3 量子退火与模拟退火示意图2012年,上海大学王潮等人首先提出将组合优化问题映射到D-Wave 机器的理论模型[65],并且分析了量子计算在密码破译方面的应用.2017年,Dridi 等人[54]首次提出将代数几何应用于量子退火相关问题. 将代数几何与量子退火算法结合通过D-Wave 2X 分解整数200099,该模型存在量子连接限制,需要的比特数多.2018年,Shuxian Jiang 等人[71]通过D-Wave。

ManachersAlgorithm马拉车算法

ManachersAlgorithm马拉车算法

ManachersAlgorithm马拉车算法这个马拉车算法 Manacher‘s Algorithm 是⽤来查找⼀个字符串的的线性⽅法,由⼀个叫 Manacher 的⼈在 1975 年发明的,这个⽅法的最⼤贡献是在于将时间复杂度提升到了线性,这是⾮常了不起的。

对于回⽂串想必⼤家都不陌⽣,就是正读反读都⼀样的字符串,⽐如 "bob", "level", "noon" 等等,那么如何在⼀个字符串中找出最长回⽂⼦串呢,可以以每⼀个字符为中⼼,向两边寻找回⽂⼦串,在遍历完整个数组后,就可以找到最长的回⽂⼦串。

但是这个⽅法的时间复杂度为 O(n*n),并不是很⾼效,下⾯我们来看时间复杂度为 O(n)的马拉车算法。

由于回⽂串的长度可奇可偶,⽐如 "bob" 是奇数形式的回⽂,"noon" 就是偶数形式的回⽂,马拉车算法的第⼀步是预处理,做法是在每⼀个字符的左右都加上⼀个特殊字符,⽐如加上 '#',那么bob --> #b#o#b#noon --> #n#o#o#n#这样做的好处是不论原字符串是奇数还是偶数个,处理之后得到的字符串的个数都是奇数个,这样就不⽤分情况讨论了,⽽可以⼀起搞定。

接下来我们还需要和处理后的字符串t等长的数组p,其中 p[i] 表⽰以 t[i] 字符为中⼼的回⽂⼦串的半径,若 p[i] = 1,则该回⽂⼦串就是 t[i] 本⾝,那么我们来看⼀个简单的例⼦:# 1 # 2 # 2 # 1 # 2 # 2 #1 2 1 2 5 2 1 6 1 2 3 2 1为啥我们关⼼回⽂⼦串的半径呢?看上⾯那个例⼦,以中间的 '1' 为中⼼的回⽂⼦串 "#2#2#1#2#2#" 的半径是6,⽽未添加#号的回⽂⼦串为"22122",长度是5,为半径减1。

the issue can’t be reproduced anymore

the issue can’t be reproduced anymore

the issue can’t be reproduced anymore全文共四篇示例,供读者参考第一篇示例:最近在软件开发过程中,一个非常常见的问题是"the issue can't be reproduced anymore",即无法再重现问题。

这种情况给开发人员和测试人员带来了极大的困扰,因为他们无法解决或验证问题是否已经解决。

造成问题无法再次重现的原因有很多种,其中一种可能是由于复制的环境不一样。

在软件开发的过程中,测试人员往往会在一个特定的环境中重现问题,一旦环境发生变化,问题就可能不再出现。

这也是为什么在软件开发过程中要尽量保持环境的一致性,以便更容易重现和修复问题。

另一个可能的原因是数据的不一致性。

在某些情况下,问题可能是由于特定的数据输入或操作触发的,一旦输入的数据不再存在或者操作不再执行,问题就不会再次出现。

这也需要测试人员详细记录并分析问题触发的条件和数据,以便更好地理解和解决问题。

此外,问题不再重现还可能是由于代码变更导致的。

在软件开发过程中,代码的变更是常有的事情,如果问题已经修复或者相关代码发生了变更,问题就有可能不再出现。

这也是为什么要及时记录和跟踪问题对应的代码变更,以便更好地理解和解决问题。

针对问题无法重现的情况,我们可以采取一些方法来解决。

首先,我们可以尝试还原问题发生时的环境和数据,以尽量模拟出问题的触发条件。

其次,我们可以尝试回顾问题的历史记录,查看相关的代码变更和提交信息,以了解问题的修复和改动情况。

最后,我们可以尝试与团队其他成员或相关利益方进行讨论,以了解问题的根本原因,并共同寻找解决方案。

总的来说,问题无法再次重现是软件开发过程中的常见问题,但我们可以通过仔细记录和分析问题的触发条件、环境、数据和代码变更等信息,来尽力解决问题并确保软件的质量和稳定性。

希望我们在今后的软件开发过程中能够更好地应对和处理这种情况,以提高软件的质量和用户体验。

Planar graphs, negative weight edges, shortest paths, and near linear time

Planar graphs, negative weight edges, shortest paths, and near linear time

Planar graphs,negative weight edges,shortest paths,and near linear timeJittat Fakcharoenphol Satish RaoAbstractIn this paper,we present an time algo-rithm forfinding shortest paths in a planar graph with real weights.This can be compared to the best previous strongly poly-nomial time algorithm developed by Lipton,Rose,and Tar-jan in1978which ran in time,and the best poly-nomial algorithm developed by Henzinger,Klein,Subrama-nian,and Rao in1994which ran in time.We also present significantly improved algorithms for query and dynamic versions of the shortest path problems.1IntroductionThe shortest path problem with real(positive and nega-tive)weights is the problem offinding the shortest distance from a specified source node to all the nodes in the graph. For this paper,we assume that the graph actually has no negative cycles since the shortest path between two nodes will typically be undefined in the presence of negative cy-cles.In general,algorithms for the shortest path problem can, however,easily be modified to output a negative cycle,if one exists.This also holds for the algorithms in this paper.The shortest path problem has long been studied and continues tofind applications in diverse areas.The prob-lem has wide application even when the underlying graph is a grid graph.For example,there are recent image segmen-tation approaches that use negative cycle detection[4,5]. Other of our favorite applications for planar graphs include separator algorithms[17],multi-source multi-sinkflow al-gorithms[15],or algorithms forfinding minimum weighted cuts.In1958,Bellman and Ford[2,7]gave an algorithm forfinding shortest paths on an-edge,-vertex graph with real edge weights.Gabow and Tar-jan[10]showed that this problem could indeed be solved inComputer Science Division,University of California,Berkeley,CA 94720.E-mail:jittat@.Supported by Fulbright Scholarship and the scholarship from the Faculty of Engineering,Kasetsart University,Thailand.Computer Science Division,University of California,Berkeley,CA 94720.E-mail:satishr@.1The notation ignores logarithmic factors.values of the edge weights.For strongly polynomial algo-rithms,Bellman-Ford remains the best known.As for graphs with positive edge weights,the problem is much easier.For example,Dijkstra’s shortest path algo-rithm can be implemented in time.For planar graphs,upon the discovery of planar sepa-rator theorems[14],an algorithm was given by Lipton,Rose,and Tarjan.[13].Their algorithm is based on partitioning the graph into pieces,recursively comput-ing distances on the borders of the pieces using numerous invocations of Dijkstra’s algorithm to build a dense graph. 
Then they use the Bellman-Ford algorithm on the resulting dense graph to construct a global solution.Their algorithms worked not only for planar graphs but for anysized separator).In this paper,we present an time algo-rithm forfinding shortest paths in a planar graph with real weights.We also present algorithms for query and dynamic ver-sions of the shortest path problems.1.1The ideaOur approach is similar to the approaches discussed above in that it constructs a rather dense non-planar graph on a subset of nodes and then computes a shortest path tree in that graph.We observe that there exists a shortest path tree in this dense graph that must obey a non-crossing property in the geometric embedding of the graph inherited from the embedding of the original planar ing this non-crossing condition,we can compute a shortest path tree of the dense graph in time that is near linear in the number of nodes in the dense graph and significantly less than linear in the number of edges.Specifically,we decompose our dense graph into a set of bipartite graphs whose distance matrices obey a noncrossing condition(called the Monge condition).Efficient algorithms for searching for minima in Monge arrays have been developed previously.See,for example,[1,3].Our algorithm proceeds by combining Dijkstra’s and the Bellman-Ford algorithms with methods for searching Monge matrices in sublinear time.We use an on-line method for searching Monge arrays with our version of Di-jkstra’s algorithm on the dense graph.We note that our algorithms heavily rely on planarity, whereas some of the previous methods only require that the graphs are separable.However,our methods are at least tolerant to a few vi-olations to planarity.All of our results continue to hold when the graph can be embedded in the plane such that only The best previous algorithms had a query,preprocess-ing time product of at least,where ours isAn algorithm that supports distance queries and update operations that change edge weights in amortized timeper operation.This algorithm works for positive edge weights.An algorithm that supports distance queries and up-date operations that change edge weights in amortized time per operation.This algorithm works for negative edge weights as well.We also present an on-line Monge searching problem and methods to solve it that may be novel and of indepen-dent interest.1.3More related workFor planar graphs with positive edge weights,Henzinger et al.[12]gave an time algorithm.Their work im-proves on work of Frederickson[9]who had previously given3The algorithmWe proceed in this section with a description of our al-gorithm.In section3.1,we define our main tool,the densedistance graph,which is an efficiently searchable represen-tation of distances in the planar graph.In section3.2,weshow how to compute the graph inductively,by relying onsome Monge data structures and efficient implementationsof Dijkstra’s algorithm and the Bellman-Ford algorithm.In section3.3,we show how to use it to compute a short-est path labelling of the graph.In sections3.4to3.6,we usethe dense distance graph as the basis for query and dynamicshortest path algorithms.3.1The dense distance graphA decomposition of a graph is a set of subsets(not necessarily disjoint)such that the unionof all the sets is and for all,for some.A node is a border nodes of a set ifand there exists an edge where.We re-fer to the subgraph induced on a subset as a piece of thedecomposition.We assume that we are given a recursive decomposi-tion where at each level,a piece 
with-nodes and bor-der nodes is divided into2subpieces such that each sub-piece has no more than nodes and at mostbor-der nodes.By a property of the decomposition,we assumethat the number of border nodes of each of the2subpiecesof contain at mostnodes.3.2.1The Bellman-Ford stepThe Bellman-Ford algorithm that we run proceeds as fol-lows.while,so the number of edges is.Therefore,if we relax every edge directly as in[13],the running timefor each step of edge relaxation would be for all of.The total running time for the Bellman-Ford step wouldthen beHowever,we will relax the edges in time that is nearlinear in the number of nodes,in particular(a) (b) (c)Figure1:Partitions of nodes to form Monge arrays:(a)the1st and the2nd arrays,(b)the3rd and the4th arrays,and(c)the5th and the6th arrays.time.This gives a running time of for the Bellman-Ford step.We accomplish this by maintaining the edges of each subpiece of in levels of Monge arrays.The edges in each Monge array can be relaxed intime where is the number of nodes in the data structure.Thefirst Monge array that we define is formed as fol-lows.Divide the border nodes in some subpiece into2 halves,thefirst(or left)half in the circular order(with an arbitrary starting point)and the second(or right)half.Con-sider the set of edges in the dense distance graph that go from the left border nodes to the right ones.The edges obey the Monge property,since the underlying shortest path tree need not cross.Using the same left-right partitioning,we can define an-other Monge array with the direction of edges reversed,i.e., edges in the array go from the right border nodes to the left border nodes.Successive Monge arrays,are constructed by recursively dividing the left and right halves further.Each node will occur in at most data structures,and each edge will occur in one data structure.Figure1shows how we partition nodes and edges between them.We can relax all the edges in a Monge array as follows. 
The nodes on the left have a label associated with them,and a node on the right must choose a left node which min-imizes.However,because of the planarity of the piece,the“parent”edges of two right nodes need not cross,and this gives us the Monge property.For this special case,we can use a standard divide-and-conquer technique tofind all the parents in time,where the number of nodes in the Monge array isThe total number of nodes in all the data structures is.The number of phases the Bellman-Ford runs is the number of nodes in the longest path,which ispath tree starting at a border node.It uses the following operations on data structures that are maintained for each of the subpieces.ScanInSubpiece():Relax all edges of in the dense distance graph at level in piece condi-tioned on.A sequence of calls to ScanInSubpiece can be imple-mented in time.FindMinInSubpiece():Return the border node (which might already be scanned)in piece whose la-bel is no greater than all unscanned nodes in the piece.This procedure can be implemented in time.ExtractMinInSubpiece():Return the border node (which might already be scanned)in piece whose la-bel is no greater than all unscanned nodes in the piece, and attempt to remove it from the heap in the piece.No node can be returned by this procedure more than times.A sequence of calls to ExtractMinInSubpiece can beimplemented in time.We will show how to implement this data structure in section4.2.At this point we assume the bounds stated above and use them to bound the running time of the Di-jkstra step.We stress that an already scanned node might be returned from FindMinInSubpiece and ExtractMinInSubpiece.The data structure does,however,guarantee that any border nodes will not be returned from ExtractMinInSubpiece more than times.When the data structure for each subpiece returns a minimum labelled unscanned node,it is the minimum un-scanned node in the subpiece.Also,the algorithm only scans a node which is the minimum over all nodes returned from all the subpieces.Therefore,every time the algorithm scans a node,has the minimum distance label over all unscanned nodes.Thus,our algorithm is a valid implementation of Dijk-stra’s algorithm in that it only scans the minimum labelled nodes.Thus,it correctly computes a shortest path labelling.3.2.3Analysis of the running time of the Dijkstra step For this piece,there arecalls to ScanInSubpiece.ExtractMinInSubpiece is called at most times on each node in the level dense distance graph.So the total number of operations is for a total cost of Finally,the number of calls to FindMinInSubpiece is bounded by the number of calls to ScanInPiece for a sub-piece.Therefore,the running time for computing each short-est path tree istime.The algorithm for this is very similar to the one for the Dijkstra step in the shortest path algorithm.Suppose the query is for the distance of a pair.The shortestpath can be viewed as a sequence of paths between border nodes of the nested pieces that contain and.The lengths of these paths is represented in the dense distance graph as an edge between border nodes and enclosing border nodes or as an edge among border nodes of a piece.Thus,we can perform a Dijkstra’s computation on this subgraph of the dense distance graph to compute the shortest path.We derive the bound on the number of border nodes in the pieces containing as follows.Each piece,except the first one,which is,is a piece in the decomposition of the other.Hence the number of nodes goes down geometrically. 
Also,the number of border nodes,which is bounded above by the square root of the number of nodes in the piece,goesdown geometrically.Therefore the number of border nodesinvolved is,the running time is bounded by.Naively,one can bound the total number of activated nodesbyas fol-lows.Consider the decomposition tree.There are at mostleafs that are activated.Hence,at most pieces haveboth their children activated;call these pieces branchingpieces.Because the number of nodes goes down geomet-rically along the tree,we can bound the total number ofactivated border nodes using the number of border nodesof the branching pieces.The worst case is that allbranching pieces are in the highest level of the decomposi-tion tree,i.e.,they form a balanced binary tree.We note thatthe pieces on the same level partition the the graph;thus,the number of border nodes is maximized when they parti-tion the graph evenly.Hence,on level there are at mostborder nodes.The sum on the last leveldominates the total sum;therefore,the number of bordernodes isThus,the total cost of a sequence of updates andqueries isfor(re)building the dense distance graph.By choosingto be we get an amortized complexity ofper operation.3.6Dynamic algorithms for graphs with negativeedge weightsWe follow the same strategy in this case,as well.That is,we simply maintain the notion of the activated graph duringa sequence of updates.To answer a query for a pair,we compute a distance labelling in the extended activatedgraph for.Unfortunately,there may be negative edges in the ex-tended activated graph so we cannot just do a Dijkstra com-putation as above.We note that if we have a feasible price function over thenode set of the extended activated graph,if only one edgeis updated with a negative weight,we canuse one computation of Dijkstra’s algorithm to update theprice function as follows.We compute the shortest distancelabels of all the nodes starting from.If is greaterthan,changing the weight of does not introduce anyedge with a negative reduced cost on the graph withas a price function;hence,we can update and update theprice function to be.Therefore,if we already have updates,we can computea feasible price function in the extended activated graphby performing Dijkstra computations by starting with theoriginal price function on the extended activated graph,andfor each update,we update the price function as describedabove.After we have the feasible price function for the extendedactivated graph which includes all the updates,we can pro-ceed as in the previous section.After queries and updates,we rebuild the dense dis-tance graph.Thus,the total time for a sequence of queriesand updates is3The idea is tofind the correct parent of the middle right nodefirst bychecking all the left nodes,and then recurse on the top and bottom half ofthe right nodes.You will only look at each left node once for each recursivelevel.Without loss of generality,assume that.Let an be the ordered set and be the ordered set.The data structure maintains the initial distance variables for all and the subset ,and supports the following operations.ActivateLeft():for,set.FindNextMinNode():return a node such that.AddCurrentMinNodeToS():setFindNextMinNode().To build this data structure,we will use an interval tree, which,for an ordered set,supports a query of the form,for any given. 
The interval tree can be implemented using a balanced bi-nary tree.The time for each query is,where is the size of the ordered set.The data structure maintains:for each node in,whether it is active,the left neighbor tree,a ordered(by index)binary tree for nodes in which are the best left neighbors for some right node.the heap for the minimum edges of every .the best left node data structure,which stores for each node its best left node in.We use a binary tree storing the triplet to im-plement.Also,for each active node,the data structure maintains the range,and the data structure is given an interval tree rep-resentation for the values4The algorithm maintains the invariant that for every which has the initial distance,the node is the best left node for the right nodes in its range ,i.e.,for all and.Note that this implies that the ranges are nonoverlapping.(For example, seefigure4(a).)We now describe how each operation is performed.ActivateLeft():If is thefirst one activated,letand,and insert into.Oth-erwise,we willfind a set of right nodesof which the node is now the minimum left node..Also we set and,and remove from the heap.Finally,for and,we use’s interval tree tofind their best right neighbors and add them to.We note that we can eliminate the use of the best left node data structure by allowing FindNextMinNode to return together with the minimum node its left neighbor ,i.e.,FindNextMinNode returns the edge.How-ever,this does not improve the running time.4.2.1Analysis of the running timeWe note that the size of might be greater than dur-ing the execution of the algorithm because we create some nodes every time AddCurrentMinNodeToS is called.How-ever,it is called at most times;thus,we create no more than nodes.We now analyze the running time for each operation.ActivateLeft():In the beginning,searching for the index in takes time.Tofind we do a sequential search and a binary search.Every node in that we examined during the sequential search is removed except the last one.We charge the cost for the sequential search to the cost for removing and updating these nodes.The cost for the binary search is.The search for the lower end costs the same.Then,the node has to pick its best right neighbor and add it to.This can be done in time.After the interval is found,some other node in must update its data structure.At most2nodes have to change their intervals,re-pick their best right neigh-bors,and update their entries in;this takestime.All other nodes are deleted and will never re-appear.Each delete takes time and we charge this to the time the node was inserted to the data struc-ture.Therefore,the operation takes amortized time.FindNextMinNode():We can read the top-most item in in time.AddCurrentMinNodeToS():We canfind the current minimum node in time.It takestime tofind the left matched node for.We then do another operations on and which taketime.Therefore,AddCurrentMinNodeToS runs in time.4.3Non-bipartite on-line Monge searchingWe generalize our data structure to support the case when the graph is not bipartite in this section.We have the graph with the distance function.The nodes in are in a circular order,and the distance function satisfies the property that(2) for every such that in. 
Notice that the sign of the inequality is reversed because in this case crosses,contrary to the bipartite case that crosses.This general case can be reduced to bipartite cases.The idea is as explained in section3.2.1.From the graph,we create bipartite graphs, because for each left-right partition,edges between them can go in two directions.We denoted these bipartite graphs as.Under this reduction,each edge belongs to one and only one bipartite graph.We re-fer to each bipartite graph as a level of.Let denote the set of these bipartite graphs.The operations that we need from this non-bipartite data structure are the following.We want to be able to set the initial offset distance as in the bipartite case,and also, we want tofind the minimum labelled node.The minimum labelled node over the graph is the minimum one over all the levels.However,the notion of the set is different now. Suppose that a node is the current minimum labelled node with label on the level bipartite graph.When has the correct match,its reach this minimum only on the level.On the other levels,the labels of do not necessar-ily reach their minima,i.e.,they can still change.Therefore, we cannot put to in all the other levels,because it can affect how we search for the interval of some unactivated left node.Hence,we only add to the set of the level bipartite graph.This has a drawback,i.e.,the call to Find-NextMinNode can return again.However,because each node belongs to at most levels,the node can reappear at most times.The data structure for the non-bipartite case consists of data structures for the bipartite cases,for all.It maintains a heap of minimum nodes over all lev-els,and initially the distance offset for all in all levels.To make the names of the procedures consistent with the algorithm that constructs the dense distance graph, we call these procedures ScanInSubpiece,FindMinInSub-piece,and ExtractMinInSubpiece instead of ActivateNode, FindMin,and ExtractMin,respectively.We now describe the operations that the data structure supports together with their implementation and the running time.ScanInSubpiece():let.This operation can be done by calling ActivateLeft()on every of whichis in the left-side node.On those affected levels,we call FindMin and update their entries in the heap.This operation can be done in amortized time,because there are levels and each call to ActivateLeft costs amortized time.The time forfinding the minimum nodes and updating theheap is only,because the heap is of size.FindMinInSubpiece():find the minimum distance node over all levels.This can be done in time by returning the mini-mum entry in the heap.ExtractMinInSubpiece():find the minimum distance node over all levels,remove that node from its level, and attempt to add the node to the set of the data structure.For this operation,we do as in FindMinInSubpiece,but after the minimum node is found,we call AddCurrent-MinToS once on the level to which the minimum node belongs and update that level’s entry in.The cost for AddCurrentMinToS is time,and the cost for updating is.Therefore,this oper-ation can be done in time.As noted in the discussion above,after attempts to add a node to,that node will never appear as a mini-mum node again.AcknowledgmentsWe would like to thank Chris Harrelson for his careful reading of this paper.References[1]Alok Aggarwal,Amotz Bar-Noy,Samir Khuller,DinaKravets,and Baruch Schieber.Efficient Minimum Cost Matching and Transportation Using the Quadrangle Inequal-ity.Journal of Algorithms,19(1):116–143,1995.[2]R.E.Bellman.On 
a Routing Problem.Quart.Appl.Math,16:87–90,1958.[3]Samuel R.Buss and Peter N.Yianilos.Linear andTime Minimum-Cost Matching Algorithms for Quasi-Convex Tours.SIAM Journal on Computing, 27(1):170–201,1998.[4]I.J.Cox,S.B.Rao,and Y.Zhong.‘Ratio Regions’:A Tech-nique for Image Segmentation.In Proceedings International Conference on Pattern Recognition,pages557–564.IEEE, Aug.1996.[5]L.A.Costa D.Geiger,A.Gupta and J.Vlontzos.Dynamicprogramming for detecting,tracking and matching elastic contours.IEEE Trans.On Pattern Analysis and Machine Intelligence,1995.[6]Hristo N.Djidjev,Grammati E.Pantziou,and Christos D.puting Shortest Paths and Distances in Pla-nar Graphs.In Proc.18th ICALP,pages327–339.Springer-Verlag,1991.[7]L.R.Ford and D.R.Fulkerson.Flows in Networks.Prince-ton Univ.Press,Princeton,NJ,1962.[8]Greg N.Frederickson.A new approach to all pairs shortestpaths in planar graphs(extended abstract).In Proceedings of the Nineteenth Annual ACM Symposium on Theory of Com-puting,pages19–28,May1987.[9]Greg N.Frederickson.Fast algorithms for shortest paths inplanar graphs,with applications.SIAM Journal on Comput-ing,16(6):1004–1022,December1989.[10]Harold N.Gabow and Robert E.Tarjan.Faster Scaling Algo-rithm for Network Problems.SIAM Journal on Computing, 18(5):1013–1036,1989.[11]Andrew V.Goldberg.Scaling algorithms for the shortestpath problem.SIAM Journal on Computing,21(1):140–150, 1992.[12]Monika R.Henzinger,Philip N.Klein,Satish Rao,andSairam Subramanian.Faster Shortest-Path Algorithms for Planar Graphs.Journal of Computer and System Sciences, 55(1):3–23,1997.[13]R.Lipton,D.Rose,and R.E.Tarjan.Generalized nesteddissection.SIAM Journal on Numerical Analysis,16:346–358,1979.[14]Richard J.Lipton and Robert E.Tarjan.A separator theoremfor planar graphs.SIAM Journal on Applied Mathematics, 36:177–189,1979.[15]ler and J.Naor.Flow in planar graphs with multiplesources and sinks.SIAM Journal on Computing,24:1002–1017,1995.[16]Joseph S.B.Mitchell,David M.Mount,and Christos H.Pa-padimitriou.The discrete geodesic problem.SIAM Journal on Computing,16(4):647–668,1987.[17]Satish B.Rao.Faster algorithms forfinding small edge cutsin planar graphs(extended abstract).In In Proceedings of the Twenty-Fourth Annual ACM Symposium on the Theory of Computing,pages229–240,May1992.。

Sparse Recovery and Fourier Sampling


Sparse Recovery and Fourier Sampling

by Eric Price

Submitted to the Department of Electrical Engineering and Computer Science on August 26, 2013, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science

Author: Department of Electrical Engineering and Computer Science, August 26, 2013
Certified by: Piotr Indyk, Professor, Thesis Supervisor
Accepted by: Leslie A. Kolodziejski, Chair, Department Committee on Graduate Students

rough set


INTRODUCTION
Intrusion detection is used to classify normal and intrusive activities, a task in which machine learning can play an important role. Recently, machine learning-based intrusion detection approaches (Allen et al., 2000) have been the subject of extensive research because they can detect both misuse and anomaly. The learning-based intrusion detection approaches include two key steps: feature extraction

E-mail: a000309035@
Received May 21, 2003; revision accepted July 21, 2003
Abstract: Recently, machine learning-based intrusion detection approaches have been the subject of extensive research because they can detect both misuse and anomaly. In this paper, rough set classification (RSC), a modern learning algorithm, is used to rank the features extracted for detecting intrusions and to generate intrusion detection models. Feature ranking is a very critical step when building the model. RSC performs feature ranking before generating rules, and converts the feature ranking problem into a minimal hitting set problem, which is addressed using a genetic algorithm (GA). In classical approaches this is done using a Support Vector Machine (SVM) by executing many iterations, each of which removes one useless feature; compared with those methods, our method avoids many iterations. In addition, a hybrid genetic algorithm is proposed to increase the convergence speed and decrease the training time of RSC. The models generated by RSC take the form of "IF-THEN" rules, which have the advantage of explicability. Tests and comparison of RSC with SVM on DARPA benchmark data showed that for Probe and DoS attacks both RSC and SVM yielded highly accurate results (greater than 99% accuracy on the testing set). Key words: Intrusion detection, Rough set classification, Support vector machine, Genetic algorithm doi:10.1631/jzus.2004.1076 Document code: A CLC number: TP393
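The abstract's reduction of feature ranking to a minimal hitting set solved by a GA can be made concrete with a toy sketch. The following Python is not the paper's hybrid GA; it is a generic bit-string GA over candidate hitting sets, with an assumed fitness that rewards hitting every set first and small size second (population size, generations, and mutation rate are arbitrary choices).

import random

def ga_hitting_set(sets, n_elems, pop=40, gens=200, pmut=0.05, seed=0):
    # Find a small subset H of {0..n_elems-1} that intersects every set in `sets`.
    rng = random.Random(seed)
    def fitness(mask):
        hit = sum(1 for s in sets if any(mask[e] for e in s))
        return hit * n_elems - sum(mask)           # hitting all sets dominates size
    popn = [[rng.random() < 0.5 for _ in range(n_elems)] for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=fitness, reverse=True)
        next_gen = popn[: pop // 4]                # elitism: keep the best quarter
        while len(next_gen) < pop:
            a, b = rng.sample(popn[: pop // 2], 2) # parents from the better half
            cut = rng.randrange(1, n_elems)
            child = a[:cut] + b[cut:]              # one-point crossover
            child = [g ^ (rng.random() < pmut) for g in child]  # bit-flip mutation
            next_gen.append(child)
        popn = next_gen
    best = max(popn, key=fitness)
    return [e for e, g in enumerate(best) if g]

# Example: element 2 hits all three sets, so [2] is a minimal hitting set.
print(ga_hitting_set([{0, 2}, {1, 2}, {2, 4}], 5))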

resource


Extension of O(n log n) filtering algorithms for the unary resource constraint to optional activities

Petr Vilím (vilim@kti.mff.cuni.cz), Roman Barták (bartak@kti.mff.cuni.cz) and Ondřej Čepek (ondrej.cepek@mff.cuni.cz)
Charles University, Faculty of Mathematics and Physics
Malostranské náměstí 2/25, Praha 1, Czech Republic

Abstract. Scheduling is one of the most successful application areas of constraint programming, mainly thanks to special global constraints designed to model resource restrictions. Among these global constraints, edge-finding and not-first/not-last are the most popular filtering algorithms for unary resources. In this paper we introduce new O(n log n) versions of these two filtering algorithms and one more O(n log n) filtering algorithm called detectable precedences. These algorithms use special data structures, the Θ-tree and the Θ-Λ-tree. These data structures are especially designed for "what-if" reasoning about a set of activities, so we also propose to use them for handling so-called optional activities, i.e., activities which may or may not appear on the resource. In particular, we propose new O(n log n) variants of filtering algorithms which are able to handle optional activities: overload checking, detectable precedences and not-first/not-last.

1. Introduction

In scheduling, a unary resource is an often used generalization of a machine (or a job in openshop). A unary resource models a set of non-interruptible activities T which must not overlap in a schedule. Each activity i ∈ T has the following requirements:
− earliest possible starting time est_i
− latest possible completion time lct_i
− processing time p_i
A (sub)problem is to find a schedule satisfying all these requirements. This problem has long been known to be computationally difficult [7]. One of the most used techniques to solve this problem is constraint programming.

In constraint programming, we associate a unary resource constraint with each unary resource. The purpose of such a constraint is to reduce the search space by tightening the time bounds est_i and lct_i. This process of elimination of infeasible values is called propagation; an actual propagation algorithm is often called a filtering algorithm. Due to the NP-hardness of the problem, it is not efficient to remove all infeasible values. Instead, it is customary to use several fast but incomplete algorithms which can find only some impossible assignments. These filtering algorithms are repeated in every node of a search tree, therefore their speed and filtering power are crucial. The filtering algorithms considered in this paper are:

Edge-finding. Paper [5] presents an O(n log n) version; another two O(n²) versions of edge-finding can be found in [8, 9].
Not-first/not-last. An O(n log n) version of the algorithm can be found in [12]; two older O(n²) versions are in [2, 10].
Detectable precedences. This O(n log n) algorithm was introduced in [12].

All these filtering algorithms can be used together to join their filtering powers. In this paper, we present a not-first/not-last algorithm with time complexity O(n log n) and an edge-finding algorithm with the same time complexity, which is considerably simpler than the algorithm by Carlier and Pinson [5] and faster than the quadratic algorithms which are widely used today. Another asset of the algorithm is the introduction of the Θ-Λ-tree, a data structure which can be used to extend filtering algorithms to handle optional activities.

1.1. Basic Notation

Let us establish the basic notation concerning a subset of activities.
Let T be the set of all activities on the resource and let Θ ⊆ T be an arbitrary non-empty subset of activities. The earliest starting time est_Θ, the latest completion time lct_Θ and the processing time p_Θ of the set Θ are defined as:

est_Θ = min{est_j, j ∈ Θ}
lct_Θ = max{lct_j, j ∈ Θ}
p_Θ = Σ_{j∈Θ} p_j

Often we need to estimate the earliest completion time of a set Θ:

ECT_Θ = max{est_Θ′ + p_Θ′, Θ′ ⊆ Θ}    (1)

To extend the definitions also for Θ = ∅, let est_∅ = −∞, lct_∅ = ∞, p_∅ = 0 and ECT_∅ = −∞.

2. Θ-Tree

The Θ-tree was first introduced in [12]. In this paper we present a slightly different version of this data structure. This new version is easier to implement and runs a little faster than the old one. The purpose of a Θ-tree is to quickly recompute ECT_Θ when an activity is inserted into or removed from the set Θ. Because the set represented by the tree will always be named Θ in this paper, we call the tree a Θ-tree.

A Θ-tree is a balanced binary tree. Activities from the set Θ are represented by leaf nodes. Internal nodes of the tree are used to hold some precomputed values. In the following we do not make a difference between an activity and the leaf node representing that activity. Let v be an arbitrary node of the Θ-tree (an internal node or a leaf). We define Leaves(v) to be the set of all activities represented in the leaves of the subtree rooted at the node v. Further let:

ΣP_v = Σ_{j∈Leaves(v)} p_j
ECT_v = ECT_{Leaves(v)} = max{est_Θ′ + p_Θ′, Θ′ ⊆ Leaves(v)}

Clearly, for an activity i ∈ Θ we have ΣP_i = p_i and ECT_i = est_i + p_i. Also, for the root node r we have ECT_r = ECT_Θ. For an internal node v, the value ΣP_v can easily be computed from the direct descendants left(v) and right(v):

ΣP_v = ΣP_left(v) + ΣP_right(v)    (2)

In order to compute ECT_v quickly as well, the activities cannot be stored in the leaves randomly, but in ascending order of est from left to right, i.e., for any two activities i, j ∈ Θ, if est_i < est_j then the activity i is stored to the left of the activity j. Thanks to this property the following inequality holds (Left(v) is a shortcut for Leaves(left(v)), similarly Right(v)):

∀i ∈ Left(v), ∀j ∈ Right(v): est_i ≤ est_j    (3)

Proposition 1. For an internal node v, the value ECT_v can be computed by the following recursive formula:

ECT_v = max{ECT_right(v), ECT_left(v) + ΣP_right(v)}    (4)

Proof. From the definition (1), the value ECT_v is:

ECT_v = ECT_{Leaves(v)} = max{est_Θ′ + p_Θ′, Θ′ ⊆ Leaves(v)}

With respect to the node v we split the sets Θ′ into the following two categories:

1. Left(v) ∩ Θ′ = ∅, i.e., Θ′ ⊆ Right(v). Clearly:
max{est_Θ′ + p_Θ′, Θ′ ⊆ Right(v)} = ECT_{Right(v)} = ECT_right(v)

2. Left(v) ∩ Θ′ ≠ ∅. Then est_Θ′ = est_{Θ′∩Left(v)} because of the property (3). Let S be the set of all possible Θ′ considered now:
S = {Θ′, Θ′ ⊆ Leaves(v) & Θ′ ∩ Left(v) ≠ ∅}
Thus:
max{est_Θ′ + p_Θ′, Θ′ ∈ S}
= max{est_{Θ′∩Left(v)} + p_{Θ′∩Left(v)} + p_{Θ′∩Right(v)}, Θ′ ∈ S}
= max{est_{Θ′∩Left(v)} + p_{Θ′∩Left(v)}, Θ′ ∈ S} + p_{Right(v)}
= ECT_left(v) + ΣP_right(v)

Therefore the formula (4) is correct.

Figure 1 shows an example of a Θ-tree.

Thanks to the recursive formulae (2) and (4), the values ECT_v and ΣP_v can easily be maintained within the usual operations on a balanced binary tree without changing their time complexities. Table I summarizes the time complexities of different operations on a Θ-tree. Notice that so far the Θ-tree does not require any particular way of balancing. The only requirement is a time complexity of O(log n) for inserting or deleting a leaf, and a time complexity of O(1) for finding the root node. According to the authors' experience, the fastest way to implement a Θ-tree is to make the shape of the tree fixed during the whole computation, i.e., we start with the perfectly balanced tree which represents all activities on the resource.
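As a concrete illustration of formulae (2) and (4), here is a minimal Python sketch of a fixed-shape Θ-tree: an array-based, perfectly balanced tree whose leaves hold the activities pre-sorted by est. The paper prescribes no particular implementation, so the representation below is one possible choice.

NEG_INF = float("-inf")

class ThetaTree:
    # acts: list of (est, p) pairs, pre-sorted by est; leaf i mirrors activity i.
    def __init__(self, acts):
        self.n = 1
        while self.n < len(acts):
            self.n *= 2
        self.acts = acts
        self.sp  = [0] * (2 * self.n)        # Sigma P_v
        self.ect = [NEG_INF] * (2 * self.n)  # ECT_v

    def _update(self, v):
        l, r = 2 * v, 2 * v + 1
        self.sp[v]  = self.sp[l] + self.sp[r]                     # formula (2)
        self.ect[v] = max(self.ect[r], self.ect[l] + self.sp[r])  # formula (4)

    def _set(self, i, sp, ect):
        v = self.n + i
        self.sp[v], self.ect[v] = sp, ect
        v //= 2
        while v >= 1:            # recompute the O(log n) ancestors of the leaf
            self._update(v)
            v //= 2

    def insert(self, i):         # Theta := Theta ∪ {i}, O(log n)
        est, p = self.acts[i]
        self._set(i, p, est + p)

    def remove(self, i):         # Theta := Theta \ {i}, O(log n)
        self._set(i, 0, NEG_INF)

    def ect_theta(self):         # ECT_Theta, O(1)
        return self.ect[1]

Note how remove realizes exactly the trick described next in the text: an excluded activity keeps its leaf but carries ΣP = 0 and ECT = −∞.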
To indicate that an activity i is not in the set Θ, it is enough to set ΣP_i = 0 and ECT_i = −∞. Clearly, these values ensure that the excluded activity has no influence on the values computed in the rest of the tree.

Figure 1. An example of a Θ-tree for Θ = {a, b, c, d}. The leaves hold, in ascending est order: a: est=0, p=5 (ΣP=5, ECT=5); b: est=25, p=6 (ΣP=6, ECT=31); c: est=30, p=4 (ΣP=4, ECT=34); d: est=32, p=10 (ΣP=10, ECT=42). The internal node over {a, b} has ΣP=11, ECT=31; the node over {c, d} has ΣP=14, ECT=44; the root has ΣP=25, ECT=45.

Table I. Time complexities of operations on a Θ-tree.

Θ := ∅          O(1) or O(n log n)
Θ := Θ ∪ {i}    O(log n)
Θ := Θ \ {i}    O(log n)
ECT_Θ           O(1)

3. Overload Checking

If a set Ω ⊆ T cannot be processed within its time bounds, then no solution exists:

lct_Ω − est_Ω < p_Ω ⇒ fail    (5)

This rule fires for some Ω if and only if ECT_{Θ(j)} > lct_j for some activity j, where Θ(j) = {k, k ∈ T & lct_k ≤ lct_j}. The following algorithm checks exactly this condition:

1 Θ := ∅;
2 for j ∈ T in ascending order of lct_j do begin
3   Θ := Θ ∪ {j};
4   if ECT_Θ > lct_j then
5     fail; {no solution exists}
6 end;

The time complexity of this algorithm is O(n log n): the activities have to be sorted, and n times an activity is inserted into the set Θ.
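Using the ThetaTree sketch above, the overload checker can be written as follows. This is illustrative Python, not the paper's code; activities are represented here as (est, lct, p) triples, a convention kept in the later sketches as well.

def overload_check(acts):
    # acts: list of (est, lct, p). Fails iff some subset Omega satisfies
    # est_Omega + p_Omega > lct_Omega (rule (5)).
    order = sorted(range(len(acts)), key=lambda i: acts[i][0])    # leaves by est
    tree = ThetaTree([(acts[i][0], acts[i][2]) for i in order])
    pos = {i: k for k, i in enumerate(order)}                     # activity -> leaf
    for i in sorted(range(len(acts)), key=lambda i: acts[i][1]):  # ascending lct
        tree.insert(pos[i])
        if tree.ect_theta() > acts[i][1]:
            raise ValueError("resource overloaded, no solution exists")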
4. Not-first/not-last Using a Θ-tree

Not-first and not-last are two symmetric propagation algorithms for a unary resource. Of these two, we will consider only the not-last algorithm.

Let us consider a set Ω ⊆ T and an activity i ∈ (T \ Ω). The activity i cannot be scheduled after the set Ω (i.e., i is not last within Ω ∪ {i}) if:

est_Ω + p_Ω > lct_i − p_i    (6)

In that case, at least one activity from the set Ω must be scheduled after the activity i. Therefore the value lct_i can be updated:

lct_i := min{lct_i, max{lct_j − p_j, j ∈ Ω}}    (7)

There are two versions of the not-first/not-last algorithm: [2] and [10]. Both of them have time complexity O(n²). The first algorithm [2] finds all the reductions resulting from the previous rules in one pass. Still, after this propagation, the next run of the algorithm may find more reductions (the not-first and not-last rules are not idempotent). Therefore the algorithm should be repeated until no more reductions are found (i.e., a fixpoint is reached). The second algorithm [10] is simpler and faster, but more iterations of the algorithm may be needed to reach a fixpoint.

The algorithm presented here may also need more iterations to reach a fixpoint than the algorithm [2], maybe even more than the algorithm [10]. However, the time complexity is reduced from O(n²) to O(n log n).

Let us suppose that we have chosen a particular activity i and now we want to update lct_i according to the not-last rule. To really achieve some change of lct_i using the rule (7), the set Ω must fulfil the following property:

max{lct_j − p_j, j ∈ Ω} < lct_i

Therefore:

Ω ⊆ {j, j ∈ T & lct_j − p_j < lct_i & j ≠ i}

We will use the same trick as [10]: because the search for the best update of lct_i may be time consuming, we simply search for some update of lct_i. If lct_i can be updated better, it will be done in subsequent runs of the algorithm. In fact, our algorithm updates lct_i to max{lct_j − p_j, j ∈ T & lct_j − p_j < lct_i}, which is the "smallest" possible update among all possible updates.

Let us define the set Θ(i):

Θ(i) = {j, j ∈ T & lct_j − p_j < lct_i}

Thus: lct_i can be changed according to the not-last rule if and only if there is some set Ω ⊆ (Θ(i) \ {i}) for which the inequality (6) holds. The only problem is to decide whether such a set Ω exists or not. Let us recall the definition (1) of ECT and use it on the set Θ(i) \ {i}:

ECT_{Θ(i)\{i}} = max{est_Ω + p_Ω, Ω ⊆ Θ(i) \ {i}}

Notice that ECT_{Θ(i)\{i}} is exactly the maximum value which can appear on the left side of the inequality (6). Therefore there is a set Ω for which the inequality (6) holds if and only if:

ECT_{Θ(i)\{i}} > lct_i − p_i

The algorithm proceeds as follows. Activities i are taken in ascending order of lct_i. For each single activity i, the set Θ(i) is computed by extending the set Θ(i′) of the previous activity i′. For each i, ECT_{Θ(i)\{i}} is checked and lct_i is eventually updated:

1  for i ∈ T do
2    lct′_i := lct_i;
3  Θ := ∅;
4  Q := queue of all activities j ∈ T in ascending order of lct_j − p_j;
5  for i ∈ T in ascending order of lct_i do begin
6    while lct_i > lct_{Q.first} − p_{Q.first} do begin
9      j := Q.first;
10     Θ := Θ ∪ {j};
11     Q.dequeue;
12   end;
13   if ECT_{Θ\{i}} > lct_i − p_i then
14     lct′_i := min{lct_j − p_j, lct′_i};
15 end;
16 for i ∈ T do
17   lct_i := lct′_i;

Lines 9–11 are repeated at most n times over all iterations of the for cycle on line 5, because each time an activity is removed from the queue. The check on line 13 can be done in O(log n). Therefore the time complexity of the algorithm is O(n log n). Without changing the time complexity, the algorithm can be slightly improved: the not-last rule can also be checked for the activity Q.first just before the insertion of the activity Q.first into the set Θ (i.e., after line 6):

7    if ECT_Θ > lct_{Q.first} − p_{Q.first} then
8      lct′_{Q.first} := lct_j − p_j;

This modification may in some cases save a few iterations of the algorithm.

5. Detectable Precedences

An idea of detectable precedences was introduced in [11] for a batch resource with sequence dependent setup times, which is an extension of a unary resource. Figure 2 is taken from [11]. It shows a situation where neither edge-finding nor the not-first/not-last algorithm can change any time bound, but propagation of detectable precedences can (see Section 6 for details on the edge-finding algorithm).

The edge-finding algorithm recognizes that the activity A must be processed before the activity C, i.e., A ≪ C, and similarly B ≪ C. Still, each of these precedences alone is weak: they do not enforce any change of any time bound. However, from the knowledge {A, B} ≪ C we can deduce est_C ≥ est_A + p_A + p_B = 21.

A precedence j ≪ i is called detectable if it can be "discovered" only by comparing the time bounds of these two activities:

est_i + p_i > lct_j − p_j ⇒ j ≪ i    (8)

Figure 2. A sample problem for detectable precedences.

Notice that both precedences A ≪ C and B ≪ C are detectable. There is a simple quadratic algorithm which propagates all known precedences on a resource. For each activity i, build a set Ω = {j ∈ T, j ≪ i}. Note that the precedences j ≪ i can be of any type: detectable precedences, search decisions or initial constraints. Using such a set Ω, est_i can be adjusted: est_i := max{est_i, ECT_Ω}, because Ω ≪ i.

1 for i ∈ T do begin
2   m := −∞;
3   for j ∈ T in non-decreasing order of est_j do
4     if j ≪ i then
5       m := max{m, est_j} + p_j;
6   est_i := max{m, est_i};
7 end;

A symmetric algorithm adjusts lct_i. However, propagation of detectable precedences alone can be done within O(n log n). Let Θ(i) be the following set of activities:

Θ(i) = {j, j ∈ T & est_i + p_i > lct_j − p_j}

Thus Θ(i) \ {i} is the set of all activities which must be processed before the activity i because of detectable precedences. Using the set Θ(i) \ {i}, the value est_i can be adjusted:

est_i := max{est_i, ECT_{Θ(i)\{i}}}

There is also a symmetric rule for precedences j ≫ i, but we will not consider it here, nor the resulting symmetric algorithm. Our algorithm is based on the observation that the set Θ(i) does not have to be constructed from scratch for each activity i. Rather, the set Θ(i) can be computed incrementally, in a similar way as in the previous section.

1  Θ := ∅;
2  Q := queue of all activities j ∈ T in ascending order of lct_j − p_j;
3  for i ∈ T in ascending order of est_i + p_i do begin
4    while est_i + p_i > lct_{Q.first} − p_{Q.first} do begin
5      Θ := Θ ∪ {Q.first};
6      Q.dequeue;
7    end;
8    est′_i := max{est_i, ECT_{Θ\{i}}};
9  end;
10 for i ∈ T do
11   est_i := est′_i;

The initial sorts take O(n log n). Lines 5 and 6 are repeated at most n times over all iterations of the for cycle, because each time an activity is removed from the queue. Line 8 can be done in O(log n). Therefore the time complexity of the algorithm is O(n log n).
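A sketch of this O(n log n) detectable-precedences loop on top of the earlier ThetaTree follows. The temporary remove/insert used to evaluate ECT_{Θ\{i}} is one possible realization of line 8 (the paper leaves that detail open); activities are again (est, lct, p) triples.

def detectable_precedences(acts):
    n = len(acts)
    by_est = sorted(range(n), key=lambda i: acts[i][0])
    tree = ThetaTree([(acts[i][0], acts[i][2]) for i in by_est])
    pos = {i: k for k, i in enumerate(by_est)}
    queue = sorted(range(n), key=lambda j: acts[j][1] - acts[j][2])  # lct_j - p_j
    qhead, in_theta = 0, [False] * n
    new_est = [a[0] for a in acts]                                   # est'_i
    for i in sorted(range(n), key=lambda i: acts[i][0] + acts[i][2]):
        # grow Theta(i): all j with est_i + p_i > lct_j - p_j
        while qhead < n and acts[i][0] + acts[i][2] > \
                acts[queue[qhead]][1] - acts[queue[qhead]][2]:
            j = queue[qhead]
            tree.insert(pos[j])
            in_theta[j] = True
            qhead += 1
        if in_theta[i]:
            tree.remove(pos[i])          # evaluate ECT_{Theta \ {i}}
        new_est[i] = max(new_est[i], tree.ect_theta())
        if in_theta[i]:
            tree.insert(pos[i])
    return new_est                        # apply after the loop, as in lines 10-11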
6. Edge-Finding Using a Θ-Λ-tree

Edge-finding is probably the most often used filtering algorithm for a unary resource constraint. Let us first recall the classical edge-finding rules [2]. Consider a set Ω ⊆ T and an activity i ∉ Ω. If the following condition holds, then the activity i has to be scheduled after all activities from Ω:

∀Ω ⊂ T, ∀i ∈ (T \ Ω):
est_{Ω∪{i}} + p_{Ω∪{i}} = min{est_Ω, est_i} + p_Ω + p_i > lct_Ω ⇒ Ω ≪ i    (9)

Once we know that the activity i must be scheduled after the set Ω, we can adjust est_i:

Ω ≪ i ⇒ est_i := max{est_i, ECT_Ω}    (10)

The edge-finding algorithm propagates according to this rule and its symmetric version. There are several implementations of the edge-finding algorithm: two different quadratic algorithms can be found in [8, 9], and [5] presents an O(n log n) algorithm. In the following we present another edge-finding algorithm with time complexity O(n log n). It is considerably simpler than the algorithm by Carlier and Pinson [5] and it is faster than the quadratic algorithms [8, 9] which are widely used today.

Proposition 2. Let Θ(j) = {k, k ∈ T & lct_k ≤ lct_j}. Filtering according to rules (9), (10) is equivalent to filtering by the following rule:

∀j ∈ T, ∀i ∈ T \ Θ(j):
ECT_{Θ(j)∪{i}} > lct_j ⇒ Θ(j) ≪ i ⇒ est_i := max{est_i, ECT_{Θ(j)}}    (11)

Proof. We will prove the equivalence by proving both implications. First, let us prove that the new rule (11) generates all the changes which the original rules (9) and (10) do. Let us consider a set Ω ⊆ T and an activity i ∈ T \ Ω. Let j be one of the activities from Ω for which lct_j = lct_Ω. Thanks to this definition of j we have Ω ⊆ Θ(j), and so (recall the definition (1) of ECT):

est_{Ω∪{i}} + p_{Ω∪{i}} = min{est_Ω, est_i} + p_Ω + p_i ≤ ECT_{Θ(j)∪{i}}
ECT_Ω ≤ ECT_{Θ(j)}

Thus: when the original rule (9) holds for Ω and i, then the new rule (11) holds for Θ(j) and i too, and the change of est_i is at least the same as the change by the rule (10). Hence the first implication is proved.

Now we will prove the second implication: filtering according to the new rule (11) does not generate any changes which the old rules (9) and (10) cannot prove too. Let us consider an arbitrary set Ω ⊆ T. The overload rule (5) says that if the set Ω cannot be processed within its time bounds then no solution exists:

lct_Ω − est_Ω < p_Ω ⇒ fail

It is useless to continue filtering when a fail was fired. Therefore in the following we will assume that the resource is not overloaded.

Let us consider a pair of activities i, j for which the new rule (11) holds. We define a set Ω′ as a subset of Θ(j) ∪ {i} for which:

ECT_{Θ(j)∪{i}} = est_Ω′ + p_Ω′    (12)

Note that thanks to the definition (1) of ECT such a set Ω′ must exist. If i ∉ Ω′ then Ω′ ⊆ Θ(j), and therefore:

est_Ω′ + p_Ω′ = ECT_{Θ(j)∪{i}} > lct_j ≥ lct_Ω′   (by (12) and (11))

So the resource is overloaded (see the overload rule (5)) and a fail should have already been fired. Thus i ∈ Ω′. Let us define Ω = Ω′ \ {i}. We will assume that Ω ≠ ∅, because otherwise est_i ≥ ECT_{Θ(j)} and rule (11) changes nothing. For this set Ω we have:

min{est_Ω, est_i} + p_Ω + p_i = est_Ω′ + p_Ω′ = ECT_{Θ(j)∪{i}} > lct_j ≥ lct_Ω   (by (12) and (11))

Hence the rule (9) holds for the set Ω. To complete the proof we have to show that both rules (10) and (11) adjust est_i equivalently, i.e., ECT_Ω = ECT_{Θ(j)}. We already know that ECT_Ω ≤ ECT_{Θ(j)} because Ω ⊆ Θ(j). Suppose now, for a contradiction, that:

ECT_Ω < ECT_{Θ(j)}    (13)

Let Φ be a set Φ ⊆ Θ(j) such that:

ECT_{Θ(j)} = est_Φ + p_Φ    (14)

Therefore:

est_Ω + p_Ω ≤ ECT_Ω < ECT_{Θ(j)} = est_Φ + p_Φ    (15)

Because the set Ω′ = Ω ∪ {i} defines the value of ECT_{Θ(j)∪{i}} (i.e., est_Ω′ + p_Ω′ = ECT_{Θ(j)∪{i}}), it has the following property (see the definition (1) of ECT):
∀k ∈ Θ(j) ∪ {i}: est_k ≥ est_Ω′ ⇒ k ∈ Ω′

And because Ω = Ω′ \ {i}:

∀k ∈ Θ(j): est_k ≥ est_Ω′ ⇒ k ∈ Ω    (16)

Similarly, the set Φ defines the value of ECT_{Θ(j)}:

∀k ∈ Θ(j): est_k ≥ est_Φ ⇒ k ∈ Φ    (17)

Combining the properties (16) and (17) we have that either Ω ⊆ Φ (if est_Ω′ ≥ est_Φ) or Φ ⊆ Ω (if est_Ω′ ≤ est_Φ). However, Φ ⊆ Ω is not possible, because in that case est_Φ + p_Φ ≤ ECT_Ω, which contradicts the inequality (15). The result is that Ω ⊊ Φ, and so p_Ω < p_Φ. Now we are ready to derive the contradiction:

ECT_{Θ(j)∪{i}}
= est_Ω′ + p_Ω′    (by (12))
= min{est_Ω + p_Ω + p_i, est_i + p_Ω + p_i}    (because Ω = Ω′ \ {i})
< min{est_Φ + p_Φ + p_i, est_i + p_Φ + p_i}    (by (15) and p_Ω < p_Φ)
≤ ECT_{Θ(j)∪{i}}    (because Φ ⊆ Θ(j))

Property 1. The rule (11) has a very useful property. Let us consider an activity i and two different activities j1 and j2 for which the rule (11) holds, and let lct_j1 ≤ lct_j2. Then Θ(j1) ⊆ Θ(j2) and so ECT_{Θ(j1)} ≤ ECT_{Θ(j2)}; therefore j2 yields better propagation than j1. Thus for a given activity i it is sufficient to look for the activity j for which (11) holds and lct_j is maximum.

6.1. Θ-Λ-tree

Let us consider the alternative edge-finding rule (11). We choose an arbitrary activity j and check the rule (11) for each applicable activity i, i.e., we would like to find all activities i for which the following condition holds:

ECT_{Θ(j)∪{i}} > lct_j

Unfortunately, such an algorithm would be too slow: before the check can be performed, each particular activity i must be added into the Θ-tree, and after the check the activity i has to be removed from the Θ-tree again.

The idea how to surpass this problem is to extend the Θ-tree structure in the following way: all applicable activities i will also be included in the tree, but as gray nodes. A gray node represents an activity i which is not really in the set Θ. However, we are curious what would happen with ECT_Θ if we were allowed to include one of the gray activities in the set Θ. More exactly: let Λ ⊆ T be the set of all gray activities, Λ ∩ Θ = ∅. The purpose of the Θ-Λ-tree is to quickly compute the following value:

ECT̄(Θ, Λ) = max{ECT_{Θ∪{i}}, i ∈ Λ}

To compute ECT̄(Θ, Λ) quickly, we add the following two values into each node of the tree: ΣP̄_v is the maximum sum of the processing times of the activities in the subtree if one of the gray activities may be used; similarly, ECT̄_v is the maximum ECT of the subtree if one of the gray activities may be used.

An idea how to compute the values for an internal node v follows. A gray activity can be used only once: in the left subtree of v or in the right subtree of v. Note that the gray activity used for ΣP̄_v may differ from the one used for ECT̄_v. The formulae (2) and (4) can be modified to handle gray nodes:

ΣP̄_v = max{ΣP̄_left(v) + ΣP_right(v), ΣP_left(v) + ΣP̄_right(v)}
ECT̄_v = max{ECT̄_right(v), ECT_left(v) + ΣP̄_right(v), ECT̄_left(v) + ΣP_right(v)}

Figure 3. An example of a Θ-Λ-tree.

Using the values ECT̄_v and ΣP̄_v, we can also compute for each node v the gray activity which is responsible for ECT̄_v.

Table II. Time complexities of operations on a Θ-Λ-tree.

(Θ, Λ) := (∅, ∅)               O(1)
(Θ, Λ) := (T, ∅)               O(n log n)
(Θ, Λ) := (Θ \ {i}, Λ ∪ {i})   O(log n)
Θ := Θ ∪ {i}                   O(log n)
Λ := Λ ∪ {i}                   O(log n)
Λ := Λ \ {i}                   O(log n)

The edge-finding algorithm processes the activities j in descending order of lct_j; one by one, white nodes are discolored to gray. As soon as ECT̄(Θ, Λ) > lct_j, the gray activity i responsible for ECT̄(Θ, Λ) has its est_i updated and is removed from the set Λ:

1  for i ∈ T do
2    est′_i := est_i;
3  (Θ, Λ) := (T, ∅);
4  Q := queue of all activities j ∈ T in descending order of lct_j;
5  j := Q.first;
6  repeat
7    (Θ, Λ) := (Θ \ {j}, Λ ∪ {j});
8    Q.dequeue;
9    j := Q.first;
10   if ECT_Θ > lct_j then
11     fail; {resource is overloaded}
12   while ECT̄(Θ, Λ) > lct_j do begin
13     i := gray activity responsible for ECT̄(Θ, Λ);
14     est′_i := max{est′_i, ECT_Θ};
15     Λ := Λ \ {i};
16   end;
17 until Q is empty;
18 for i ∈ T do
19   est_i := est′_i;

Note that on line 14 we have ECT_Θ ≤ lct_j, because otherwise we would have ended with a fail on line 11. During the entire run of the algorithm, the maximum number of iterations of the inner while loop is n, because each iteration removes an activity from the set Λ. Similarly, the number of iterations of the repeat loop is n, because each time an activity is removed from the queue Q. According to Table II, the time complexity of each single line within the loops is at most O(log n). Therefore the time complexity of the whole algorithm is O(n log n).

Note that at the beginning Θ = T and Λ = ∅; hence there are no gray activities and therefore ΣP̄_k = ΣP_k for each node k. Hence we can save some time by building the initial Θ-Λ-tree as a "normal" Θ-tree.
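The two modified formulae translate into code by carrying the extra pair (ΣP̄_v, ECT̄_v) per node. Below is a minimal Python sketch extending the earlier ThetaTree. The leaf conventions are my reading of the construction, not spelled out in the paper: a white leaf carries bar values equal to its plain values, a gray leaf carries ΣP = 0 and ECT = −∞ but bar values p and est + p, and an absent leaf carries 0 and −∞ everywhere. Use add_white/add_gray here rather than the inherited insert, which does not maintain the bar values.

class ThetaLambdaTree(ThetaTree):
    def __init__(self, acts):
        super().__init__(acts)
        self.spb  = [0] * (2 * self.n)        # Sigma P-bar_v
        self.ectb = [NEG_INF] * (2 * self.n)  # ECT-bar_v

    def _update(self, v):
        super()._update(v)                    # maintain plain SigmaP_v and ECT_v
        l, r = 2 * v, 2 * v + 1
        # one gray activity may be spent on either side, never on both:
        self.spb[v] = max(self.spb[l] + self.sp[r],
                          self.sp[l] + self.spb[r])
        self.ectb[v] = max(self.ectb[r],
                           self.ect[l] + self.spb[r],
                           self.ectb[l] + self.sp[r])

    def _set4(self, i, sp, ect, spb, ectb):
        v = self.n + i
        self.sp[v], self.ect[v] = sp, ect
        self.spb[v], self.ectb[v] = spb, ectb
        v //= 2
        while v >= 1:
            self._update(v)
            v //= 2

    def add_white(self, i):    # i joins Theta
        est, p = self.acts[i]
        self._set4(i, p, est + p, p, est + p)

    def add_gray(self, i):     # i joins Lambda
        est, p = self.acts[i]
        self._set4(i, 0, NEG_INF, p, est + p)

    def remove(self, i):       # i leaves both sets
        self._set4(i, 0, NEG_INF, 0, NEG_INF)

    def ect_bar(self):         # ECT-bar(Theta, Lambda), O(1)
        return self.ectb[1]

Discoloring a white node to gray, as on line 7 of the edge-finding listing, is simply add_gray(i) applied to a leaf that was previously white.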
a“normal”Θ-tree.7.Optional ActivitiesNowadays,many practical scheduling problems have to deal with alter-natives–activities which can choose their resource,or activities which exist only if a particular alternative of processing is chosen.From the resource point of view,it is not yet decided whether such activities will be processed or not.Therefore we will call such activities optional.For an optional activity,we would like to speculate what would happen if the activity actually would be processed by the resource.Traditionally,resource constraints are not designed to handle op-tional activities properly.However,several different modifications are used to model them:Dummy activities.It is basically a workaround for constraint solvers which do not allow to add more activities on the resource dur-ing problem solving(i.e.resource constraint is not dynamic[3]).Processing times of activities are changed from constants to do-main variables.Several“dummy”activities with possible process-ing times 0,∞)are added on the resource as a reserve for later activity addition.Filtering algorithms work as usual,but they use minimal possible processing time instead of original constant processing time.Note that dummy activities have no influence on other activities on the resource,because their processing time can be zero.Once an alternative is chosen,a dummy activity is turned into a regular activity(i.e.minimal processing time is no longer zero).The main disadvantage of this approach is that an impossibility of a particular alternative cannot be found before that alternative is actually tried.Filtering of options.The idea is to run afiltering algorithm several times,each time with one of the optional activities added on the re-source.When a fail is found,then the optional activity is rejected.Otherwise time bounds of the optional activity can be adjusted.[4]introduces so called PEX-edge-finding with time complexityO(n3).This is a pretty strong propagation,however rather time consuming.Modifiedfiltering algorithms.Regular and optional activities are treated differently:optional activities do not influence any other activity on the resource,however regular activities influence other regular activities and also optional activities[6].Most of thefil-tering algorithms can be modified this way without changing their time complexities.However,this approach is a little bit weaker than the previous one,because the previous one also checked whether the addition of a optional activity would not cause an immediate fail.Cumulative resources.If we have a set of similar alternative ma-chines,this set can be modeled as a cumulative resource.This additional(redundant)constraint can improve the propagation before activities are distributed between the machines.There is also a specialfiltering algorithm[13]designed to handle this type of alternatives.To handle optional activities we extend each activity i by a variable called existence i with the domain{true,false}.When existence i=true then i is a regular activity,when existence i∈{true,false}then i is an optional activity.Finally when existence=false we simply exclude this activity from all our considerations.To make the notation concerning optional activities easy,let R be the set of all regular activities and O the set of all optional activities.For optional activities,we would like to consider the following is-sues:1.If an optional activity should be processed by the resource(i.e.ifan optional activity is changed to a regular activity),would the resource be 
1. If an optional activity should be processed by the resource (i.e., if an optional activity is changed to a regular activity), would the resource be overloaded? The resource is overloaded if there is a set Ω ⊆ R such that:

est_Ω + p_Ω > lct_Ω

Certainly, if the resource is overloaded then the problem has no solution. Hence if the addition of an optional activity i results in overloading, then we can conclude that existence_i = false.

2. If the addition of an optional activity i does not result in overloading, what is the earliest possible start time and the latest possible completion time of the activity i with respect to the regular activities on the resource? We would like to apply the usual filtering algorithms to the activity i; however, the activity i must not cause a change of any regular activity.

3. If we add an optional activity i, will the first run of a filtering algorithm result in a fail? For example, the detectable precedences algorithm can increase est_k of some activity k so much that est_k + p_k > lct_k. In that case we can also propagate existence_i = false.

We will consider item 1 in the next section, "Overload Checking with Optional Activities". Items 2 and 3 are discussed in the section "Filtering with Optional Activities".

8. Overload Checking with Optional Activities

In this section we present a modified overload checking algorithm which can handle optional activities. Basically, the original overload rule (5) remains valid; however, we must consider the regular activities R only:

∀Ω ⊆ R: (lct_Ω − est_Ω < p_Ω ⇒ fail)

In Section 3 we showed that this rule is equivalent to:

∀j ∈ R: (ECT_{Θ(j)} > lct_j ⇒ fail)    (18)

where Θ(j) is:

Θ(j) = {k, k ∈ R & lct_k ≤ lct_j}

Let us now take into account an optional activity o ∈ O. If the processing of this activity would result in overloading, then the activity can never be processed by the resource:

∀o ∈ O, ∀Ω ⊆ R: (est_{Ω∪{o}} + p_{Ω∪{o}} > lct_{Ω∪{o}} ⇒ existence_o := false)    (19)

Let the set Λ(j) be defined in the following way:

Λ(j) = {o, o ∈ O & lct_o ≤ lct_j}

The rule (19) is applicable if and only if there is an activity j ∈ T such that ECT̄(Θ(j), Λ(j)) > lct_j; in that case, the optional activity responsible for ECT̄(Θ(j), Λ(j)) can be excluded from the resource.

The following algorithm detects overloading; it also deletes all optional activities k such that the addition of the activity k alone causes an overload. Of course, a combination of several optional activities may still cause an overload.

1  (Θ, Λ) := (∅, ∅);
2  for i ∈ T in ascending order of lct_i do begin
3    if i is an optional activity then
4      Λ := Λ ∪ {i};
5    else begin
6      Θ := Θ ∪ {i};
7      if ECT_Θ > lct_i then
8        fail; {no solution exists}
9      while ECT̄(Θ, Λ) > lct_i do begin
10       k := optional activity responsible for ECT̄(Θ, Λ);
11       existence_k := false;
12       Λ := Λ \ {k};
13     end;
14   end;
15 end;

The time complexity of the algorithm is again O(n log n). The inner while loop is repeated at most n times, because each time an activity is removed from the set Λ. The outer for loop also has n iterations, and the time complexity of each single line is at most O(log n) (see Table II).

9. Filtering with Optional Activities

The following section is an example of how to extend a certain class of filtering algorithms to handle optional activities. The idea is simple: if the original algorithm uses a Θ-tree, the modified algorithm uses a Θ-Λ-tree instead. Optional activities are represented by the gray nodes of the tree. For regular propagation, the value ECT_Θ is used the same way as before. However, the value ECT̄(Θ, Λ) is also watched: if the addition of an optional activity would result in an immediate fail, then the optional activity responsible for that is excluded from the resource.

Let us demonstrate this idea on the detectable precedences algorithm:

1 (Θ, Λ) := (∅, ∅);
2 Q := queue of all activities j ∈ T in ascending order of lct_j − p_j;
3 for i ∈ T in ascending order of est_i + p_i do begin
4   while est_i + p_i > lct_{Q.first} − p_{Q.first} do begin
5     if Q.first is a regular activity then
6       Θ := Θ ∪ {Q.first};
7     else
8       Λ := Λ ∪ {Q.first};

does not perform short-circuit evaluation


does not perform short-circuit evaluation. The full text comprises four sample essays for the reader's reference.

Sample essay 1:

In computer programming, short-circuit evaluation is an optimization technique used in logical expressions: subsequent operands are evaluated only when necessary. This technique can improve a program's performance and avoid unnecessary computation.

Not all programming languages support short-circuit evaluation, however. In some languages, the logical operators evaluate all of their operands regardless of the result of the earlier ones. This behavior is described as "does not perform short-circuit evaluation".

In many programming languages, when the logical operator "&&" is used for a logical AND, the second operand is not evaluated if the first operand evaluates to false. That is short-circuit evaluation in action. Without short-circuit evaluation, the second operand is evaluated no matter what the first operand yields.

Not performing short-circuit evaluation can lead to performance problems and logic errors: evaluating operands unnecessarily increases running time and may produce unexpected results. When writing programs, we should therefore prefer languages that support short-circuit evaluation, in order to improve performance and stability.

The absence of short-circuit evaluation also affects readability and maintainability. If every operand in a logical expression is always evaluated, the code may become long-winded and complex, making the program hard to understand and debug. By contrast, short-circuit evaluation keeps code concise and clear, reducing unnecessary nesting and repetition.

Although not performing short-circuit evaluation is a legitimate way to write programs, in practice we should avoid languages that lack short-circuit evaluation where possible. Choosing a language that supports it improves performance and maintainability, and is more in line with common programming standards. Developers should recognize the importance of short-circuit evaluation in their daily work and apply it sensibly.

Sample essay 2:

In the world of programming we frequently encounter logical operations, and one important concept among them is short-circuit evaluation. Short-circuit evaluation means that in a logical expression, once the first operand already determines the result of the whole expression, the remaining operands are not evaluated.


Fast RLS Algorithms Running On Roughly Quantized Signals

Murat Belge* and Orhan Arıkan**
* Electrical and Computer Engineering Department, 409 Dana Building, Northeastern University, Boston MA 02115. Phone: 617-373-8783. E-mail: belge@
** Department of Electrical and Electronics Engineering, Bilkent University, 06533 Ankara, TURKEY. Fax: +90-312-2664127. Phone: +90-312-2664307. E-mail: oarikan@.tr
Edics: SPL.SP.2.6

Abstract

A framework for obtaining fast RLS algorithms which use a roughly quantized auxiliary input signal for efficiently updating the required adaptation gain vector is presented. The new adaptation procedure is very similar to the conventional fast RLS adaptation. Analytically, it is shown that the filter weights are found by solving almost the same normal equations as in the RLS case. The results obtained are in good agreement with those of the fast RLS algorithms, with a significant reduction in the number of multiplications.

1. Introduction

2. Fast Recursive Least Squares Adaptation

The exponentially weighted least squares cost is

J_w(n) = Σ_{k=1}^{n} λ^{n−k} [d(k) − w^t(n) x(k)]²    (1)

where d(k) is the desired signal at time k, w(n) is the vector of optimal filter coefficients at time n, x(k) = [x(k) x(k−1) ... x(k−N+1)]^t is the vector which contains the N most recent samples of the input signal, and 0 < λ ≤ 1 is an exponential forgetting factor. Minimization of (1) yields the following least squares solution for w(n):

w(n) = R_xx^{−1}(n) r_dx(n)    (2)

where

R_xx(n) = Σ_{k=1}^{n} λ^{n−k} x(k) x^t(k),    r_dx(n) = Σ_{k=1}^{n} λ^{n−k} d(k) x(k).    (3)

From (2), a recurrence relation for the adaptive filter coefficients can be obtained [1]:

w(n+1) = w(n) + k(n+1) [d(n+1) − w^t(n) x(n+1)]    (4)

where k(n) = R_xx^{−1}(n) x(n) is commonly known as the adaptation gain [1]. Computation of the optimal adaptive filter coefficients requires the computation of the adaptation gain at each time instant. The FKA algorithm requires 8N operations to update k(n), while the FAEST or FTF algorithms require 5N operations. In the FAEST algorithm, an alternative definition of the adaptation gain, k′(n+1) = R_xx^{−1}(n) x(n+1), is employed to further reduce the computational complexity.

3. Fast RLS Adaptation Framework

The basic idea behind the proposed adaptation framework is to use a roughly quantized auxiliary input signal. By a roughly quantized signal, we mean a discrete-time sequence which is quantized to at most 4 bits (16 levels). This signal has the special property that its correlation function is exactly the same as that of the original input signal, except at the zeroth lag. The proposed adaptive filtering configuration is schematically represented in Fig. 1. Here, x_q(k) is a roughly quantized signal obtained by quantizing the sum of the original input signal and a zero-mean i.i.d. random reference signal, which is uniformly distributed within one quantization step. If no clipping occurs at the quantizer, the following equations are satisfied [2]:

E[x_q(k) x_q(k+m)] = E[x(k) x(k+m)] + σ² δ(m)    (5)
E[x_q(k) y(k+m)] = E[x(k) y(k+m)]    (6)

where δ(m) is the Kronecker delta, σ² is the amount of bias introduced in the zero-lag value of the autocorrelation function, and y(k) is a stationary signal which may or may not be different from x(k). A detailed analysis of the properties of the quantized correlators can be found in [2]. It has been shown that very reliable estimates of the correlations can be obtained by using as few as 3 to 4 bits in the quantizers. Multiplications by the samples of the roughly quantized input signal can be realized by simple shifts and additions. For example, multiplication by a 3-bit quantized number requires at most 2 shifts and 2 additions. This figure can be further reduced by building simple dedicated circuitry. This implies that if we use x_q(k) in the computation of k(n+1), we can decrease the computational complexity associated with existing fast RLS algorithms by converting the multiplications with x_q(k) into shifts and adds. To derive the different versions of the fast RLS algorithms corresponding to the use of the roughly quantized input signal, we merely replace the input signal x(n) by x_q(n) in the computation of the adaptation gain. Denoting the adaptation gain obtained in this way by k_q(n), and by using (5)-(6), it can be easily shown that
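To make the idea concrete, here is an illustrative Python sketch of an RLS update whose gain is computed from a roughly quantized regressor. It uses the standard O(N²) inverse-correlation recursion rather than one of the fast O(N) variants named above, and the quantizer step and bit width below are arbitrary choices, not values from the paper.

import numpy as np

def quantize(x, step, n_bits=3):
    # Rough quantizer with additive uniform dither of one quantization step,
    # following the construction behind eqs. (5)-(6); clipping keeps
    # 2**n_bits output levels.
    dither = np.random.uniform(-step / 2, step / 2, size=np.shape(x))
    levels = 2 ** (n_bits - 1)
    return np.clip(np.round((x + dither) / step), -levels, levels - 1) * step

def rls_quantized(x, d, N, lam=0.99, delta=100.0):
    # The a priori error uses the true regressor x(n); the adaptation gain
    # k_q(n) is computed from the quantized regressor x_q(n), so the
    # gain-update multiplications involve only coarse values.
    w = np.zeros(N)
    P = delta * np.eye(N)                    # running estimate of R_xx^{-1}(n)
    xq = quantize(x, step=np.std(x) / 2)     # roughly quantized auxiliary signal
    for n in range(N - 1, len(x)):
        u  = x[n - N + 1 : n + 1][::-1]      # x(n) = [x(n) ... x(n-N+1)]^t
        uq = xq[n - N + 1 : n + 1][::-1]
        Pu = P @ uq
        k  = Pu / (lam + uq @ Pu)            # adaptation gain from x_q
        e  = d[n] - w @ u                    # a priori error with the true x
        w  = w + k * e                       # recurrence (4)
        P  = (P - np.outer(k, Pu)) / lam
    return w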