MIT6_041SCF13_quiz02_f09 (1)


MIT6_042JS10_lec09_sol


Solutions to In-Class Problems Week 4, Mon.

Problem 1. Prove by induction that

    1 + 1/4 + 1/9 + ··· + 1/n^2 < 2 − 1/n        (1)

for all n > 1.

Solution. Proof (by induction). The induction hypothesis P(n) is the inequality (1).

Base case (n = 2): the LHS of (1) in this case is 1 + 1/4 and the RHS is 2 − 1/2. Since LHS = 5/4 < 6/4 = 3/2 = RHS, inequality (1) holds, and P(2) is proved.

Inductive step: Let n be any natural number greater than 1, and assume P(n) in order to prove P(n + 1). That is, we assume (1). Adding 1/(n + 1)^2 to both sides of this inequality yields

    1 + 1/4 + 1/9 + ··· + 1/n^2 + 1/(n + 1)^2 < 2 − 1/n + 1/(n + 1)^2.

Since 1/(n + 1)^2 < 1/(n(n + 1)) = 1/n − 1/(n + 1), the right-hand side is less than 2 − 1/(n + 1), which is exactly inequality (1) with n + 1 in place of n. So we have proved P(n + 1).

Problem 2. (a) Prove by induction that a 2^n × 2^n courtyard with a 1 × 1 statue of Bill in a corner can be covered with L-shaped tiles. (Do not assume or reprove the (stronger) result of Theorem 6.1.2 that Bill can be placed anywhere. The point of this problem is to show a different induction hypothesis that works.)

Solution. Let P(n) be the proposition: Bill can be placed in a corner of a 2^n × 2^n courtyard with a proper tiling of the remainder with L-shaped tiles.
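The excerpt breaks off before the inductive step of Problem 2. The following LaTeX sketch gives the standard argument; it is an assumption about how the omitted solution proceeds, not the original text.

```latex
% Sketch of the inductive step for Problem 2 (the excerpt omits the original argument).
% P(n): Bill can be placed in a corner of a 2^n x 2^n courtyard, with the rest L-tileable.
\textbf{Inductive step.} Split the $2^{n+1}\times 2^{n+1}$ courtyard into four
$2^{n}\times 2^{n}$ quadrants, and put Bill in an outer corner of one quadrant; by $P(n)$
that quadrant can be tiled. In each of the other three quadrants, apply $P(n)$ with the
``statue'' square chosen as the corner touching the center of the courtyard. The three
uncovered central squares form an L, so one extra L-shaped tile completes the tiling,
proving $P(n+1)$.
```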

MIT-Based Information System Software Testing Methods


Software testing plays a crucial role in information system development and is irreplaceable in ensuring software quality and stability.

Testing is an indispensable part of the software development process, and the MIT-based approach is a widely applied and effective testing method.

This article examines MIT-based information system software testing methods in depth, aiming to provide a deeper understanding of, and guidance for, research and practice in the software testing field.

I. The importance of information system software testing. As an indispensable part of modern society, information system software carries out important functions such as data management, business process optimization, and decision support; its quality and stability directly affect an enterprise's operational efficiency and competitiveness.

As the key step in assuring software quality, testing effectively reduces the risks and losses caused by software defects and ensures that the software system runs properly.

The importance of information system software testing is therefore self-evident.

II. Overview of the MIT-based information system software testing method. MIT (Massachusetts Institute of Technology), one of the world's best-known research institutions, also has extensive research experience and results in the field of software testing.

The MIT-based information system software testing method is a systematic testing approach formed through MIT's long-term practice and research, characterized by scientific rigor and practicality.

The method mainly comprises test planning, test case design, and test execution and analysis, and aims to comprehensively assess whether the software system's performance and functionality meet the requirements.

III. Advantages of the MIT-based information system software testing method. 1. Scientific rigor: the method is grounded in scientific principles and experimental validation, giving it a high degree of rigor and reliability.

2. Comprehensiveness: the method covers the full testing workflow, from test planning through test execution and analysis, and can comprehensively evaluate the system's performance and functionality.

3. Flexibility: the method applies to testing all kinds of software systems and can be flexibly adjusted and optimized to meet specific needs.

4. Practicality: the method has been widely applied and validated in real-world testing and is highly practical and operable.

IV. Key steps of the MIT-based information system software testing method. 1. Test planning: before testing begins, a test plan is developed from the system's requirements and specifications, defining the goals and scope of the testing.
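As a concrete illustration of the test case design step named above, here is a generic Python unit test sketch. It is not part of the MIT-based method itself, and the business rule being tested is hypothetical.

```python
import unittest

def withdraw(balance, amount):
    """Hypothetical business rule used only for illustration: withdraw amount from balance."""
    if amount <= 0 or amount > balance:
        raise ValueError("invalid amount")
    return balance - amount

class WithdrawTestCase(unittest.TestCase):
    def test_normal_withdrawal(self):
        # Nominal case: a valid withdrawal reduces the balance.
        self.assertEqual(withdraw(100, 30), 70)

    def test_rejects_overdraft(self):
        # Boundary case: withdrawing more than the balance must be rejected.
        with self.assertRaises(ValueError):
            withdraw(100, 200)

if __name__ == "__main__":
    unittest.main()
```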

MIT 14.123 (Microeconomics), Lectures 1-2: Orderings and Utility Representation


Introduction
• Economics is about explaining and predicting choice.
• It is assumed that economic agents choose their most desirable alternative among the set of feasible ones.
  – Interpret it “as if”, not necessarily “deliberate”.
  – “This morning I took the shuttle to MIT because this was the best possible way to come in.” Discuss.
• Desirability is represented by preferences and/or utility.
  – Attitudes may be expressed over outcomes never experienced (Would you prefer to be Superman or Spiderman?).
Utility Representation
• DEF. A utility function u : X → ℝ represents ⪰ if u(x) ≥ u(y) ⟺ x ⪰ y.
• THM: If u represents ⪰, then ⪰ is complete and transitive.
  ■ Follows from the same properties of ≥ on real numbers. ■
• THM: If X is finite and ⪰ is complete and transitive, then there exists a utility function that represents ⪰.
  ■ u(x) = |{y ∈ X : x ⪰ y}|: # of alternatives that x beats weakly. ■
• THM: If X is countable and ⪰ is complete and transitive, then there is a utility function with a bounded range that represents ⪰.
  ■ X ≡ {x1, x2, …}. Let u(x-1) = 0, u(x0) = 1. For all n = 1, 2, …, set u(xn) = [max{u(xk) | xn ⪰ xk, n > k} + min{u(xk) | xk ⪰ xn, n > k}]/2. ■
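The finite-X construction in the second theorem is easy to compute directly. Below is a minimal Python sketch, not taken from the lecture notes, that builds u(x) = |{y ∈ X : x ⪰ y}| from a complete and transitive weak preference and checks that it represents the preference; the function and example names are assumptions.

```python
from itertools import product

def utility_from_preference(X, weakly_prefers):
    """Return u(x) = number of alternatives that x beats weakly, i.e. |{y in X : x >= y}|."""
    return {x: sum(1 for y in X if weakly_prefers(x, y)) for x in X}

# Toy complete, transitive preference: x is weakly preferred to y iff score[x] >= score[y].
score = {"apple": 2, "banana": 5, "cherry": 5, "date": 1}
X = list(score)
u = utility_from_preference(X, lambda x, y: score[x] >= score[y])

# u represents the preference: u(x) >= u(y) exactly when x is weakly preferred to y.
assert all((u[x] >= u[y]) == (score[x] >= score[y]) for x, y in product(X, X))
```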

A General Framework for Prefetch Scheduling in Linked Data Structures and its Application to Multi-Chain Prefetching


To appear in ACM Transactions on Computer SystemsA General Framework for Prefetch Scheduling in Linked Data Structures and its Application to Multi-Chain PrefetchingSEUNGRYUL CHOIUniversity of Maryland,College ParkNICHOLAS KOHOUTEVI Technology LLC.SUMIT PAMNANIAdvanced Micro Devices,Inc.andDONGKEUN KIM and DONALD YEUNGUniversity of Maryland,College ParkThis research was supported in part by NSF Computer Systems Architecture grant CCR-0093110, and in part by NSF CAREER Award CCR-0000988.Author’s address:Seungryul Choi,University of Maryland,Department of Computer Science, College Park,MD20742.Permission to make digital/hard copy of all or part of this material without fee for personal or classroom use provided that the copies are not made or distributed for profit or commercial advantage,the ACM copyright/server notice,the title of the publication,and its date appear,and notice is given that copying is by permission of the ACM,Inc.To copy otherwise,to republish, to post on servers,or to redistribute to lists requires prior specific permission and/or a fee.c 2001ACM1529-3785/2001/0700-0001$5.00ACM Transactions on Computer Systems2·Seungryul Choi et al.Pointer-chasing applications tend to traverse composite data structures consisting of multiple independent pointer chains.While the traversal of any single pointer chain leads to the seri-alization of memory operations,the traversal of independent pointer chains provides a source of memory parallelism.This article investigates exploiting such inter-chain memory parallelism for the purpose of memory latency tolerance,using a technique called multi-chain prefetching. Previous works[Roth et al.1998;Roth and Sohi1999]have proposed prefetching simple pointer-based structures in a multi-chain fashion.However,our work enables multi-chain prefetching for arbitrary data structures composed of lists,trees,and arrays.This article makesfive contributions in the context of multi-chain prefetching.First,we intro-duce a framework for compactly describing LDS traversals,providing the data layout and traversal code work information necessary for prefetching.Second,we present an off-line scheduling algo-rithm for computing a prefetch schedule from the LDS descriptors that overlaps serialized cache misses across separate pointer-chain traversals.Our analysis focuses on static traversals.We also propose using speculation to identify independent pointer chains in dynamic traversals.Third,we propose a hardware prefetch engine that traverses pointer-based data structures and overlaps mul-tiple pointer chains according to the computed prefetch schedule.Fourth,we present a compiler that extracts LDS descriptors via static analysis of the application source code,thus automating multi-chain prefetching.Finally,we conduct an experimental evaluation of compiler-instrumented multi-chain prefetching and compare it against jump pointer prefetching[Luk and Mowry1996], prefetch arrays[Karlsson et al.2000],and predictor-directed stream buffers(PSB)[Sherwood et al. 
2000].Our results show compiler-instrumented multi-chain prefetching improves execution time by 40%across six pointer-chasing kernels from the Olden benchmark suite[Rogers et al.1995],and by3%across four pared to jump pointer prefetching and prefetch arrays,multi-chain prefetching achieves34%and11%higher performance for the selected Olden and SPECint2000benchmarks,pared to PSB,multi-chain prefetching achieves 27%higher performance for the selected Olden benchmarks,but PSB outperforms multi-chain prefetching by0.2%for the selected SPECint2000benchmarks.An ideal PSB with an infinite markov predictor achieves comparable performance to multi-chain prefetching,coming within6% across all benchmarks.Finally,speculation can enable multi-chain prefetching for some dynamic traversal codes,but our technique loses its effectiveness when the pointer-chain traversal order is highly dynamic.Categories and Subject Descriptors:B.8.2[Performance and Reliability]:Performance Anal-ysis and Design Aids;B.3.2[Memory Structures]:Design Styles—Cache Memories;C.0[Gen-eral]:Modeling of computer architecture;System Architectures; C.4[Performance of Sys-tems]:Design Studies;D.3.4[Programming Languages]:Processors—CompilersGeneral Terms:Design,Experimentation,PerformanceAdditional Key Words and Phrases:Data Prefetching,Memory parallelism,Pointer Chasing CodeA General Framework for Prefetch Scheduling·3performance platforms.The use of LDSs will likely have a negative impact on memory performance, making many non-numeric applications severely memory-bound on future systems. LDSs can be very large owing to their dynamic heap construction.Consequently, the working sets of codes that use LDSs can easily grow too large tofit in the processor’s cache.In addition,logically adjacent nodes in an LDS may not reside physically close in memory.As a result,traversal of an LDS may lack spatial locality,and thus may not benefit from large cache blocks.The sparse memory access nature of LDS traversal also reduces the effective size of the cache,further increasing cache misses.In the past,researchers have used prefetching to address the performance bot-tlenecks of memory-bound applications.Several techniques have been proposed, including software prefetching techniques[Callahan et al.1991;Klaiber and Levy 1991;Mowry1998;Mowry and Gupta1991],hardware prefetching techniques[Chen and Baer1995;Fu et al.1992;Jouppi1990;Palacharla and Kessler1994],or hybrid techniques[Chen1995;cker Chiueh1994;Temam1996].While such conventional prefetching techniques are highly effective for applications that employ regular data structures(e.g.arrays),these techniques are far less successful for non-numeric ap-plications that make heavy use of LDSs due to memory serialization effects known as the pointer chasing problem.The memory operations performed for array traver-sal can issue in parallel because individual array elements can be referenced inde-pendently.In contrast,the memory operations performed for LDS traversal must dereference a series of pointers,a purely sequential operation.The lack of memory parallelism during LDS traversal prevents conventional prefetching techniques from overlapping cache misses suffered along a pointer chain.Recently,researchers have begun investigating prefetching techniques designed for LDS traversals.These new LDS prefetching techniques address the pointer-chasing problem using several different approaches.Stateless techniques[Luk and Mowry1996;Mehrotra and Harrison1996;Roth et al.1998;Yang and Lebeck2000] prefetch pointer 
chains sequentially using only the natural pointers belonging to the LDS.Existing stateless techniques do not exploit any memory parallelism at all,or they exploit only limited amounts of memory parallelism.Consequently,they lose their effectiveness when the LDS traversal code contains insufficient work to hide the serialized memory latency[Luk and Mowry1996].A second approach[Karlsson et al.2000;Luk and Mowry1996;Roth and Sohi1999],which we call jump pointer techniques,inserts additional pointers into the LDS to connect non-consecutive link elements.These“jump pointers”allow prefetch instructions to name link elements further down the pointer chain without sequentially traversing the intermediate links,thus creating memory parallelism along a single chain of pointers.Because they create memory parallelism using jump pointers,jump pointer techniques tolerate pointer-chasing cache misses even when the traversal loops contain insufficient work to hide the serialized memory latency.However,jump pointer techniques cannot commence prefetching until the jump pointers have been installed.Furthermore,the jump pointer installation code increases execution time,and the jump pointers themselves contribute additional cache misses.ACM Transactions on Computer Systems4·Seungryul Choi et al.Finally,a third approach consists of prediction-based techniques[Joseph and Grunwald1997;Sherwood et al.2000;Stoutchinin et al.2001].These techniques perform prefetching by predicting the cache-miss address stream,for example us-ing hardware predictors[Joseph and Grunwald1997;Sherwood et al.2000].Early hardware predictors were capable of following striding streams only,but more re-cently,correlation[Charney and Reeves1995]and markov[Joseph and Grunwald 1997]predictors have been proposed that can follow arbitrary streams,thus en-abling prefetching for LDS traversals.Because predictors need not traverse program data structures to generate the prefetch addresses,they avoid the pointer-chasing problem altogether.In addition,for hardware prediction,the techniques are com-pletely transparent since they require no support from the programmer or compiler. 
However,prediction-based techniques lose their effectiveness when the cache-miss address stream is unpredictable.This article investigates exploiting the natural memory parallelism that exists between independent serialized pointer-chasing traversals,or inter-chain memory parallelism.Our approach,called multi-chain prefetching,issues prefetches along a single chain of pointers sequentially,but aggressively pursues multiple independent pointer chains simultaneously whenever possible.Due to its aggressive exploitation of inter-chain memory parallelism,multi-chain prefetching can tolerate serialized memory latency even when LDS traversal loops have very little work;hence,it can achieve higher performance than previous stateless techniques.Furthermore,multi-chain prefetching does not use jump pointers.As a result,it does not suffer the overheads associated with creating and managing jump pointer state.Andfinally, multi-chain prefetching is an execution-based technique,so it is effective even for programs that exhibit unpredictable cache-miss address streams.The idea of overlapping chained prefetches,which is fundamental to multi-chain prefetching,is not new:both Cooperative Chain Jumping[Roth and Sohi1999]and Dependence-Based Prefetching[Roth et al.1998]already demonstrate that simple “backbone and rib”structures can be prefetched in a multi-chain fashion.However, our work pushes this basic idea to its logical limit,enabling multi-chain prefetching for arbitrary data structures(our approach can exploit inter-chain memory paral-lelism for any data structure composed of lists,trees,and arrays).Furthermore, previous chained prefetching techniques issue prefetches in a greedy fashion.In con-trast,our work provides a formal and systematic method for scheduling prefetches that controls the timing of chained prefetches.By controlling prefetch arrival, multi-chain prefetching can reduce both early and late prefetches which degrade performance compared to previous chained prefetching techniques.In this article,we build upon our original work in multi-chain prefetching[Kohout et al.2001],and make the following contributions:(1)We present an LDS descriptor framework for specifying static LDS traversalsin a compact fashion.Our LDS descriptors contain data layout information and traversal code work information necessary for prefetching.(2)We develop an off-line algorithm for computing an exact prefetch schedulefrom the LDS descriptors that overlaps serialized cache misses across separate pointer-chain traversals.Our algorithm handles static LDS traversals involving either loops or recursion.Furthermore,our algorithm computes a schedule even ACM Transactions on Computer SystemsA General Framework for Prefetch Scheduling·5when the extent of dynamic data structures is unknown.To handle dynamic LDS traversals,we propose using speculation.However,our technique cannot handle codes in which the pointer-chain traversals are highly dynamic.(3)We present the design of a programmable prefetch engine that performs LDStraversal outside of the main CPU,and prefetches the LDS data using our LDS descriptors and the prefetch schedule computed by our scheduling algorithm.We also perform a detailed analysis of the hardware cost of our prefetch engine.(4)We introduce algorithms for extracting LDS descriptors from application sourcecode via static analysis,and implement them in a prototype compiler using the SUIF framework[Hall et al.1996].Our prototype compiler is capable of ex-tracting all the program-level information 
necessary for multi-chain prefetching fully automatically.(5)Finally,we conduct an experimental evaluation of multi-chain prefetching us-ing several pointer-intensive applications.Our evaluation compares compiler-instrumented multi-chain prefetching against jump pointer prefetching[Luk and Mowry1996;Roth and Sohi1999]and prefetch arrays[Karlsson et al.2000], two jump pointer techniques,as well as predictor-directed stream buffers[Sher-wood et al.2000],an all-hardware prediction-based technique.We also inves-tigate the impact of early prefetch arrival on prefetching performance,and we compare compiler-and manually-instrumented multi-chain prefetching to eval-uate the quality of the instrumentation generated by our compiler.In addition, we characterize the sensitivity of our technique to varying hardware stly,we undertake a preliminary evaluation of speculative multi-chain prefetching to demonstrate its potential in enabling multi-chain prefetching for dynamic LDS traversals.The rest of this article is organized as follows.Section2further explains the essence of multi-chain prefetching.Then,Section3introduces our LDS descriptor framework.Next,Section4describes our scheduling algorithm,Section5discusses our prefetch engine,and Section6presents our compiler for automating multi-chain prefetching.After presenting all our algorithms and techniques,Sections7and8 then report on our experimental methodology and evaluation,respectively.Finally, Section9discusses related work,and Section10concludes the article.2.MULTI-CHAIN PREFETCHINGThis section provides an overview of our multi-chain prefetching technique.Sec-tion2.1presents the idea of exploiting inter-chain memory parallelism.Then, Section2.2discusses the identification of independent pointer chain traversals. 
2.1Exploiting Inter-Chain Memory ParallelismThe multi-chain prefetching technique augments a commodity microprocessor with a programmable hardware prefetch engine.During an LDS computation,the prefetch engine performs its own traversal of the LDS in front of the processor,thus prefetching the LDS data.The prefetch engine,however,is capable of traversing multiple pointer chains simultaneously when permitted by the application.Conse-quently,the prefetch engine can tolerate serialized memory latency by overlapping cache misses across independent pointer-chain traversals.ACM Transactions on Computer Systems6·Seungryul Choi et al.<compute>ptr = A[i];ptr = ptr->next;while (ptr) {for (i=0; i < N; i++) {a)b)}<compute>ptr = ptr->next;while (ptr) {}}PD = 2INIT(ID ll);stall stall stallINIT(ID aol);stall stallFig.1.Traversing pointer chains using a prefetch engine.a).Traversal of a single linked list.b).Traversal of an array of lists data structure.To illustrate the idea of exploiting inter-chain memory parallelism,wefirst de-scribe how our prefetch engine traverses a single chain of pointers.Figure1a shows a loop that traverses a linked list of length three.Each loop iteration,denoted by a hashed box,contains w1cycles of work.Before entering the loop,the processor ex-ecutes a prefetch directive,INIT(ID ll),instructing the prefetch engine to initiate traversal of the linked list identified by the ID ll label.If all three link nodes suffer an l-cycle cache miss,the linked list traversal requires3l cycles since the link nodes must be fetched sequentially.Assuming l>w1,the loop alone contains insufficient work to hide the serialized memory latency.As a result,the processor stalls for 3l−2w1cycles.To hide these stalls,the prefetch engine would have to initiate its linked list traversal3l−2w1cycles before the processor traversal.For this reason, we call this delay the pre-traversal time(P T).While a single pointer chain traversal does not provide much opportunity for latency tolerance,pointer chasing computations typically traverse many pointer chains,each of which is often independent.To illustrate how our prefetch engine exploits such independent pointer-chasing traversals,Figure1b shows a doubly nested loop that traverses an array of lists data structure.The outer loop,denoted by a shaded box with w2cycles of work,traverses an array that extracts a head pointer for the inner loop.The inner loop is identical to the loop in Figure1a.In Figure1b,the processor again executes a prefetch directive,INIT(ID aol), causing the prefetch engine to initiate a traversal of the array of lists data structure identified by the ID aol label.As in Figure1a,thefirst linked list is traversed sequentially,and the processor stalls since there is insufficient work to hide the serialized cache misses.However,the prefetch engine then initiates the traversal of subsequent linked lists in a pipelined fashion.If the prefetch engine starts a new traversal every w2cycles,then each linked list traversal will initiate the required P T cycles in advance,thus hiding the excess serialized memory latency across multiple outer loop iterations.The number of outer loop iterations required to overlap each linked list traversal is called the prefetch distance(P D).Notice when P D>1, ACM Transactions on Computer SystemsA General Framework for Prefetch Scheduling·7 the traversals of separate chains overlap,exposing inter-chain memory parallelism despite the fact that each chain is fetched serially.2.2Finding Independent Pointer-Chain TraversalsIn 
order to exploit inter-chain memory parallelism,it is necessary to identify mul-tiple independent pointer chains so that our prefetch engine can traverse them in parallel and overlap their cache misses,as illustrated in Figure1.An important question is whether such independent pointer-chain traversals can be easily identi-fied.Many applications perform traversals of linked data structures in which the or-der of link node traversal does not depend on runtime data.We call these static traversals.The traversal order of link nodes in a static traversal can be determined a priori via analysis of the code,thus identifying the independent pointer-chain traversals at compile time.In this paper,we present an LDS descriptor frame-work that compactly expresses the LDS traversal order for static traversals.The descriptors in our framework also contain the data layout information used by our prefetch engine to generate the sequence of load and prefetch addresses necessary to perform the LDS traversal at runtime.While compile-time analysis of the code can identify independent pointer chains for static traversals,the same approach does not work for dynamic traversals.In dynamic traversals,the order of pointer-chain traversal is determined at runtime. Consequently,the simultaneous prefetching of independent pointer chains is limited since the chains to prefetch are not known until the traversal order is computed, which may be too late to enable inter-chain overlap.For dynamic traversals,it may be possible to speculate the order of pointer-chain traversal if the order is pre-dictable.In this paper,we focus on static LDS ter in Section8.7,we illustrate the potential for predicting pointer-chain traversal order in dynamic LDS traversals by extending our basic multi-chain prefetching technique with specula-tion.3.LDS DESCRIPTOR FRAMEWORKHaving provided an overview of multi-chain prefetching,we now explore the al-gorithms and hardware underlying its implementation.We begin by introducing a general framework for compactly representing static LDS traversals,which we call the LDS descriptor framework.This framework allows compilers(and pro-grammers)to compactly specify two types of information related to LDS traversal: data structure layout,and traversal code work.The former captures memory refer-ence dependences that occur in an LDS traversal,thus identifying pointer-chasing chains,while the latter quantifies the amount of computation performed as an LDS is traversed.After presenting the LDS descriptor framework,subsequent sections of this article will show how the information provided by the framework is used to perform multi-chain prefetching(Sections4and5),and how the LDS descriptors themselves can be extracted by a compiler(Section6).3.1Data Structure Layout InformationData structure layout is specified using two descriptors,one for arrays and one for linked lists.Figure2presents each descriptor along with a traversal code exampleACM Transactions on Computer Systems8·Seungryul Choi etal.a).b).Bfor (i = 0 ; i < N ; i++) {... 
= data[i].value;}for (ptr = root ; ptr != NULL; ) { ptr = ptr->next;}Fig.2.Two LDS descriptors used to specify data layout information.a).Array descriptor.b).Linked list descriptor.Each descriptor appears inside a box,and is accompanied by a traversal code example and an illustration of the data structure.and an illustration of the traversed data structure.The array descriptor,shown in Figure 2a,contains three parameters:base (B ),length (L ),and stride (S ).These parameters specify the base address of the array,the number of array elements traversed by the application code,and the stride between consecutive memory ref-erences,respectively.The array descriptor specifies the memory address stream emitted by the processor during a constant-stride array traversal.Figure 2b illus-trates the linked list descriptor which contains three parameters similar to the array descriptor.For the linked list descriptor,the B parameter specifies the root pointer of the list,the L parameter specifies the number of link elements traversed by the application code,and the ∗S parameter specifies the offset from each link element address where the “next”pointer is located.The linked list descriptor specifies the memory address stream emitted by the processor during a linked list traversal.To specify the layout of complex data structures,our framework permits descrip-tor composition.Descriptor composition is represented as a directed graph whose nodes are array or linked list descriptors,and whose edges denote address generation dependences.Two types of composition are allowed.The first type of composition is nested composition .In nested composition,each address generated by an outer descriptor forms the B parameter for multiple instantiations of a dependent inner descriptor.An offset parameter,O ,is specified in place of the inner descriptor’s B parameter to shift its base address by a constant offset.Such nested descriptors cap-ture the memory reference streams of nested loops that traverse multi-dimensional data structures.Figure 3presents several nested descriptors,showing a traversal code example and an illustration of the traversed multi-dimensional data structure along with each nested descriptor.Figure 3a shows the traversal of an array of structures,each structure itself containing an array.The code example’s outer loop traverses the array “node,”ac-cessing the field “value”from each traversed structure,and the inner loop traverses ACM Transactions on Computer SystemsA General Framework for Prefetch Scheduling·9a).b).c).for (i = 0 ; i < L 0 ; i++) {... = node[i].value;for (j = 0 ; j < L 1 ; j++) {... = node[i].data[j];}}for (i = 0 ; i < L 0 ; i++) {down = node[i].pointer;for (j = 0 ; j < L 1 ; j++) {... = down->data[j];}}node for (i = 0 ; i < L 0 ; i++) {for (j = 0 ; j < L 1 ; j++) {... = node[i].data[j];}down = node[i].pointer;for (j = 0 ; j < L 2 ; j++) {... 
= down->data[j];}}node Fig.3.Nested descriptor composition.a).Nesting without indirection.b).Nesting with indirection.c).Nesting multiple descriptors.Each descriptor composition appears inside a box,and is accompanied by a traversal code example and an illustration of the composite data structure.each embedded array “data.”The outer and inner array descriptors,(B,L 0,S 0)and (O 1,L 1,S 1),represent the address streams produced by the outer and inner loop traversals,respectively.(In the inner descriptor,“O 1”specifies the offset of each inner array from the top of each structure).Figure 3b illustrates another form of descriptor nesting in which indirection is used between nested descriptors.The data structure in Figure 3b is similar to the one in Figure 3a,except the in-ner arrays are allocated separately,and a field from each outer array structure,“node[i].pointer,”points to a structure containing the inner array.Hence,as shown in the code example from Figure 3b,traversal of the inner array requires indirect-ing through the outer array’s pointer to compute the inner array’s base address.In our framework,this indirection is denoted by placing a “*”in front of the inner descriptor.Figure 3c,our last nested descriptor example,illustrates the nestingACM Transactions on Computer Systems10·Seungryul Choi et al.main() { foo(root, depth_limit);}foo(node, depth) { depth = depth - 1; if (depth == 0 || node == NULL)return;foo(node->child[0], depth);foo(node->child[1], depth);foo(node->child[2], depth);}Fig.4.Recursive descriptor composition.The recursive descriptor appears inside a box,and is accompanied by a traversal code example and an illustration of the tree data structure.of multiple inner descriptors underneath a single outer descriptor to represent the address stream produced by nested distributed loops.The code example from Fig-ure 3c shows the two inner loops from Figures 3a-b nested in a distributed fashion inside a common outer loop.In our framework,each one of the multiple inner array descriptors represents the address stream for a single distributed loop,with the order of address generation proceeding from the leftmost to rightmost inner descriptor.It is important to note that while all the descriptors in Figure 3show two nesting levels only,our framework allows an arbitrary nesting depth.This permits describ-ing higher-dimensional LDS traversals,for example loop nests with >2nesting depth.Also,our framework can handle non-recurrent loads using “singleton”de-scriptors.For example,a pointer to a structure may be dereferenced multiple times to access different fields in the structure.Each dereference is a single non-recurrent load.We create a separate descriptor for each non-recurrent load,nest it under-neath its recurrent load’s descriptor,and assign an appropriate offset value,O ,and length value,L =1.In addition to nested composition,our framework also permits recursive compo-sition .Recursively composed descriptors describe depth-first tree traversals.They are similar to nested descriptors,except the dependence edge flows backwards.Since recursive composition introduces cycles into the descriptor graph,our frame-work requires each backwards dependence edge to be annotated with the depth of recursion,D ,to bound the size of the data structure.Figure 4shows a simple recursive descriptor in which the backwards dependence edge originates from and terminates to a single array descriptor.The “L”parameter in the descriptor spec-ifies the fanout of the tree.In our example,L =3,so the 
traversed data structure is a tertiary tree, as shown in Figure 4. Notice the array descriptor has both B and O parameters: B provides the base address for the first instance of the descriptor, while O provides the offset for all recursively nested instances. In Figures 2 and 4, we assume the L parameter for linked lists and the D parameter for trees are known a priori, which is generally not the case. Later in Section 4.3, we discuss how our framework handles these unknown descriptor parameters. In addi-
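To make the descriptor semantics of Section 3.1 concrete, here is a minimal Python sketch that expands an array descriptor (B, L, S), optionally nested with an offset O and with indirection, into the address stream it denotes. This is only an illustration of the descriptor semantics, not the paper's prefetch engine or compiler; the names, the flat memory dictionary, and the example constants are assumptions.

```python
def array_descriptor(base, length, stride, inner=None, memory=None, indirect=False):
    """Yield the address stream denoted by an array descriptor (B, L, S).

    With nested composition, each generated address becomes the base of the
    dependent inner descriptor (shifted by the inner descriptor's offset O);
    with indirection ('*'), the inner base is instead the pointer value stored
    at that address, looked up in the flat `memory` dictionary."""
    for i in range(length):
        addr = base + i * stride
        yield addr
        if inner is not None:
            inner_base = memory[addr] if indirect else addr
            yield from inner(inner_base)

# Example: a node[i].data[j] traversal. Outer array of 3 structures, 32 bytes each,
# starting at 0x1000; each structure embeds an inner array of 4 words at offset O = 8.
# All constants here are made up for illustration.
inner = lambda outer_addr: array_descriptor(outer_addr + 8, length=4, stride=4)
stream = list(array_descriptor(base=0x1000, length=3, stride=32, inner=inner))
```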

The MIT Lithium Battery Dataset and the EKF Algorithm


The MIT lithium battery dataset (Energy Storage Systems) is a public experimental dataset, provided by MIT, for studying the performance of lithium batteries.

The dataset contains a wide range of parameters and characteristics of lithium batteries measured in laboratory tests, and it helps researchers better understand battery behavior and performance.

The EKF, short for Extended Kalman Filter, is a commonly used filtering algorithm.

It linearizes a nonlinear system and then applies the Kalman filter to estimate and predict the system state; it is commonly used for lithium battery state estimation and prediction.

In this article we discuss the application of the EKF algorithm, driven by the MIT lithium battery dataset, to lithium battery state estimation and prediction.

1. Characteristics of the MIT lithium battery dataset. The dataset contains a large amount of experimental data obtained from lithium-ion cells, including charge/discharge curves, temperature profiles, voltage, and current.

These data help researchers analyze battery performance and understand how the cells behave under different operating conditions.

2. How the EKF algorithm works. The EKF is a recursive state estimation algorithm: by repeatedly predicting and updating the system state, it produces an estimate and forecast of that state.

It linearizes the system's dynamic model and then applies the Kalman filter equations to estimate the state.

3. Applying the MIT lithium battery dataset. An EKF built on the MIT lithium battery dataset can be used for battery state estimation and prediction.

Researchers can use the experimental data in the dataset to build a dynamic model of the battery and then apply the EKF to estimate and predict the battery state in real time, as sketched in the example below.

This helps characterize battery performance under different operating conditions and provides a basis for battery design and optimization.
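A minimal sketch of such an EKF-based state-of-charge estimator in Python follows. The one-state battery model, the parameter values, and the synthetic measurements are assumptions made for illustration; they are not taken from the MIT dataset, whose real current and voltage records would replace the synthetic arrays.

```python
import numpy as np

# Hypothetical battery parameters (not fitted to the MIT dataset): capacity in
# ampere-seconds, ohmic resistance, and a toy open-circuit-voltage curve OCV(soc).
Q_AS = 2.0 * 3600.0          # 2 Ah expressed in ampere-seconds
R0 = 0.05                    # ohmic resistance in ohms

def ocv(soc):                # toy OCV curve
    return 3.0 + 1.2 * soc

def docv_dsoc(soc):          # its derivative, used to linearize the measurement model
    return 1.2

def ekf_soc(current, voltage, dt, soc0=1.0, p0=1e-2, q=1e-7, r=1e-3):
    """Estimate state of charge from current/voltage sequences with a one-state EKF."""
    soc, p = soc0, p0
    estimates = []
    for i_k, v_k in zip(current, voltage):
        # Predict: coulomb counting (linear process model), discharge current positive.
        soc = soc - dt * i_k / Q_AS
        p = p + q
        # Update: linearize the measurement model v = OCV(soc) - R0 * i around the prediction.
        h = docv_dsoc(soc)                      # measurement Jacobian (scalar)
        k_gain = p * h / (h * p * h + r)        # Kalman gain
        soc = soc + k_gain * (v_k - (ocv(soc) - R0 * i_k))
        p = (1.0 - k_gain * h) * p
        estimates.append(soc)
    return np.array(estimates)

# Synthetic usage example: constant 1 A discharge for 1000 s, sampled at 1 Hz.
t = np.arange(0.0, 1000.0, 1.0)
true_soc = 1.0 - t / Q_AS
i_meas = np.ones_like(t)
v_meas = ocv(true_soc) - R0 * i_meas + np.random.normal(0.0, 0.005, t.shape)
soc_hat = ekf_soc(i_meas, v_meas, dt=1.0)
```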

4. Research status at home and abroad. Many research institutions and scholars, both in China and internationally, have studied lithium battery state estimation and prediction in depth.

They have analyzed battery performance using a variety of datasets and algorithms.

Among these efforts, applications of the MIT lithium battery dataset have received wide attention.

5. Trends. As lithium battery technology continues to advance and see wider use, the demand for accurate battery state estimation and prediction keeps growing.

In the future, EKF methods based on the MIT lithium battery dataset are expected to be widely used in battery performance analysis and optimization, providing important support for the development of the lithium battery industry.

Primary School English, Book 1, Test 9: Unit 2 Comprehensive Exam (with Answers)


小学上册英语第二单元综合卷(含答案)英语试题一、综合题(本题有100小题,每小题1分,共100分.每小题不选、错误,均不给分)1.They are playing ______ in the park. (football)2.The ________ likes to stay in groups.3.Animals that sleep during the winter are said to __________.4.The _____ of a substance affects its state (solid, liquid, gas).5.What is the capital of Greece?A. AthensB. RomeC. IstanbulD. Cairo答案:A6.They are ______ a game of chess. (playing)7.Hamsters like to _______ on wheels.8.What is the name of the famous war fought between the North and South in the United States?A. World War IB. World War IIC. Civil WarD. Revolutionary War 答案: C9.What is the main ingredient in pizza?A. RiceB. BreadC. DoughD. Pasta答案: C10.Bees help in the _____ (授粉) of many plants.11.The chemical formula for undecylenic acid is ______.12.My _______ (仓鼠) runs on a wheel.13.My brother, ______ (我哥哥), loves to play baseball.14.This ________ (玩具) is fun to play with.15.The _____ (beach) is sandy.16.Sublimation is when a solid changes directly into a ______.17.What do we call the largest land carnivore?A. LionB. Polar bearC. TigerD. Grizzly bear答案:B.Polar bear18.The _____ (开花植物) can brighten any space.19.Did you see the _____ (小兔) hopping across the field?20.Which vegetable is orange and long?A. PotatoB. CarrotC. CucumberD. Lettuce答案:B21.I have a _____ pencil case. (blue)22.My favorite stuffed animal is a ________ (小鹿). It is brown and has big ________ (眼睛).23.She has a _______ (美丽的) smile.24.The __________ (古代文明) left behind many archaeological sites.25.My uncle is a big __________ of sports. (爱好者)26.The _____ (气候变化) affects plant distributions globally.27.My sister loves to ________.28.I want to learn how to ______ (skate) on ice.29. A _______ is a mixture made of two or more liquids that do not mix.30.I believe in the importance of education. Learning new things opens doors to opportunities. I’m grateful for my teachers who inspire me to do my best.31. (Pilgrims) settled in America in 1620. The ____32.My dream is to travel to _______ (日本).33.The chemical formula for potassium bromide is ______.34.What is the name of the event where people come together to enjoy music?A. ConcertB. FestivalC. ShowD. Gathering答案: A35.I write with a _____ (pen/pencil).36.The _____ (狐狸) is known for its cunning nature.37.What is the name of the famous rock formation in Australia?A. UluruB. StonehengeC. Grand CanyonD. Great Wall答案:A.Uluru38.The bird sings a beautiful _______ (鸟儿唱着美丽的_______).39.The __________ is a large area of flat, treeless land in the Arctic.40.The __________ (日本) was isolated until the 19th century.41.What do we call the act of discussing a topic?A. ConversationB. DialogueC. DebateD. Discussion答案:B42.What is the past tense of 'go'?A. GoesB. GoneC. WentD. Going答案:C.Went43.What does a compass help us find?A. DistanceB. SpeedC. DirectionD. Location答案: C44.The _____ (diary) helps track plant growth.45.Certain plants can produce flowers that are not only beautiful but also ______. (某些植物可以产生既美丽又有用的花朵。

Infoprint 250 Introduction and Planning Guide, Chapter 7: Host


TCP/IP interface parameters:

Subnet mask . . . . . . . . . . . . . :  SUBNETMASK   255.255.255.128
Type of service . . . . . . . . . . . :  TOS          *NORMAL
Maximum transmission unit . . . . . . :  MTU          *LIND
Autostart . . . . . . . . . . . . . . :  AUTOSTART    *YES
(two further, unlabeled masked addresses: xx.xxx.xxx.xxx and xx.xxx.xxx.xxx)

[Figure 31 (AS/400) and related fragments (IEEE 802.3, 60/1500, IP, MTU, IPDS TCP): not reproduced]

The PSF configuration is created with CRTPSFCFG (V3R2) or WRKAFP2 (V3R1 & V3R6); its parameters include RMTLOCNAME/RMTSYS and MODEL (value 0), together with the settings below.

Advanced function printing  . . . . . :  AFP          *YES
AFP attachment  . . . . . . . . . . . :  AFPATTACH    *APPC
Online at IPL . . . . . . . . . . . . :  ONLINE
Form feed . . . . . . . . . . . . . . :  FORMFEED     *CONT
Separator drawer  . . . . . . . . . . :  SEPDRAWER    *FILE
Separator program . . . . . . . . . . :  SEPPGM       *NONE
Library . . . . . . . . . . . . . . . :

DS-2XS6A46G1/P-IZS/C36S80 4 MP ANPR (Automatic Number Plate Recognition) Bullet Solar Power 4G Network Camera Kit


DS-2XS6A46G1/P-IZS/C36S804 MP ANPR Bullet Solar Power 4G Network Camera KitIt can be used in the areas that are not suitable for laying wired network and electric supply lines, or used for the scenes that feature tough environment and have high demanding for device stability. It can be used for monitoring the farms, electric power cables, water and river system, oil pipelines and key forest areas.It also can be used in the temporary monitoring scenes, such as the large-scale competitions, the sudden public activity, the temporary traffic control and the city construction.Empowered by deep learning algorithms, Hikvision AcuSense technology brings human and vehicle targets classification alarms to front- and back-end devices. The system focuses on human and vehicle targets, vastly improving alarm efficiency and effectiveness.⏹ 80 W photovoltaic panel, 360 Wh chargeable lithium battery⏹ Clear imaging against strong back light due to 120 dB trueWDR technology⏹ Focus on human and vehicle targets classification based ondeep learning⏹Support battery management, battery display, batteryhigh-low temperature protection, charge-dischargeprotection, low-battery sleep protection and remotewakeup ⏹ LTE-TDD/LTE-FDD/WCDMA/GSM 4G wireless networktransmission, support Micro SIM card⏹Water and dust resistant (IP66) *The Wi-Fi module of this product only supports AP mode on Channel 11, and does not support other modes and channels.FunctionRoad Traffic and Vehicle DetectionWith embedded deep learning based license plate capture and recognition algorithms, the camera alone can achieve plate capture and recognition. The algorithm enjoys the high recognition accuracy of common plates and complex-structured plates, which is a great step forward comparing to traditional algorithms. Blocklist and allowlist are available for plate categorization and separate alarm triggering.SpecificationCameraImage Sensor 1/1.8" Progressive Scan CMOSMax. Resolution 2560 × 1440Min. 
Illumination Color: 0.0005 Lux @ (F1.2, AGC ON), B/W: 0 Lux with light Shutter Time 1 s to 1/100,000 sLensLens Type Auto, Semi-auto, ManualFocal Length & FOV 2.8 to 12 mm, horizontal FOV 107.4° to 39.8°, vertical FOV 56° to 22.4°, diagonal FOV 130.1° to 45.7°8 to 32 mm, horizontal FOV 40.3° to 14.5°, vertical FOV 22.1° to 8.2°, diagonal FOV 46.9° to 16.5°Iris Type Auto-irisLens Mount All In One LensAperture 2.8 to 12 mm: F1.2, 8 to 32 mm: F1.6 DORIDORI 2.8 to 12 mm:Wide: D: 60.0 m, O: 23.8 m, R: 12.0 m, I: 6.0 m Tele: D: 149.0 m, O: 59.1 m, R: 29.8 m, I: 14.9 m 8 to 32 mm:Wide: D: 150.3 m, O: 59.7 m, R: 30.1 m, I: 15.0 m Tele: D: 400 m, O: 158.7 m, R: 80 m, I: 40 mIlluminatorSupplement Light Type IRSupplement Light Range 2.8 to 12 mm: Up to 30 m 8 to 32 mm: Up to 50 mSmart Supplement Light Yes VideoMain Stream Performance mode:50 Hz: 25 fps (2560 × 1440, 1920 × 1080, 1280 × 720) 60 Hz: 30 fps (2560 × 1440, 1920 × 1080, 1280 × 720) Proactive mode:50 Hz: 12.5 fps (2560 × 1440, 1920 × 1080, 1280 × 720) 60 Hz: 15 fps (2560 × 1440, 1920 × 1080, 1280 × 720)Sub-Stream Performance mode:50 Hz: 25 fps (640 × 480, 640 × 360) 60 Hz: 30 fps (640 × 480, 640 × 360) Proactive mode:50 Hz: 12.5 fps (640 × 480, 640 × 360) 60 Hz: 15 fps (640 × 480, 640 × 360)Third Stream 50 Hz: 1 fps (1280 × 720, 640 × 480) 60 Hz: 1 fps (1280 × 720, 640 × 480)Video Compression Main stream: H.264/H.265Sub-stream: H.264/H.265/MJPEGThird Stream: H.265/H.264*Performance mode: main stream supports H.264+, H.265+Video Bit Rate 32 Kbps to 8 MbpsH.264 Type Baseline Profile, Main Profile, High ProfileH.265 Type Main ProfileBit Rate Control CBR/VBRScalable Video Coding (SVC) H.264 and H.265 encodingRegion of Interest (ROI) 4 fixed regions for main streamAudioAudio Compression G.711/G.722.1/G.726/MP2L2/PCM/MP3/AAC-LCAudio Bit Rate 64 Kbps (G.711ulaw/G.711alaw)/16 Kbps (G.722.1)/16 Kbps (G.726)/32 to 192 Kbps (MP2L2)/8 to 320 Kbps (MP3)/16 to 64 Kbps (AAC-LC)Audio Sampling Rate 8 kHz/16 kHz/32 kHz/44.1 kHz/48 kHzEnvironment Noise Filtering YesNetworkSimultaneous Live View Up to 6 channelsAPI Open Network Video Interface (Profile S, Profile G, Profile T), ISAPI, SDK, ISUP, OTAPProtocols TCP/IP, ICMP, HTTP, HTTPS, FTP, DHCP, DNS, DDNS, RTP, RTSP, RTCP, NTP, UPnP, SMTP, SNMP, IGMP, 802.1X, QoS, IPv6, UDP, Bonjour, SSL/TLS, WebSocket, WebSocketsUser/Host Up to 32 users3 user levels: administrator, operator, and userSecurity Password protection, complicated password, HTTPS encryption, 802.1X authentication (EAP-TLS, EAP-LEAP, EAP-MD5), watermark, IP address filter, basic and digest authentication for HTTP/HTTPS, WSSE and digest authentication for Open Network Video Interface, RTP/RTSP over HTTPS, control timeout settings, TLS 1.2, TLS 1.3Network Storage NAS (NFS, SMB/CIFS), auto network replenishment (ANR)Together with high-end Hikvision memory card, memory card encryption and health detection are supported.Client Hik-Connect (proactive mode also supports), Hik-central ProfessionalWeb Browser Plug-in required live view: IE 10+Plug-in free live view: Chrome 57.0+, Firefox 52.0+ Local service: Chrome 57.0+, Firefox 52.0+Mobile CommunicationSIM Card Type MicroSIMFrequency LTE-TDD: Band38/40/41LTE-FDD: Band1/3/5/7/8/20/28 WCDMA: Band1/5/8GSM: 850/900/1800 MHzStandard LTE-TDD/LTE-FDD/WCDMA/GSM ImageWide Dynamic Range (WDR) 120 dBDay/Night Switch Day, Night, Auto, Schedule, Video Trigger Image Enhancement BLC, HLC, 3D DNR, DefogImage Parameters Switch YesImage Settings Saturation, brightness, contrast, sharpness, gain, white balance, adjustable by 
client software or web browserSNR ≥ 52 dBPrivacy Mask 4 programmable polygon privacy masks InterfaceAudio 1 input (line in), max. input amplitude: 3.3 Vpp, input impedance: 4.7 KΩ, interface type: non-equilibrium,1 output (line out), max. output amplitude: 3.3 Vpp, output impedance: 100 Ω, interface type: non-equilibriumAlarm 1 input, 1 output (max. 12 VDC, 1 A)On-Board Storage Built-in memory card slot, support microSD card, up to 256 GB, Built-in 8 GB eMMC storageReset Key YesEthernet Interface 1 RJ45 10 M/100 M self-adaptive Ethernet portWiegand 1 Wiegand (CardID 26bit, SHA-1 26bit, Hik 34bit, NEWG 72 bit) EventBasic Event Motion detection, video tampering alarm, exception (network disconnected, IP address conflict, illegal login, HDD error)Smart Event Line crossing detection, intrusion detection, region entrance detection, region exiting detection, unattended baggage detection, object removal detection, scene change detection, face detectionLinkage Upload to FTP/NAS/memory card, notify surveillance center, send email, trigger recording, trigger capture, trigger alarm output, audible warningDeep Learning FunctionRoad Traffic and Vehicle Detection Blocklist and allowlist: up to 10,000 records Support license plate recognition License plate recognition rate ≥95%GeneralPower 12 VDC ± 20%, 4-pin M8 waterproof connector1. Standby power consumption: 45 mW2. The average power consumption of 24 hours:3.5 W (4G transmission is excluded).3. The max. power consumption: 7 WMaterial Front cover: metal, body: metal, bracket: metalDimension 816.2 mm × 735.9 mm × 760 mm (32.1" × 28.9" × 29.9") (Max. size of the camera after it is completely assembled)Package Dimension 862 mm × 352 mm × 762 mm (33.9" × 13.9" × 30.0")Weight Approx. 31.885 kg (70.3 lb.)With Package Weight Approx. 25.650 kg (56.5 lb.)Storage Conditions -20 °C to 60 °C (-4 °F to 140 °F). Humidity 95% or less (non-condensing) Startup and OperatingConditions-20 °C to 60 °C (-4 °F to 140 °F). Humidity 95% or less (non-condensing)Language 33 languages: English, Russian, Estonian, Bulgarian, Hungarian, Greek, German, Italian, Czech, Slovak, French, Polish, Dutch, Portuguese, Spanish, Romanian, Danish, Swedish, Norwegian, Finnish, Croatian, Slovenian, Serbian, Turkish, Korean, Traditional Chinese, Thai, Vietnamese, Japanese, Latvian, Lithuanian, Portuguese (Brazil), UkrainianGeneral Function Anti-banding, heartbeat, mirror, flash log, password reset via email, pixel counter BatteryBattery Type LithiumCapacity 360 Wh (90 Wh for each battery)Max. Output Voltage 12.6 V Battery Voltage 10.8 VOperating Temperature Charging: -20 °C to 45 °C (-4 °F to 113 °F) Discharging: -20 °C to 60 °C (-4 °F to 140 °F)Cycle Lifetime Performance mode: 5 days, Proactive mode: 8 days, Standby mode: 80 days *in cloudy/rainy days (25 °C)Battery Life More than 500 cyclesBattery Weight Approx. 2.74 kg (6.0 lb.) (0.685 kg (1.5 lb.) for each battery) ApprovalEMC CE-EMC/UKCA (EN 55032:2015+A11:2020+A1:2020, EN 50130-4:2011+A1:2014); RCM (AS/NZS CISPR 32: 2015);IC (ICES-003: Issue 7)RF CE-RED/UKCA (EN 301908-1, EN 301908-2, EN 301908-13, EN 301511, EN 301489-1, EN 301489-52, EN 62133);ICASA: same as CE-RED;IC ID (RSS-132 Issue 3, RSS-133 Issue 6, RSS-139 Issue 3, RSS-130 Issue 2, RSS-102 Issue 5);MIC (Article 49-6-4 and 49-6-5 the relevant articles and MIC Notice No. 
1299 of the Ordinance Regulating Radio Equipment)Safety CB (IEC 62368-1:2014+A11)CE-LVD/UKCA (EN 62368-1:2014/A11:2017) LOA (IEC/EN 60950-1)Environment CE-RoHS (2011/65/EU);WEEE (2012/19/EU);Reach (Regulation (EC) No 1907/2006)Protection Camera: IP66 (IEC 60529-2013)Wind resistance 12 level, up to 40 m/s wind speed resistance⏹Typical ApplicationHikvision products are classified into three levels according to their anti-corrosion performance. Refer to the following description to choose for your using environment.This model has NO SPECIFIC PROTECTION.Level DescriptionTop-level protection Hikvision products at this level are equipped for use in areas where professional anti-corrosion protection is a must. Typical application scenarios include coastlines, docks,chemical plants, and more.Moderate protection Hikvision products at this level are equipped for use in areas with moderate anti-corrosion demands. Typical application scenarios include coastal areas about 2kilometers (1.24 miles) away from coastlines, as well as areas affected by acid rain. No specific protection Hikvision products at this level are equipped for use in areas where no specific anti-corrosion protection is needed.⏹Available ModelDS-2XS6A46G1/P-IZS/C36S80 (2.8-12mm)DS-2XS6A46G1/P-IZS/C36S80 (8-32mm)Dimension。

Documentation for the uniqtag Package


Package‘uniqtag’October12,2022Type PackageTitle Abbreviate Strings to Short,Unique IdentifiersVersion1.0.1Description For each string in a set of strings,determine a unique tag that is a sub-string offixed size k unique to that string,if it has one.If no such unique substring ex-ists,the least frequent substring is used.If multiple unique substrings exist,the lexicographi-cally smallest substring is used.This lexicographically smallest sub-string of size k is called the``UniqTag''of that string.License MIT+file LICENSEEncoding UTF-8RoxygenNote7.1.2URL https:///sjackman/uniqtagBugReports https:///sjackman/uniqtag/issuesSuggests testthatNeedsCompilation noAuthor Shaun Jackman[aut,cph,cre]Maintainer Shaun Jackman<******************>Repository CRANDate/Publication2022-06-1006:10:02UTCR topics documented:uniqtag-package (2)cumcount (2)kmers_of (3)make_unique (3)uniqtag (4)Index612cumcount uniqtag-package Abbreviate strings to short,unique identifiers.DescriptionFor each string in a set of strings,determine a unique tag that is a substring offixed size k unique to that string,if it has one.If no such unique substring exists,the least frequent substring is used.If multiple unique substrings exist,the lexicographically smallest substring is used.This lexicograph-ically smallest substring of size k is called the"UniqTag"of that string.Author(s)Shaun Jackman<******************>cumcount Cumulative count of strings.DescriptionReturn an integer vector counting the number of occurrences of each string up to that position in the vector.Usagecumcount(xs)Argumentsxs a character vectorValuean integer vector of the cumulative string countsExamplescumcount(abbreviate(,3,strict=TRUE))kmers_of3 kmers_of Return the k-mers of a string.DescriptionReturn the k-mers(substrings of size k)of the string x,or return the string x itself if it is shorter than k.Usagekmers_of(x,k)vkmers_of(xs,k)Argumentsx a character stringk the size of the substrings,an integerxs a character vectorValuekmers_of:a character vector of the k-mers of xvkmers_of:a list of character vectors of the k-mers of xsFunctions•kmers_of:Return the k-mers of the string x.•vkmers_of:Return the k-mers of the strings xs.make_unique Make character strings unique.DescriptionAppend sequence numbers to duplicate elements to make all elements of a character vector unique. 
Usagemake_unique(xs,sep="-")make_unique_duplicates(xs,sep="-")make_unique_all(xs,sep="-")make_unique_all_or_none(xs,sep="-")Argumentsxs a character vectorsep a character string used to separate a duplicate string from its sequence numberFunctions•make_unique:Append a sequence number to duplicated elements,including thefirst occur-rence.•make_unique_duplicates:Append a sequence number to duplicated elements,except the first occurrence.This function behaves similarly to make.unique•make_unique_all:Append a sequence number to every element.•make_unique_all_or_none:Append a sequence number to every element or no elements.Return xs unchanged if the elements of the character vector xs are already unique.Otherwise append a sequence number to every element.See Alsomake.uniqueExamplesabcb<-c("a","b","c","b")make_unique(abcb)make_unique_duplicates(abcb)make_unique_all(abcb)make_unique_all_or_none(abcb)make_unique_all_or_none(c("a","b","c"))x<-make_unique(abbreviate(,3,strict=TRUE))x[grep("-",x)]uniqtag Abbreviate strings to short,unique identifiers.DescriptionAbbreviate strings to unique substrings of k characters.Usageuniqtag(xs,k=9,uniq=make_unique_all_or_none,sep="-")Argumentsxs a character vectork the size of the identifier,an integeruniq a function to make the abbreviations unique,such as make_unique,make_unique_duplicates, make_unique_all_or_none,make_unique_all,make.unique,or to disable thisfunction,identity or NULLsep a character string used to separate a duplicate string from its sequence numberDetailsFor each string in a set of strings,determine a unique tag that is a substring offixed size k uniqueto that string,if it has one.If no such unique substring exists,the least frequent substring is used.Ifmultiple unique substrings exist,the lexicographically smallest substring is used.This lexicograph-ically smallest substring of size k is called the UniqTag of that string.The lexicographically smallest substring depend on the locale’s sort order.You may wish tofirstcall Sys.setlocale("LC_COLLATE","C")Valuea character vector of the UniqTags of the strings xSee Alsoabbreviate,locales,make.uniqueExamplesSys.setlocale("LC_COLLATE","C")states<-sub("","",)uniqtags<-uniqtag(states)uniqtags4<-uniqtag(states,k=4)uniqtags3<-uniqtag(states,k=3)uniqtags3x<-uniqtag(states,k=3,uniq=make_unique)table(nchar(states))table(nchar(uniqtags))table(nchar(uniqtags4))table(nchar(uniqtags3))table(nchar(uniqtags3x))uniqtags3[grep("-",uniqtags3x)]Indexcumcount,2kmers_of,3make_unique,3make_unique_all(make_unique),3make_unique_all_or_none(make_unique),3 make_unique_duplicates(make_unique),3 uniqtag,4uniqtag-package,2vkmers_of(kmers_of),36。
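The UniqTag rule described above is easy to prototype outside R. Here is a minimal Python sketch; it is not the package's implementation, and it assumes that "least frequent" counts each substring at most once per input string.

```python
from collections import Counter

def kmers_of(s, k):
    """Return the k-mers (substrings of size k) of s, or s itself if it is shorter than k."""
    return [s[i:i + k] for i in range(len(s) - k + 1)] or [s]

def uniqtag(strings, k=9):
    """Abbreviate each string to the lexicographically smallest of its least frequent k-mers."""
    counts = Counter(m for s in strings for m in set(kmers_of(s, k)))
    tags = []
    for s in strings:
        kmers = set(kmers_of(s, k))
        fewest = min(counts[m] for m in kmers)          # 1 means the k-mer is unique
        tags.append(min(m for m in kmers if counts[m] == fewest))
    return tags

print(uniqtag(["Newfoundland", "New Brunswick", "Nova Scotia"], k=4))
```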

PF6 FlexSystem Product Description


Easy system integration
A modular PF6 FlexSystem reduces your engineering effort, making integration with line equipment easier and more effective.
Improve your safety while eliminating multiple cables, and increase reliability and MTBF without worrying about dirt or wet environments.
Result data collection
Operator guidance
PROFIsafe
IP 54
What customers are saying
“I saved two whole installation days using the PF6 FlexSystem. The product is extremely fast and smooth. I also saved seven seconds in cycle times.”
In conjunction with the most durable
Application Centers can combine standard Atlas Copco accessories, such as a torque arm, wheel multiple, and more, to create a customized solution.
PF6 FlexSystem
The future of flexible production

Microeconometrics Using Stata


Microeconometrics Using StataContentsList of tables xxxv List of figures xxxvii Preface xxxix 1Stata basics1............................................................................................1.1Interactive use 1..............................................................................................1.2 Documentation 2..........................................................................1.2.1Stata manuals 2...........................................................1.2.2Additional Stata resources 3.......................................................................1.2.3The help command 3................................1.2.4The search, findit, and hsearch commands 41.3 Command syntax and operators 5...................................................................................................................................1.3.1Basic command syntax 5................................................1.3.2 Example: The summarize command 61.3.3Example: The regress command 7..............................................................................1.3.4Abbreviations, case sensitivity, and wildcards 9................................1.3.5Arithmetic, relational, and logical operators 9.........................................................................1.3.6Error messages 10........................................................................................1.4 Do-files and log files 10.............................................................................1.4.1Writing a do-file 101.4.2Running do-files 11.........................................................................................................................................................................1.4.3Log files 12..................................................................1.4.4 A three-step process 131.4.5Comments and long lines 13......................................................................................................1.4.6Different implementations of Stata 141.5Scalars and matrices (15)1.5.1Scalars (15)1.5.2Matrices (15)1.6 Using results from Stata commands (16)1.6.1Using results from the r-class command summarize (16)1.6.2Using results from the e-class command regress (17)1.7 Global and local macros (19)1.7.1Global macros (19)1.7.2Local macros (20)1.7.3Scalar or macro? 
(21)1.8 Looping commands (22)1.8.1The foreach loop (23)1.8.2The forvalues loop (23)1.8.3The while loop (24)1.8.4The continue command (24)1.9 Some useful commands (24)1.10 Template do-file (25)1.11 User-written commands (25)1.12 Stata resources (26)1.13 Exercises (26)2 Data management and graphics292.1Introduction (29)2.2 Types of data (29)2.2.1Text or ASCII data (30)2.2.2Internal numeric data (30)2.2.3String data (31)2.2.4Formats for displaying numeric data (31)2.3Inputting data (32)2.3.1General principles (32)2.3.2Inputting data already in Stata format (33)2.3.3Inputting data from the keyboard (34)2.3.4Inputting nontext data (34)2.3.5Inputting text data from a spreadsheet (35)2.3.6Inputting text data in free format (36)2.3.7Inputting text data in fixed format (36)2.3.8Dictionary files (37)2.3.9Common pitfalls (37)2.4 Data management (38)2.4.1PSID example (38)2.4.2Naming and labeling variables (41)2.4.3Viewing data (42)2.4.4Using original documentation (43)2.4.5Missing values (43)2.4.6Imputing missing data (45)2.4.7Transforming data (generate, replace, egen, recode) (45)The generate and replace commands (46)The egen command (46)The recode command (47)The by prefix (47)Indicator variables (47)Set of indicator variables (48)Interactions (49)Demeaning (50)2.4.8Saving data (51)2.4.9Selecting the sample (51)2.5 Manipulating datasets (53)2.5.1Ordering observations and variables (53)2.5.2Preserving and restoring a dataset (53)2.5.3Wide and long forms for a dataset (54)2.5.4Merging datasets (54)2.5.5Appending datasets (56)2.6 Graphical display of data (57)2.6.1Stata graph commands (57)Example graph commands (57)Saving and exporting graphs (58)Learning how to use graph commands (59)2.6.2Box-and-whisker plot (60)2.6.3Histogram (61)2.6.4Kernel density plot (62)2.6.5Twoway scatterplots and fitted lines (64)2.6.6Lowess, kernel, local linear, and nearest-neighbor regression652.6.7Multiple scatterplots (67)2.7 Stata resources (68)2.8Exercises (68)3Linear regression basics713.1Introduction (71)3.2 Data and data summary (71)3.2.1Data description (71)3.2.2Variable description (72)3.2.3Summary statistics (73)3.2.4More-detailed summary statistics (74)3.2.5Tables for data (75)3.2.6Statistical tests (78)3.2.7Data plots (78)3.3Regression in levels and logs (79)3.3.1Basic regression theory (79)3.3.2OLS regression and matrix algebra (80)3.3.3Properties of the OLS estimator (81)3.3.4Heteroskedasticity-robust standard errors (82)3.3.5Cluster–robust standard errors (82)3.3.6Regression in logs (83)3.4Basic regression analysis (84)3.4.1Correlations (84)3.4.2The regress command (85)3.4.3Hypothesis tests (86)3.4.4Tables of output from several regressions (87)3.4.5Even better tables of regression output (88)3.5Specification analysis (90)3.5.1Specification tests and model diagnostics (90)3.5.2Residual diagnostic plots (91)3.5.3Influential observations (92)3.5.4Specification tests (93)Test of omitted variables (93)Test of the Box–Cox model (94)Test of the functional form of the conditional mean (95)Heteroskedasticity test (96)Omnibus test (97)3.5.5Tests have power in more than one direction (98)3.6Prediction (100)3.6.1In-sample prediction (100)3.6.2Marginal effects (102)3.6.3Prediction in logs: The retransformation problem (103)3.6.4Prediction exercise (104)3.7 Sampling weights (105)3.7.1Weights (106)3.7.2Weighted mean (106)3.7.3Weighted regression (107)3.7.4Weighted prediction and MEs (109)3.8 OLS using Mata (109)3.9Stata resources (111)3.10 Exercises (111)4Simulation1134.1Introduction (113)4.2 Pseudorandom-number 
generators: Introduction (114)4.2.1Uniform random-number generation (114)4.2.2Draws from normal (116)4.2.3Draws from t, chi-squared, F, gamma, and beta (117)4.2.4 Draws from binomial, Poisson, and negative binomial . . . (118)Independent (but not identically distributed) draws frombinomial (118)Independent (but not identically distributed) draws fromPoisson (119)Histograms and density plots (120)4.3 Distribution of the sample mean (121)4.3.1Stata program (122)4.3.2The simulate command (123)4.3.3Central limit theorem simulation (123)4.3.4The postfile command (124)4.3.5Alternative central limit theorem simulation (125)4.4 Pseudorandom-number generators: Further details (125)4.4.1Inverse-probability transformation (126)4.4.2Direct transformation (127)4.4.3Other methods (127)4.4.4Draws from truncated normal (128)4.4.5Draws from multivariate normal (129)Direct draws from multivariate normal (129)Transformation using Cholesky decomposition (130)4.4.6Draws using Markov chain Monte Carlo method (130)4.5 Computing integrals (132)4.5.1Quadrature (133)4.5.2Monte Carlo integration (133)4.5.3Monte Carlo integration using different S (134)4.6Simulation for regression: Introduction (135)4.6.1Simulation example: OLS with X2 errors (135)4.6.2Interpreting simulation output (138)Unbiasedness of estimator (138)Standard errors (138)t statistic (138)Test size (139)Number of simulations (140)4.6.3Variations (140)Different sample size and number of simulations (140)Test power (140)Different error distributions (141)4.6.4Estimator inconsistency (141)4.6.5Simulation with endogenous regressors (142)4.7Stata resources (144)4.8Exercises (144)5GLS regression1475.1Introduction (147)5.2 GLS and FGLS regression (147)5.2.1GLS for heteroskedastic errors (147)5.2.2GLS and FGLS (148)5.2.3Weighted least squares and robust standard errors (149)5.2.4Leading examples (149)5.3 Modeling heteroskedastic data (150)5.3.1Simulated dataset (150)5.3.2OLS estimation (151)5.3.3Detecting heteroskedasticity (152)5.3.4FGLS estimation (154)5.3.5WLS estimation (156)5.4System of linear regressions (156)5.4.1SUR model (156)5.4.2The sureg command (157)5.4.3Application to two categories of expenditures (158)5.4.4Robust standard errors (160)5.4.5Testing cross-equation constraints (161)5.4.6Imposing cross-equation constraints (162)5.5Survey data: Weighting, clustering, and stratification (163)5.5.1Survey design (164)5.5.2Survey mean estimation (167)5.5.3Survey linear regression (167)5.6Stata resources (169)5.7Exercises (169)6Linear instrumental-variables regression1716.1Introduction (171)6.2 IV estimation (171)6.2.1Basic IV theory (171)6.2.2Model setup (173)6.2.3IV estimators: IV, 2SLS, and GMM (174)6.2.4Instrument validity and relevance (175)6.2.5Robust standard-error estimates (176)6.3 IV example (177)6.3.1The ivregress command (177)6.3.2Medical expenditures with one endogenous regressor . . . 
(178)6.3.3Available instruments (179)6.3.4IV estimation of an exactly identified model (180)6.3.5IV estimation of an overidentified model (181)6.3.6Testing for regressor endogeneity (182)6.3.7Tests of overidentifying restrictions (185)6.3.8IV estimation with a binary endogenous regressor (186)6.4 Weak instruments (188)6.4.1Finite-sample properties of IV estimators (188)6.4.2Weak instruments (189)Diagnostics for weak instruments (189)Formal tests for weak instruments (190)6.4.3The estat firststage command (191)6.4.4Just-identified model (191)6.4.5Overidentified model (193)6.4.6More than one endogenous regressor (195)6.4.7Sensitivity to choice of instruments (195)6.5 Better inference with weak instruments (197)6.5.1Conditional tests and confidence intervals (197)6.5.2LIML estimator (199)6.5.3Jackknife IV estimator (199)6.5.4 Comparison of 2SLS, LIML, JIVE, and GMM (200)6.6 3SLS systems estimation (201)6.7Stata resources (203)6.8Exercises (203)7Quantile regression2057.1Introduction (205)7.2 QR (205)7.2.1Conditional quantiles (206)7.2.2Computation of QR estimates and standard errors (207)7.2.3The qreg, bsqreg, and sqreg commands (207)7.3 QR for medical expenditures data (208)7.3.1Data summary (208)7.3.2QR estimates (209)7.3.3Interpretation of conditional quantile coefficients (210)7.3.4Retransformation (211)7.3.5Comparison of estimates at different quantiles (212)7.3.6Heteroskedasticity test (213)7.3.7Hypothesis tests (214)7.3.8Graphical display of coefficients over quantiles (215)7.4 QR for generated heteroskedastic data (216)7.4.1Simulated dataset (216)7.4.2QR estimates (219)7.5 QR for count data (220)7.5.1Quantile count regression (221)7.5.2The qcount command (222)7.5.3Summary of doctor visits data (222)7.5.4Results from QCR (224)7.6Stata resources (226)7.7Exercises (226)8Linear panel-data models: Basics2298.1Introduction (229)8.2 Panel-data methods overview (229)8.2.1Some basic considerations (230)8.2.2Some basic panel models (231)Individual-effects model (231)Fixed-effects model (231)Random-effects model (232)Pooled model or population-averaged model (232)Two-way-effects model (232)Mixed linear models (233)8.2.3Cluster-robust inference (233)8.2.4The xtreg command (233)8.2.5Stata linear panel-data commands (234)8.3 Panel-data summary (234)8.3.1Data description and summary statistics (234)8.3.2Panel-data organization (236)8.3.3Panel-data description (237)8.3.4Within and between variation (238)8.3.5Time-series plots for each individual (241)8.3.6Overall scatterplot (242)8.3.7Within scatterplot (243)8.3.8Pooled OLS regression with cluster—robust standard errors ..2448.3.9Time-series autocorrelations for panel data (245)8.3.10 Error correlation in the RE model (247)8.4 Pooled or population-averaged estimators (248)8.4.1Pooled OLS estimator (248)8.4.2Pooled FGLS estimator or population-averaged estimator (248)8.4.3The xtreg, pa command (249)8.4.4Application of the xtreg, pa command (250)8.5 Within estimator (251)8.5.1Within estimator (251)8.5.2The xtreg, fe command (251)8.5.3Application of the xtreg, fe command (252)8.5.4Least-squares dummy-variables regression (253)8.6 Between estimator (254)8.6.1Between estimator (254)8.6.2Application of the xtreg, be command (255)8.7 RE estimator (255)8.7.1RE estimator (255)8.7.2The xtreg, re command (256)8.7.3Application of the xtreg, re command (256)8.8 Comparison of estimators (257)8.8.1Estimates of variance components (257)8.8.2Within and between R-squared (258)8.8.3Estimator comparison (258)8.8.4Fixed effects versus random effects (259)8.8.5Hausman test 
for fixed effects (260)The hausman command (260)Robust Hausman test (261)8.8.6Prediction (262)8.9 First-difference estimator (263)8.9.1First-difference estimator (263)8.9.2Strict and weak exogeneity (264)8.10 Long panels (265)8.10.1 Long-panel dataset (265)8.10.2 Pooled OLS and PFGLS (266)8.10.3 The xtpcse and xtgls commands (267)8.10.4 Application of the xtgls, xtpcse, and xtscc commands . . . (268)8.10.5 Separate regressions (270)8.10.6 FE and RE models (271)8.10.7 Unit roots and cointegration (272)8.11 Panel-data management (274)8.11.1 Wide-form data (274)8.11.2 Convert wide form to long form (274)8.11.3 Convert long form to wide form (275)8.11.4 An alternative wide-form data (276)8.12 Stata resources (278)8.13 Exercises (278)9Linear panel-data models: Extensions2819.1Introduction (281)9.2 Panel IV estimation (281)9.2.1Panel IV (281)9.2.2The xtivreg command (282)9.2.3Application of the xtivreg command (282)9.2.4Panel IV extensions (284)9.3 Hausman-Taylor estimator (284)9.3.1Hausman-Taylor estimator (284)9.3.2The xthtaylor command (285)9.3.3Application of the xthtaylor command (285)9.4 Arellano-Bond estimator (287)9.4.1Dynamic model (287)9.4.2IV estimation in the FD model (288)9.4.3 The xtabond command (289)9.4.4Arellano-Bond estimator: Pure time series (290)9.4.5Arellano-Bond estimator: Additional regressors (292)9.4.6Specification tests (294)9.4.7 The xtdpdsys command (295)9.4.8 The xtdpd command (297)9.5 Mixed linear models (298)9.5.1Mixed linear model (298)9.5.2 The xtmixed command (299)9.5.3Random-intercept model (300)9.5.4Cluster-robust standard errors (301)9.5.5Random-slopes model (302)9.5.6Random-coefficients model (303)9.5.7Two-way random-effects model (304)9.6 Clustered data (306)9.6.1Clustered dataset (306)9.6.2Clustered data using nonpanel commands (306)9.6.3Clustered data using panel commands (307)9.6.4Hierarchical linear models (310)9.7Stata resources (311)9.8Exercises (311)10 Nonlinear regression methods31310.1 Introduction (313)10.2 Nonlinear example: Doctor visits (314)10.2.1 Data description (314)10.2.2 Poisson model description (315)10.3 Nonlinear regression methods (316)10.3.1 MLE (316)10.3.2 The poisson command (317)10.3.3 Postestimation commands (318)10.3.4 NLS (319)10.3.5 The nl command (319)10.3.6 GLM (321)10.3.7 The glm command (321)10.3.8 Other estimators (322)10.4 Different estimates of the VCE (323)10.4.1 General framework (323)10.4.2 The vce() option (324)10.4.3 Application of the vce() option (324)10.4.4 Default estimate of the VCE (326)10.4.5 Robust estimate of the VCE (326)10.4.6 Cluster–robust estimate of the VCE (327)10.4.7 Heteroskedasticity- and autocorrelation-consistent estimateof the VCE (328)10.4.8 Bootstrap standard errors (328)10.4.9 Statistical inference (329)10.5 Prediction (329)10.5.1 The predict and predictnl commands (329)10.5.2 Application of predict and predictnl (330)10.5.3 Out-of-sample prediction (331)10.5.4 Prediction at a specified value of one of the regressors (321)10.5.5 Prediction at a specified value of all the regressors (332)10.5.6 Prediction of other quantities (333)10.6 Marginal effects (333)10.6.1 Calculus and finite-difference methods (334)10.6.2 MEs estimates AME, MEM, and MER (334)10.6.3 Elasticities and semielasticities (335)10.6.4 Simple interpretations of coefficients in single-index models (336)10.6.5 The mfx command (337)10.6.6 MEM: Marginal effect at mean (337)Comparison of calculus and finite-difference methods . . . 
(338)10.6.7 MER: Marginal effect at representative value (338)10.6.8 AME: Average marginal effect (339)10.6.9 Elasticities and semielasticities (340)10.6.10 AME computed manually (342)10.6.11 Polynomial regressors (343)10.6.12 Interacted regressors (344)10.6.13 Complex interactions and nonlinearities (344)10.7 Model diagnostics (345)10.7.1 Goodness-of-fit measures (345)10.7.2 Information criteria for model comparison (346)10.7.3 Residuals (347)10.7.4 Model-specification tests (348)10.8 Stata resources (349)10.9 Exercises (349)11 Nonlinear optimization methods35111.1 Introduction (351)11.2 Newton–Raphson method (351)11.2.1 NR method (351)11.2.2 NR method for Poisson (352)11.2.3 Poisson NR example using Mata (353)Core Mata code for Poisson NR iterations (353)Complete Stata and Mata code for Poisson NR iterations (353)11.3 Gradient methods (355)11.3.1 Maximization options (355)11.3.2 Gradient methods (356)11.3.3 Messages during iterations (357)11.3.4 Stopping criteria (357)11.3.5 Multiple maximums (357)11.3.6 Numerical derivatives (358)11.4 The ml command: if method (359)11.4.1 The ml command (360)11.4.2 The If method (360)11.4.3 Poisson example: Single-index model (361)11.4.4 Negative binomial example: Two-index model (362)11.4.5 NLS example: Nonlikelihood model (363)11.5 Checking the program (364)11.5.1 Program debugging using ml check and ml trace (365)11.5.2 Getting the program to run (366)11.5.3 Checking the data (366)11.5.4 Multicollinearity and near coilinearity (367)11.5.5 Multiple optimums (368)11.5.6 Checking parameter estimation (369)11.5.7 Checking standard-error estimation (370)11.6 The ml command: d0, dl, and d2 methods (371)11.6.1 Evaluator functions (371)11.6.2 The d0 method (373)11.6.3 The dl method (374)11.6.4 The dl method with the robust estimate of the VCE (374)11.6.5 The d2 method (375)11.7 The Mata optimize() function (376)11.7.1 Type d and v evaluators (376)11.7.2 Optimize functions (377)11.7.3 Poisson example (377)Evaluator program for Poisson MLE (377)The optimize() function for Poisson MLE (378)11.8 Generalized method of moments (379)11.8.1 Definition (380)11.8.2 Nonlinear IV example (380)11.8.3 GMM using the Mata optimize() function (381)11.9 Stata resources (383)11.10 Exercises (383)12 Testing methods38512.1 Introduction (385)12.2 Critical values and p-values (385)12.2.1 Standard normal compared with Student's t (386)12.2.2 Chi-squared compared with F (386)12.2.3 Plotting densities (386)12.2.4 Computing p-values and critical values (388)12.2.5 Which distributions does Stata use? 
(389)12.3 Wald tests and confidence intervals (389)12.3.1 Wald test of linear hypotheses (389)12.3.2 The test command (391)Test single coefficient (392)Test several hypotheses (392)Test of overall significance (393)Test calculated from retrieved coefficients and VCE (393)12.3.3 One-sided Wald tests (394)12.3.4 Wald test of nonlinear hypotheses (delta method) (395)12.3.5 The testnl command (395)12.3.6 Wald confidence intervals (396)12.3.7 The lincom command (396)12.3.8 The nlcom command (delta method) (397)12.3.9 Asymmetric confidence intervals (398)12.4 Likelihood-ratio tests (399)12.4.1 Likelihood-ratio tests (399)12.4.2 The lrtest command (401)12.4.3 Direct computation of LR tests (401)12.5 Lagrange multiplier test (or score test) (402)12.5.1 LM tests (402)12.5.2 The estat command (403)12.5.3 LM test by auxiliary regression (403)12.6 Test size and power (405)12.6.1 Simulation DGP: OLS with chi-squared errors (405)12.6.2 Test size (406)12.6.3 Test power (407)12.6.4 Asymptotic test power (410)12.7 Specification tests (411)12.7.1 Moment-based tests (411)12.7.2 Information matrix test (411)12.7.3 Chi-squared goodness-of-fit test (412)12.7.4 Overidentifying restrictions test (412)12.7.5 Hausman test (412)12.7.6 Other tests (413)12.8 Stata resources (413)12.9 Exercises (413)13 Bootstrap methods41513.1 Introduction (415)13.2 Bootstrap methods (415)13.2.1 Bootstrap estimate of standard error (415)13.2.2 Bootstrap methods (416)13.2.3 Asymptotic refinement (416)13.2.4 Use the bootstrap with caution (416)13.3 Bootstrap pairs using the vce(bootstrap) option (417)13.3.1 Bootstrap-pairs method to estimate VCE (417)13.3.2 The vce(bootstrap) option (418)13.3.3 Bootstrap standard-errors example (418)13.3.4 How many bootstraps? (419)13.3.5 Clustered bootstraps (420)13.3.6 Bootstrap confidence intervals (421)13.3.7 The postestimation estat bootstrap command (422)13.3.8 Bootstrap confidence-intervals example (423)13.3.9 Bootstrap estimate of bias (423)13.4 Bootstrap pairs using the bootstrap command (424)13.4.1 The bootstrap command (424)13.4.2 Bootstrap parameter estimate from a Stata estimationcommand (425)13.4.3 Bootstrap standard error from a Stata estimation command (426)13.4.4 Bootstrap standard error from a user-written estimationcommand (426)13.4.5 Bootstrap two-step estimator (427)13.4.6 Bootstrap Hausman test (429)13.4.7 Bootstrap standard error of the coefficient of variation . . (430)13.5 Bootstraps with asymptotic refinement (431)13.5.1 Percentile-t method (431)13.5.2 Percentile-t Wald test (432)13.5.3 Percentile-t Wald confidence interval (433)13.6 Bootstrap pairs using bsample and simulate (434)13.6.1 The bsample command (434)13.6.2 The bsample command with simulate (434)13.6.3 Bootstrap Monte Carlo exercise (436)13.7 Alternative resampling schemes (436)13.7.1 Bootstrap pairs (437)13.7.2 Parametric bootstrap (437)13.7.3 Residual bootstrap (439)13.7.4 Wild bootstrap (440)13.7.5 Subsampling (441)13.8 The jackknife (441)13.8.1 Jackknife method (441)13.8.2 The vice(jackknife) option and the jackknife command . . (442)13.9 Stata resources (442)13.10 Exercises (442)14 Binary outcome models44514.1 Introduction (445)14.2 Some parametric models (445)14.2.1 Basic model (445)14.2.2 Logit, probit, linear probability, and clog-log models . . . 
(446)14.3 Estimation (446)14.3.1 Latent-variable interpretation and identification (447)14.3.2 ML estimation (447)14.3.3 The logit and probit commands (448)14.3.4 Robust estimate of the VCE (448)14.3.5 OLS estimation of LPM (448)14.4 Example (449)14.4.1 Data description (449)14.4.2 Logit regression (450)14.4.3 Comparison of binary models and parameter estimates . (451)14.5 Hypothesis and specification tests (452)14.5.1 Wald tests (453)14.5.2 Likelihood-ratio tests (453)14.5.3 Additional model-specification tests (454)Lagrange multiplier test of generalized logit (454)Heteroskedastic probit regression (455)14.5.4 Model comparison (456)14.6 Goodness of fit and prediction (457)14.6.1 Pseudo-R2 measure (457)14.6.2 Comparing predicted probabilities with sample frequencies (457)14.6.3 Comparing predicted outcomes with actual outcomes . . . (459)14.6.4 The predict command for fitted probabilities (460)14.6.5 The prvalue command for fitted probabilities (461)14.7 Marginal effects (462)14.7.1 Marginal effect at a representative value (MER) (462)14.7.2 Marginal effect at the mean (MEM) (463)14.7.3 Average marginal effect (AME) (464)14.7.4 The prchange command (464)14.8 Endogenous regressors (465)14.8.1 Example (465)14.8.2 Model assumptions (466)14.8.3 Structural-model approach (467)The ivprobit command (467)Maximum likelihood estimates (468)Two-step sequential estimates (469)14.8.4 IVs approach (471)14.9 Grouped data (472)14.9.1 Estimation with aggregate data (473)14.9.2 Grouped-data application (473)14.10 Stata resources (475)14.11 Exercises (475)15 Multinomial models47715.1 Introduction (477)15.2 Multinomial models overview (477)15.2.1 Probabilities and MEs (477)15.2.2 Maximum likelihood estimation (478)15.2.3 Case-specific and alternative-specific regressors (479)15.2.4 Additive random-utility model (479)15.2.5 Stata multinomial model commands (480)15.3 Multinomial example: Choice of fishing mode (480)15.3.1 Data description (480)15.3.2 Case-specific regressors (483)15.3.3 Alternative-specific regressors (483)15.4 Multinomial logit model (484)15.4.1 The mlogit command (484)15.4.2 Application of the mlogit command (485)15.4.3 Coefficient interpretation (486)15.4.4 Predicted probabilities (487)15.4.5 MEs (488)15.5 Conditional logit model (489)15.5.1 Creating long-form data from wide-form data (489)15.5.2 The asclogit command (491)15.5.3 The clogit command (491)15.5.4 Application of the asclogit command (492)15.5.5 Relationship to multinomial logit model (493)15.5.6 Coefficient interpretation (493)15.5.7 Predicted probabilities (494)15.5.8 MEs (494)15.6 Nested logit model (496)15.6.1 Relaxing the independence of irrelevant alternatives as-sumption (497)15.6.2 NL model (497)15.6.3 The nlogit command (498)15.6.4 Model estimates (499)15.6.5 Predicted probabilities (501)15.6.6 MEs (501)15.6.7 Comparison of logit models (502)15.7 Multinomial probit model (503)15.7.1 MNP (503)15.7.2 The mprobit command (503)15.7.3 Maximum simulated likelihood (504)15.7.4 The asmprobit command (505)15.7.5 Application of the asmprobit command (505)15.7.6 Predicted probabilities and MEs (507)15.8 Random-parameters logit (508)15.8.1 Random-parameters logit (508)15.8.2 The mixlogit command (508)15.8.3 Data preparation for mixlogit (509)15.8.4 Application of the mixlogit command (509)15.9 Ordered outcome models (510)15.9.1 Data summary (511)15.9.2 Ordered outcomes (512)15.9.3 Application of the ologit command (512)15.9.4 Predicted probabilities (513)15.9.5 MEs (513)15.9.6 Other ordered models (514)15.10 Multivariate outcomes 
(514)15.10.1 Bivariate probit (515)15.10.2 Nonlinear SUR (517)15.11 Stata resources (518)15.12 Exercises (518)16 Tobit and selection models52116.1 Introduction (521)16.2 Tobit model (521)16.2.1 Regression with censored data (521)16.2.2 Tobit model setup (522)16.2.3 Unknown censoring point (523)。

Multilayer feedforward networks with a nonpolynomial activation function can approximate any function

…many different applications, with most papers reporting that they perform at least as well as their traditional competitors, e.g., linear discrimination models and Bayesian classifiers. This success has recently led several researchers to undertake a rigorous analysis of the mathematical properties that enable feedforward networks to perform well in the field. The motivation for this line of research was eloquently described by Hornik and his colleagues (1989), as follows: "The apparent ability of sufficiently elaborate feedforward networks to approximate quite well nearly any function encountered in applications leads one to wonder about the ultimate capabilities of such networks. Are the successes observed to date reflective of some deep and fundamental approximation capabilities, or are they merely flukes, resulting from selective reporting and a fortuitous choice of problems?" Previous research on the approximation capabilities of feedforward networks can be found in Carroll and Dickinson, le Cun (1987)
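As a heavily hedged, minimal illustration of the approximation capability this fragment discusses (not code from the paper: the target function, hidden-layer size, and random-feature scheme are all chosen arbitrarily for the example, with only the output layer fit by least squares):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x)                               # arbitrary target function for the illustration

H = 50                                      # number of hidden units (chosen arbitrarily)
W = rng.normal(size=(1, H))                 # random input-to-hidden weights, kept fixed
b = rng.normal(size=H)
hidden = np.tanh(x @ W + b)                 # nonpolynomial activation

# Fit only the output layer by least squares; even this crude scheme already
# tracks the target closely once H is large enough.
w_out, *_ = np.linalg.lstsq(hidden, y, rcond=None)
print("max abs error:", float(np.max(np.abs(hidden @ w_out - y))))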

Greedy layer-wise training of deep networks

Yoshua Bengio,Pascal Lamblin,Dan Popovici,Hugo LarochelleUniversit´e de Montr´e alMontr´e al,Qu´e bec{bengioy,lamblinp,popovicd,larocheh}@iro.umontreal.caAbstractComplexity theory of circuits strongly suggests that deep architectures can be muchmore efficient(sometimes exponentially)than shallow architectures,in terms ofcomputational elements required to represent some functions.Deep multi-layerneural networks have many levels of non-linearities allowing them to compactlyrepresent highly non-linear and highly-varying functions.However,until recentlyit was not clear how to train such deep networks,since gradient-based optimizationstarting from random initialization appears to often get stuck in poor solutions.Hin-ton et al.recently introduced a greedy layer-wise unsupervised learning algorithmfor Deep Belief Networks(DBN),a generative model with many layers of hiddencausal variables.In the context of the above optimization problem,we study this al-gorithm empirically and explore variants to better understand its success and extendit to cases where the inputs are continuous or where the structure of the input dis-tribution is not revealing enough about the variable to be predicted in a supervisedtask.Our experiments also confirm the hypothesis that the greedy layer-wise unsu-pervised training strategy mostly helps the optimization,by initializing weights in aregion near a good local minimum,giving rise to internal distributed representationsthat are high-level abstractions of the input,bringing better generalization.1IntroductionRecent analyses(Bengio,Delalleau,&Le Roux,2006;Bengio&Le Cun,2007)of modern non-parametric machine learning algorithms that are kernel machines,such as Support Vector Machines (SVMs),graph-based manifold and semi-supervised learning algorithms suggest fundamental limita-tions of some learning algorithms.The problem is clear in kernel-based approaches when the kernel is“local”(e.g.,the Gaussian kernel),i.e.,K(x,y)converges to a constant when||x−y||increases. These analyses point to the difficulty of learning“highly-varying functions”,i.e.,functions that have a large number of“variations”in the domain of interest,e.g.,they would require a large number of pieces to be well represented by a piecewise-linear approximation.Since the number of pieces can be made to grow exponentially with the number of factors of variations in the input,this is connected with the well-known curse of dimensionality for classical non-parametric learning algorithms(for regres-sion,classification and density estimation).If the shapes of all these pieces are unrelated,one needs enough examples for each piece in order to generalize properly.However,if these shapes are related and can be predicted from each other,“non-local”learning algorithms have the potential to generalize to pieces not covered by the training set.Such ability would seem necessary for learning in complex domains such as Artificial Intelligence tasks(e.g.,related to vision,language,speech,robotics). 
Kernel machines(not only those with a local kernel)have a shallow architecture,i.e.,only two levels of data-dependent computational elements.This is also true of feedforward neural networks with a single hidden layer(which can become SVMs when the number of hidden units becomes large(Bengio,Le Roux,Vincent,Delalleau,&Marcotte,2006)).A serious problem with shallow architectures is that they can be very inefficient in terms of the number of computational units(e.g., bases,hidden units),and thus in terms of required examples(Bengio&Le Cun,2007).One way to represent a highly-varying function compactly(with few parameters)is through the composition of many non-linearities,i.e.,with a deep architecture.For example,the parity function with d inputs requires O(2d)examples and parameters to be represented by a Gaussian SVM(Bengio et al.,2006), O(d2)parameters for a one-hidden-layer neural network,O(d)parameters and units for a multi-layer network with O(logd)layers,and O(1)parameters with a recurrent neural network.More generally,2boolean functions(such as the function that computes the multiplication of two numbers from their d-bit representation)expressible by O(log d)layers of combinatorial logic with O(d)elements in each layer may require O(2d)elements when expressed with only2layers(Utgoff&Stracuzzi,2002; Bengio&Le Cun,2007).When the representation of a concept requires an exponential number of elements,e.g.,with a shallow circuit,the number of training examples required to learn the concept may also be impractical.Formal analyses of the computational complexity of shallow circuits can be found in(Hastad,1987)or(Allender,1996).They point in the same direction:shallow circuits are much less expressive than deep ones.However,until recently,it was believed too difficult to train deep multi-layer neural networks.Empiri-cally,deep networks were generally found to be not better,and often worse,than neural networks with one or two hidden layers(Tesauro,1992).As this is a negative result,it has not been much reported in the machine learning literature.A reasonable explanation is that gradient-based optimization starting from random initialization may get stuck near poor solutions.An approach that has been explored with some success in the past is based on constructively adding layers.This was previously done using a supervised criterion at each stage(Fahlman&Lebiere,1990;Lengell´e&Denoeux,1996).Hinton, Osindero,and Teh(2006)recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks(DBN),a generative model with many layers of hidden causal variables.The training strategy for such networks may hold great promise as a principle to help address the problem of training deep networks.Upper layers of a DBN are supposed to represent more“abstract”concepts that explain the input observation x,whereas lower layers extract“low-level features”from x.They learn simpler conceptsfirst,and build on them to learn more abstract concepts.This strategy,studied in detail here,has not yet been much exploited in machine learning.We hypothesize that three aspects of this strategy are particularly important:first,pre-training one layer at a time in a greedy way;sec-ond,using unsupervised learning at each layer in order to preserve information from the input;and finally,fine-tuning the whole network with respect to the ultimate criterion of interest.Wefirst extend DBNs and their component layers,Restricted Boltzmann Machines(RBM),so that they can more naturally handle continuous values in 
input.Second,we perform experiments to better understand the advantage brought by the greedy layer-wise unsupervised learning.The basic question to answer is whether or not this approach helps to solve a difficult optimization problem.In DBNs, RBMs are used as building blocks,but applying this same strategy using auto-encoders yielded similar results.Finally,we discuss a problem that occurs with the layer-wise greedy unsupervised procedure when the input distribution is not revealing enough of the conditional distribution of the target variable given the input variable.We evaluate a simple and successful solution to this problem.2Deep Belief NetsLet x be the input,and g i the hidden variables at layer i,with joint distributionP(x,g1,g2,...,g )=P(x|g1)P(g1|g2)···P(g −2|g −1)P(g −1,g ),where all the conditional layers P(g i|g i+1)are factorized conditional distributions for which compu-tation of probability and sampling are easy.In Hinton et al.(2006)one considers the hidden layer g i a binary random vector with n i elements g ij:P(g i|g i+1)=n ij=1P(g i j|g i+1)with P(g i j=1|g i+1)=sigm(b i j+n i+1 k=1W i kj g i+1k)(1)where sigm(t)=1/(1+e−t),the b ij are biases for unit j of layer i,and W i is the weight matrix forlayer i.If we denote g0=x,the generative model for thefirst layer P(x|g1)also follows(1).2.1Restricted Boltzmann machinesThe top-level prior P(g −1,g )is a Restricted Boltzmann Machine(RBM)between layer −1 and layer .To lighten notation,consider a generic RBM with input layer activations v(for visi-ble units)and hidden layer activations h(for hidden units).It has the following joint distribution: P(v,h)=1The layer-to-layer conditionals associated with the RBM factorize like in(1)and give rise to P(v k=1|h)=sigm(b k+ j W jk h j)and Q(h j=1|v)=sigm(c j+ k W jk v k).2.2Gibbs Markov chain and log-likelihood gradient in an RBMTo obtain an estimator of the gradient on the log-likelihood of an RBM,we consider a Gibbs Markov chain on the(visible units,hidden units)pair of variables.Gibbs sampling from an RBM proceeds by sampling h given v,then v given h,etc.Denote v t for the t-th v sample from that chain,starting at t=0with v0,the“input observation”for the RBM.Therefore,(v k,h k)for k→∞is a sample from the joint P(v,h).The log-likelihood of a value v0under the model of the RBM islog P(v0)=log h P(v0,h)=log h e−energy(v0,h)−log v,h e−energy(v,h)and its gradient with respect toθ=(W,b,c)is∂log P(v0)∂θ+ v k,h k P(v k,h k)∂energy(v k,h k)∂θ+E hk ∂energy(v k,h k)(fitting p )will yield improvement on the training criterion for the previous layer (likelihood with respect to p −1).The greedy layer-wise training algorithm for DBNs is quite simple,as illustrated by the pseudo-code in Algorithm TrainUnsupervisedDBN of the Appendix.2.4Supervised fine-tuningAs a last training stage,it is possible to fine-tune the parameters of all the layers together.For exam-ple Hinton et al.(2006)propose to use the wake-sleep algorithm (Hinton,Dayan,Frey,&Neal,1995)to continue unsupervised training.Hinton et al.(2006)also propose to optionally use a mean-field ap-proximation of the posteriors P (g i |g 0),by replacing the samples g i −1j at level i −1by their bit-wisemean-field expected value µi −1j ,with µi =sigm(b i +W i µi −1).According to these propagation rules,the whole network now deterministically computes internal representations as functions of the network input g 0=x .After unsupervised pre-training of the layers of a DBN following Algorithm TrainUnsupervisedDBN (see Appendix)the whole network can be 
further optimized by gradient descent with respect to any deterministically computable training criterion that depends on these rep-resentations.For example,this can be used (Hinton &Salakhutdinov,2006)to fine-tune a very deep auto-encoder,minimizing a reconstruction error.It is also possible to use this as initialization of all except the last layer of a traditional multi-layer neural network,using gradient descent to fine-tune the whole network with respect to a supervised training criterion.Algorithm DBNSupervisedFineTuning in the appendix contains pseudo-code for supervised fine-tuning,as part of the global supervised learning algorithm TrainSupervisedDBN .Note that better results were obtained when using a 20-fold larger learning rate with the supervised criterion (here,squared error or cross-entropy)updates than in the contrastive divergence updates.3Extension to continuous-valued inputsWith the binary units introduced for RBMs and DBNs in Hinton et al.(2006)one can “cheat”and handle continuous-valued inputs by scaling them to the (0,1)interval and considering each input con-tinuous value as the probability for a binary random variable to take the value 1.This has worked well for pixel gray levels,but it may be inappropriate for other kinds of input variables.Previous work on continuous-valued input in RBMs include (Chen &Murray,2003),in which noise is added to sigmoidal units,and the RBM forms a special form of Diffusion Network (Movellan,Mineiro,&Williams,2002).We concentrate here on simple extensions of the RBM framework in which only the energy function and the allowed range of values are changed.Linear energy:exponential or truncated exponentialConsider a unit with value y of an RBM,connected to units z of the other layer.p (y |z )can be obtained from the terms in the exponential that contain y ,which can be grouped in ya (z )for linear energy functions as in (2),where a (z )=b +w z with b the bias of unit y ,and w the vector of weights connecting unit y to units z .If we allow y to take any value in interval I ,the conditional densityof y becomes p (y |z )=exp (ya (z ))1y ∈Ia (z ).The conditional expectation of u given z is interesting becauseit has a sigmoidal-like saturating and monotone non-linearity:E [y |z ]=1a (z ).A sampling from the truncated exponential is easily obtained from a uniform sample U ,using the inverse cumulative F −1of the conditional density y |z :F −1(U )=log(1−U ×(1−exp (a (z ))))c l a s s i f i c a t i o n e r r o r o n t r a i n i n g s e tFigure 1:Training classification error vs training iteration,on the Cotton price task,for deep net-work without pre-training,for DBN with unsuper-vised pre-training,and DBN with partially super-vised pre-training.Illustrates optimization diffi-culty of deep networks and advantage of partially supervised training.AbaloneCotton train.valid.test.train.valid.test.2.Logistic regression···44.0%42.6%45.0%4.DBN,binomial inputs,partially supervised 4.39 4.45 4.2843.3%41.1%43.7%6.DBN,Gaussian inputs,partially supervised4.234.434.1827.5%28.4%31.4%Table 1:Mean squared prediction error on Abalone task and classification error on Cotton task,showing improvement with Gaussian units.this case the variance is unconditional,whereas the mean depends on the inputs of the unit:for a unit y with inputs z and inverse variance d 2,E [y |z ]=a (z )Training each layer as an auto-encoderWe want to verify that the layer-wise greedy unsupervised pre-training principle can be applied when using an auto-encoder instead of the RBM as a layer 
building block.Let x be the input vector with x i∈(0,1).For a layer with weights matrix W,hidden biases column vector b and input biases column vector c,the reconstruction probability for bit i is p i(x),with the vector of proba-bilities p(x)=sigm(c+W sigm(b+W x)).The training criterion for the layer is the average of negative log-likelihoods for predicting x from p(x).For example,if x is interpreted either as a sequence of bits or a sequence of bit probabilities,we minimize the reconstruction cross-entropy: R=− i x i log p i(x)+(1−x i)log(1−p i(x)).We report several experimental results using this training criterion for each layer,in comparison to the contrastive divergence algorithm for an RBM. Pseudo-code for a deep network obtained by training each layer as an auto-encoder is given in Ap-pendix(Algorithm TrainGreedyAutoEncodingDeepNet).One question that arises with auto-encoders in comparison with RBMs is whether the auto-encoders will fail to learn a useful representation when the number of units is not strictly decreasing from one layer to the next(since the networks could theoretically just learn to be the identity and perfectly min-imize the reconstruction error).However,our experiments suggest that networks with non-decreasing layer sizes generalize well.This might be due to weight decay and stochastic gradient descent,prevent-ing large weights:optimization falls in a local minimum which corresponds to a good transformation of the input(that provides a good initialization for supervised training of the whole net). Greedy layer-wise supervised trainingA reasonable question to ask is whether the fact that each layer is trained in an unsupervised way is critical or not.An alternative algorithm is supervised,greedy and layer-wise:train each new hidden layer as the hidden layer of a one-hidden layer supervised neural network NN(taking as input the output of the last of previously trained layers),and then throw away the output layer of NN and use the parameters of the hidden layer of NN as pre-training initialization of the new top layer of the deep net, to map the output of the previous layers to a hopefully better representation.Pseudo-code for a deep network obtained by training each layer as the hidden layer of a supervised one-hidden-layer neural network is given in Appendix(Algorithm TrainGreedySupervisedDeepNet). 
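A minimal NumPy sketch of training one auto-encoder layer with tied weights and the cross-entropy reconstruction criterion described above (the data, layer sizes, learning rate, and number of steps are placeholders invented for this sketch, not values from the paper):

import numpy as np

def sigm(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
X = rng.random((500, 20))        # placeholder inputs with entries in (0, 1)
n_in, n_hid, lr = 20, 10, 0.1    # placeholder layer sizes and learning rate

W = 0.01 * rng.standard_normal((n_hid, n_in))
b = np.zeros(n_hid)              # hidden biases
c = np.zeros(n_in)               # input (reconstruction) biases

for step in range(5000):
    x = X[rng.integers(len(X))]
    h = sigm(b + W @ x)          # encode
    p = sigm(c + W.T @ h)        # decode with the transposed (tied) weights
    # Gradients of the cross-entropy reconstruction error
    # R = -sum_i [ x_i log p_i + (1 - x_i) log(1 - p_i) ]
    dp = p - x                   # dR/d(decoder pre-activation)
    dh = (W @ dp) * h * (1 - h)  # back-propagated through the tied weights
    W -= lr * (np.outer(dh, x) + np.outer(h, dp))
    c -= lr * dp
    b -= lr * dh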
Experiment2.We compared the performance on the MNIST digit classification task obtained withfive algorithms: (a)DBN,(b)deep network whose layers are initialized as auto-encoders,(c)above described su-pervised greedy layer-wise algorithm to pre-train each layer,(d)deep network with no pre-training (random initialization),(e)shallow network(1hidden layer)with no pre-training.Thefinalfine-tuning is done by adding a logistic regression layer on top of the network and train-ing the whole network by stochastic gradient descent on the cross-entropy with respect to the target classification.The networks have the following architecture:784inputs,10outputs,3hidden layers with variable number of hidden units,selected by validation set performance(typically selected layer sizes are between500and1000).The shallow network has a single hidden layer.An L2weight decay hyper-parameter is also optimized.The DBN was slower to train and less experiments were performed,so that longer training and more appropriately chosen sizes of layers and learning rates could yield better results(Hinton2006,unpublished,reports1.15%error on the MNIST test set).Experiment2Experiment3train.valid.test train.valid.test DBN,unsupervised pre-training0% 1.2% 1.2%0% 1.5% 1.5%Deep net,auto-associator pre-training0% 1.4% 1.4%0% 1.4% 1.6%Deep net,supervised pre-training0% 1.7% 2.0%0% 1.8% 1.9%Deep net,no pre-training.004% 2.1% 2.4%.59% 2.1% 2.2%Shallow net,no pre-training.004% 1.8% 1.9% 3.6% 4.7% 5.0%pre-training)or a shallow network,and that,without pre-training,deep networks tend to perform worse than shallow networks.The results also suggest that unsupervised greedy layer-wise pre-training can perform significantly better than purely supervised greedy layer-wise pre-training.A possible expla-nation is that the greedy supervised procedure is too greedy:in the learned hidden units representation it may discard some of the information about the target,information that cannot be captured easily by a one-hidden-layer neural network but could be captured by composing more hidden layers. 
Experiment3However,there is something troubling in the Experiment2results(Table2):all the networks,even those without greedy layer-wise pre-training,perform almost perfectly on the training set,which would appear to contradict the hypothesis that the main effect of the layer-wise greedy strategy is to help the optimization(with poor optimization one would expect poor training error).A possible explanation coherent with our initial hypothesis and with the above results is captured by the following hypothesis.Without pre-training,the lower layers are initialized poorly,but still allowing the top two layers to learn the training set almost perfectly,because the output layer and the last hidden layer form a standard shallow but fat neural network.Consider the top two layers of the deep network with pre-training:it presumably takes as input a better representation,one that allows for better generalization.Instead,the network without pre-training sees a“random”transformation of the input, one that preserves enough information about the input tofit the training set,but that does not help to generalize.To test that hypothesis,we performed a second series of experiments in which we constrain the top hidden layer to be small(20hidden units).The Experiment3results(Table2)clearly confirm our hypothesis.With no pre-training,training error degrades significantly when there are only20 hidden units in the top hidden layer.In addition,the results obtained without pre-training were found to have extremely large variance indicating high sensitivity to initial conditions.Overall,the results in the tables and in Figure1are consistent with the hypothesis that the greedy layer-wise procedure essentially helps to better optimize the deep networks,probably by initializing the hidden layers so that they represent more meaningful representations of the input,which also yields to better generalization. Continuous training of all layers of a DBNWith the layer-wise training algorithm for DBNs(TrainUnsupervisedDBN in Appendix),one element that we would like to dispense with is having to decide the number of training iterations for each layer.It would be good if we did not have to explicitly add layers one at a time,i.e.,if we could train all layers simultaneously,but keeping the“greedy”idea that each layer is pre-trained to model its input,ignoring the effect of higher layers.To achieve this it is sufficient to insert a line in TrainUnsupervisedDBN,so that RBMupdate is called on all the layers and the stochastic hidden values are propagated all the way up.Experiments with this variant demonstrated that it works at least as well as the original algorithm.The advantage is that we can now have a single stopping criterion(for the whole network).Computation time is slightly greater,since we do more computations initially(on the upper layers),which might be wasted(before the lower layers converge to a decent representation),but time is saved on optimizing hyper-parameters.This variant may be more appealing for on-line training on very large data-sets,where one would never cycle back on the training data. 
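For reference, the RBMupdate step used as the building block above is contrastive divergence with one Gibbs step (CD-1). A minimal sketch for binary units follows; the sizes, learning rate, and random placeholder training vectors are invented for the illustration, and the paper's actual pseudo-code is in its appendix:

import numpy as np

def sigm(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 20, 10, 0.05            # placeholder sizes and learning rate
W = 0.01 * rng.standard_normal((n_hid, n_vis))
b = np.zeros(n_vis)                        # visible biases
c = np.zeros(n_hid)                        # hidden biases

for step in range(1000):
    v0 = (rng.random(n_vis) < 0.5) * 1.0   # placeholder binary training vector
    q0 = sigm(c + W @ v0)                  # Q(h = 1 | v0)
    h0 = (rng.random(n_hid) < q0) * 1.0    # sample hidden units (positive phase)
    p1 = sigm(b + W.T @ h0)                # P(v = 1 | h0)
    v1 = (rng.random(n_vis) < p1) * 1.0    # sample reconstructed visible units
    q1 = sigm(c + W @ v1)                  # Q(h = 1 | v1), used without sampling
    W += lr * (np.outer(h0, v0) - np.outer(q1, v1))
    b += lr * (v0 - v1)
    c += lr * (h0 - q1)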
5Dealing with uncooperative input distributionsIn classification problems such as MNIST where classes are well separated,the structure of the input distribution p(x)naturally contains much information about the target variable y.Imagine a super-vised learning task in which the input distribution is mostly unrelated with y.In regression problems, which we are interested in studying here,this problem could be much more prevalent.For example imagine a task in which x∼p(x)and the target y=f(x)+noise(e.g.,p is Gaussian and f=sinus) with no particular relation between p and f.In such settings we cannot expect the unsupervised greedy layer-wise pre-training procedure to help in training deep supervised networks.To deal with such uncooperative input distributions,we propose to train each layer with a mixed training criterion that combines the unsupervised objective(modeling or reconstructing the input)and a supervised ob-jective(helping to predict the target).A simple algorithm thus adds the updates on the hidden layer weights from the unsupervised algorithm(Contrastive Divergence or reconstruction error gradient) with the updates from the gradient on a supervised prediction error,using a temporary output layer,as with the greedy layer-wise supervised training algorithm.In our experiments it appeared sufficient to perform that partial supervision with thefirst layer only,since once the predictive information about the target is“forced”into the representation of thefirst layer,it tends to stay in the upper layers.The results in Figure1and Table1clearly show the advantage of this partially supervised greedy trainingalgorithm,in the case of thefinancial dataset.Pseudo-code for partially supervising thefirst(or later layer)is given in Algorithm TrainPartiallySupervisedLayer(in the Appendix).6ConclusionThis paper is motivated by the need to develop good training algorithms for deep architectures,since these can be much more representationally efficient than shallow ones such as SVMs and one-hidden-layer neural nets.We study Deep Belief Networks applied to supervised learning tasks,and the prin-ciples that could explain the good performance they have yielded.The three principal contributions of this paper are the following.First we extended RBMs and DBNs in new ways to naturally handle continuous-valued inputs,showing examples where much better predictive models can thus be ob-tained.Second,we performed experiments which support the hypothesis that the greedy unsupervised layer-wise training strategy helps to optimize deep networks,but suggest that better generalization is also obtained because this strategy initializes upper layers with better representations of relevant high-level abstractions.These experiments suggest a general principle that can be applied beyond DBNs, and we obtained similar results when each layer is initialized as an auto-associator instead of as an RBM.Finally,although we found that it is important to have an unsupervised component to train each layer(a fully supervised greedy layer-wise strategy performed worse),we studied supervised tasks in which the structure of the input distribution is not revealing enough of the conditional density of y given x.In that case the DBN unsupervised greedy layer-wise strategy appears inadequate and we proposed a simplefix based on partial supervision,that can yield significant improvements. 
ReferencesAllender,E.(1996).Circuit complexity before the dawn of the new millennium.In16th Annual Conference on Foundations of Software Technology and Theoretical Computer Science,pp.1–18.Lecture Notes in Computer Science1180.Bengio,Y.,Delalleau,O.,&Le Roux,N.(2006).The curse of highly variable functions for local kernel machines.In Weiss,Y.,Sch¨o lkopf,B.,&Platt,J.(Eds.),Advances in Neural Information Processing Systems18,pp.107–114.MIT Press,Cambridge,MA.Bengio,Y.,&Le Cun,Y.(2007).Scaling learning algorithms towards AI.In Bottou,L.,Chapelle,O.,DeCoste,D.,&Weston,J.(Eds.),Large Scale Kernel Machines.MIT Press.Bengio,Y.,Le Roux,N.,Vincent,P.,Delalleau,O.,&Marcotte,P.(2006).Convex neural networks.In Weiss,Y.,Sch¨o lkopf,B.,&Platt,J.(Eds.),Advances in Neural Information Processing Systems18,pp.123–130.MIT Press,Cambridge,MA.Chen,H.,&Murray,A.(2003).A continuous restricted boltzmann machine with an implementable training algorithm.IEE Proceedings of Vision,Image and Signal Processing,150(3),153–158. Fahlman,S.,&Lebiere,C.(1990).The cascade-correlation learning architecture.In Touretzky,D.(Ed.), Advances in Neural Information Processing Systems2,pp.524–532Denver,CO.Morgan Kaufmann, San Mateo.Hastad,J.T.(1987).Computational Limitations for Small Depth Circuits.MIT Press,Cambridge,MA. Hinton,G.E.,Osindero,S.,&Teh,Y.(2006).A fast learning algorithm for deep belief nets.Neural Computa-tion,18,1527–1554.Hinton,G.(2002).Training products of experts by minimizing contrastive divergence.Neural Computation, 14(8),1771–1800.Hinton,G.,Dayan,P.,Frey,B.,&Neal,R.(1995).The wake-sleep algorithm for unsupervised neural networks.Science,268,1558–1161.Hinton,G.,&Salakhutdinov,R.(2006).Reducing the dimensionality of data with neural networks.Science, 313(5786),504–507.Lengell´e,R.,&Denoeux,T.(1996).Training MLPs layer by layer using an objective function for internal representations.Neural Networks,9,83–97.Movellan,J.,Mineiro,P.,&Williams,R.(2002).A monte-carlo EM approach for partially observable diffusion processes:theory and applications to neural networks.Neural Computation,14,1501–1544. Tesauro,G.(1992).Practical issues in temporal difference learning.Machine Learning,8,257–277. Utgoff,P.,&Stracuzzi,D.(2002).Many-layered learning.Neural Computation,14,2497–2539. Welling,M.,Rosen-Zvi,M.,&Hinton,G.E.(2005).Exponential family harmoniums with an application to information retrieval.In Advances in Neural Information Processing Systems,V ol.17Cambridge,MA.MIT Press.。

MIT6_254S10_lec18
according to the conditional distribution p (θ −i | θ i ).
Game Theory: Lecture 18
Bayesian Games
Bayesian Nash Equilibria
Definition (Bayesian Nash Equilibrium) The strategy profile s(·) is a (pure strategy) Bayesian Nash equilibrium if for all i ∈ I and for all θ_i ∈ Θ_i, we have that s_i(θ_i) ∈ arg max_{s'_i ∈ S_i} Σ_{θ_{-i}} p(θ_{-i} | θ_i) u_i(s'_i, s_{-i}(θ_{-i}), θ_i, θ_{-i}).
Private value auctions: valuation of each agent is independent of others’ valuations; Common value auctions: the object has a potentially common value, and each individual’s signal is imperfectly correlated with this common value.
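As a concrete textbook private-value example (not taken from these slides): in a first-price sealed-bid auction with n bidders whose valuations are i.i.d. uniform on [0,1], the symmetric Bayesian Nash equilibrium bid is b(v) = (n-1)v/n. A short Monte Carlo check that bidding this way is approximately a best response against opponents who do the same, with n, the realized value, and the sample size chosen arbitrarily:

import numpy as np

rng = np.random.default_rng(0)
n = 2                        # two bidders, values i.i.d. uniform on [0, 1]
v = 0.6                      # my realized valuation (arbitrary choice)
opponent_values = rng.random(200_000)
opponent_bids = (n - 1) / n * opponent_values   # opponents play b(v) = (n-1)v/n

def expected_payoff(b):
    wins = b > opponent_bids                    # ignore ties (probability zero)
    return np.mean(np.where(wins, v - b, 0.0))

candidate_bids = np.linspace(0, v, 61)
best = max(candidate_bids, key=expected_payoff)
print("best response found:", best, " equilibrium bid (n-1)v/n:", (n - 1) * v / n)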
Asu Ozdaglar
MIT
April 22, 2010
Game Theory: Lecture 18
Introduction
Outline
Bayesian Nash Equilibria. Auctions. Extensive form games of incomplete information. Perfect Bayesian (Nash) Equilibria.

MIT6_004s09_lec01

MIT6_004s09_lec01

Easy-to-use Efficient Reliable Secure …
Low-level physical representations High-level symbols and sequences of symbols
MIT OpenCourseWare
6.004 Computation Structures
Spring 2009
For information about citing these materials or our Terms of Use, visit: /terms.
Quantifying Information
(Claude Shannon, 1948)
Encoding
Encoding describes the process of assigning representations to information. Choosing an appropriate and efficient encoding is a real engineering challenge. It impacts design at many levels:
- Mechanism (devices, # of components used)
- Efficiency (bits used)
- Reliability (noise)
- Security (encryption)
Next lecture: encoding a bit. What about longer messages?
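A small illustration of the Shannon (1948) idea referenced above: the information conveyed by learning that an outcome of probability p occurred is log2(1/p) bits (the example events below are arbitrary choices, not from the slides):

import math

def bits(p):
    """Information content, in bits, of observing an outcome of probability p."""
    return math.log2(1.0 / p)

print(bits(0.5))     # a fair coin flip resolves 1 bit
print(bits(1/6))     # one roll of a fair die resolves about 2.58 bits
print(bits(1/256))   # one uniformly random byte resolves exactly 8 bits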

Adaptive MIT Scheme

Introduction

In software development, ensuring that software can adapt to different environments and devices has become an important problem. Adaptivity means that the software can automatically adjust and adapt itself to different devices, operating systems, or network environments so as to deliver the best user experience. This article introduces an adaptive MIT scheme that uses the MIT (multi-objective iterative training) algorithm to make software adaptive.

What is MIT?

The MIT (multi-objective iterative training) algorithm is a method that combines multi-objective optimization with iterative training. It optimizes several objectives at once and uses iterative training to improve the model step by step, so as to achieve the best adaptive behavior. In software development, the MIT algorithm can be used to automatically adjust and optimize aspects such as the software's response speed, interface layout, and data transfer, so that it suits different devices and environments.

Steps for implementing the adaptive MIT scheme

The basic steps are as follows:

1. Identify the key adaptivity requirements: first make explicit which requirements the software must adapt to, for example response speed and interface layout.

2. Collect training data: based on the key requirements, collect performance data for the software across different environments and devices. This data can include response times, network latency, and so on.

3. Define adaptivity targets: define the adaptivity targets from the collected training data. For example, keep the response time within a given range, or ensure that the interface layout adapts automatically on different devices.

4. Design the MIT algorithm: design the MIT algorithm according to the adaptivity targets. It can include several objective functions, adaptation rules, and an iterative training strategy.

5. Implement and debug: implement the adaptive behavior according to the designed MIT algorithm, then debug and optimize it. Throughout this process, keep collecting performance data, compare it with the targets, and adjust and refine the algorithm.

6. Test and evaluate: once the adaptive behavior is implemented, test and evaluate it. Its effectiveness can be judged by comparing performance metrics across environments and devices and by collecting user feedback.
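None of the steps above are tied to a specific API, so the following is only an illustrative sketch: the measure_response_time() function, the quality setting, the single objective, and the adjustment rule are all hypothetical stand-ins invented for this example.

import random

def measure_response_time(quality):
    """Hypothetical stand-in for a real measurement on the target device."""
    return 0.05 + 0.4 * quality + random.uniform(-0.02, 0.02)   # seconds

TARGET = 0.20          # adaptivity target from step 3: keep response time under 200 ms
quality = 1.0          # start at the richest rendering quality

for iteration in range(20):                      # step 5: iterate, measure, adjust
    t = measure_response_time(quality)
    if t > TARGET and quality > 0.1:
        quality -= 0.1                           # trade quality for speed
    elif t < 0.8 * TARGET and quality < 1.0:
        quality += 0.05                          # headroom available, raise quality
print("chosen quality level:", round(quality, 2))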

Advantages of the MIT algorithm

Using the MIT algorithm to make software adaptive has the following advantages:

1. Flexibility: the MIT algorithm can be adjusted and optimized for different adaptivity requirements, so it fits different scenarios and environments.

2. Automation: the MIT algorithm automates the adjustment and optimization process, reducing the need for manual intervention.

3. Evolvability: through iterative training, the MIT algorithm keeps improving the model and gradually strengthens the software's adaptive capability.

MITL Standard
MITL (Model-based Test Language) is a model-based testing language that provides a way to describe and execute test cases. The MITL standard is a specification for writing and executing test cases: it defines a set of standard test-language elements and syntax rules used to describe a test case's inputs, outputs, and execution process.

The core of the MITL standard is writing test cases in a model-based description language. This language can describe the behavior and properties of the system under test, as well as the inputs and expected outputs of the test cases. The MITL standard includes a set of standard test-language elements, such as test steps, conditions, loops, and variables, together with a set of syntax rules for combining them into test cases.

The MITL standard can be applied very broadly, to testing many different kinds of applications and systems, including software, hardware, and networks. Using it can simplify writing and executing test cases, improve test accuracy and efficiency, and make tests more maintainable and reusable.
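The text above does not give concrete MITL syntax, so the following is only a generic Python illustration of the model-based idea it describes (test cases as data: steps with inputs and expected outputs, plus a tiny executor); none of these names come from any MITL specification.

# Each test case is a model: a sequence of steps with inputs and expected outputs.
test_case = {
    "name": "addition behaves as specified",
    "steps": [
        {"input": (2, 3), "expected": 5},
        {"input": (0, 0), "expected": 0},
        {"input": (-1, 1), "expected": 0},
    ],
}

def system_under_test(a, b):
    return a + b          # stand-in for the real system

def run(case):
    for i, step in enumerate(case["steps"], start=1):
        result = system_under_test(*step["input"])
        status = "PASS" if result == step["expected"] else "FAIL"
        print(f'{case["name"]} step {i}: {status}')

run(test_case)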

Cron expression must consist of 6 parts - reply

What is a cron expression, and why must it consist of 6 parts?

In computer programming, cron is a task-scheduling tool used to run predefined tasks at specified time intervals. A cron expression is the string representation used to define when those tasks run. Each cron expression consists of 6 parts, representing the second, minute, hour, day of month, month, and day of week. This article explains, step by step, each part of a cron expression and why there must be 6 of them.

Part 1: Second (0-59). The first part gives the second at which the task runs. It ranges from 0 to 59, where 0 is the first second of each minute and 59 is the last. Several seconds can be listed, separated by commas, and "*" means run every second.

Part 2: Minute (0-59). The second part gives the minute at which the task runs, also ranging from 0 to 59. As with the first part, commas can list several values, and "*" means every minute.

Part 3: Hour (0-23). The third part gives the hour, ranging from 0 to 23, where 0 is midnight and 23 is 11 p.m. Again, commas and the asterisk can be used to specify different hours.

Part 4: Day of month (1-31). The fourth part gives the day of the month on which the task runs, ranging from 1 to 31. Days are specified with commas and the asterisk.

Part 5: Month (1-12 or JAN-DEC). The fifth part gives the month in which the task runs. It can be written in two forms: 1-12, or the English abbreviations JAN through DEC. With commas, several months can be specified.

Part 6: Day of week (0-7 or SUN-SAT). The sixth part gives the day of the week on which the task runs. It also has two forms: 0-7, or the abbreviations SUN through SAT. Both 0 and 7 stand for Sunday, 1 for Monday, and so on. Commas can again be used to specify several days.

These are the 6 parts of a cron expression. Read from left to right, they describe when a task runs; combined, they pin down the execution time precisely.
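Putting the six fields together: note (a fact not stated above) that this seconds-first 6-field layout matches Quartz-style schedulers, while classic Unix cron uses only 5 fields without seconds. A few illustrative expressions and a tiny Python splitter, using only the features described above:

# Field order described above: second minute hour day-of-month month day-of-week
FIELDS = ["second", "minute", "hour", "day_of_month", "month", "day_of_week"]

def parse_cron(expr):
    parts = expr.split()
    if len(parts) != 6:
        raise ValueError("this cron format must consist of exactly 6 parts")
    return dict(zip(FIELDS, parts))

print(parse_cron("0 0 12 * * *"))        # every day at 12:00:00
print(parse_cron("30 15 8 1 1,6,12 *"))  # 08:15:30 on the 1st of Jan, Jun, and Dec
print(parse_cron("0 30 9 * * 1"))        # 09:30:00 every Monday (1 = Monday, 0 and 7 = Sunday)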


Massachusetts Institute of TechnologyDepartment of Electrical Engineering&Computer Science6.041/6.431:Probabilistic Systems Analysis155(Quiz Ϯ | Fall 2009)(Quiz Ϯ | Fall 2009)Problem2.(42points)The random variable X is exponential with parameter1.Given the value x of X,the random variable Y is exponential with parameter equal to x(and mean1/x).Note:Some useful integrals,forλ>0:∞∞122xe−λx dx=,x e−λx dx=.λ2λ300(a)(7points)Find the joint PDF of X and Y.(b)(7points)Find the marginal PDF of Y.(c)(7points)Find the conditional PDF of X,given that Y=2.(d)(7points)Find the conditional expectation of X,given that Y=2.(e)(7points)Find the conditional PDF of Y,given that X=2and Y≥3.2X(f)(7points)Find the PDF of e.Problem3.(10points)For the following questions,mark the correct answer.If you get it right,you receive5points for that question. You receive no credit if you get it wrong.A justification is not required and will not be taken into account.Let X and Y be continuous random variables.Let N be a discrete random variable.(a)(5points)The quantity E[X|Y]is always:(i)A number.(ii)A discrete random variable.(iii)A continuous random variable.(iv)Not enough information to choose between(i)-(iii).(b)(5points)The quantity E[E[X|Y,N]|N]is always:(i)A number.(ii)A discrete random variable.(iii)A continuous random variable.(iv)Not enough information to choose between(i)-(iii).Problem4.(25points)The probability of obtaining heads in a singleflip of a certain coin is itself a random variable,denoted by Q, which is uniformly distributed in[0,1].Let X=1if the coinflip results in heads,and X=0if the coinflip results in tails.(a)(i)(5points)Find the mean of X.(ii)(5points)Find the variance of X.(b)(7points)Find the covariance of X and Q.(c)(8points)Find the conditional PDF of Q given that X=1.(Quiz Ϯ | Fall 2009)Problem5.(21points)Let X and Y be independent continuous random variables with marginal PDFs f X and f Y,and marginal CDFs F X and F Y,respectively.LetS=min{X,Y},L=max{X,Y}.(a)(7points)If X and Y are standard normal,find the probability that S≥1.(b)(7points)Fix some s and£with s≤£.Give a formula forP(s≤S and L≤£)involving F X and F Y,and no integrals.(c)(7points)Assume that s≤s+δ≤£.Give a formula forP(s≤S≤s+δ,£≤L≤£+δ),as an integral involving f X and f Y.Each question is repeated in the following pages.Please write your answer onthe appropriate page.Problem 2. (42 points)The random variable X is exponential with parameter 1. Given the value x of X , the random variable Y is exponential with parameter equal to x (and mean 1/x ). Note: Some useful integrals, for λ> 0:(a) (7 points) Find the joint PDF of X and Y .(b) (7 points) Find the marginal PDF of Y .∞xe −λx1dx =0∞,λ2 x 2e −λx 2dx =0(Quiz Ϯ | Fall 2009).λ3(Quiz Ϯ | Fall 2009)(c)(7points)Find the conditional PDF of X,given that Y=2.(d)(7points)Find the conditional expectation of X,given that Y=2.(Quiz Ϯ | Fall 2009)(e)(7points)Find the conditional PDF of Y,given that X=2and Y≥3.2X(f)(7points)Find the PDF of e.(Quiz Ϯ | Fall 2009)Problem3.(10points)For the following questions,mark the correct answer.If you get it right,you receive5points for that question. 
You receive no credit if you get it wrong.A justification is not required and will not be taken into account.Let X and Y be continuous random variables.Let N be a discrete random variable.(a)(5points)The quantity E[X|Y]is always:(i)A number.(ii)A discrete random variable.(iii)A continuous random variable.(iv)Not enough information to choose between(i)-(iii).(b)(5points)The quantity E[E[X|Y,N]|N]is always:(i)A number.(ii)A discrete random variable.(iii)A continuous random variable.(iv)Not enough information to choose between(i)-(iii).Problem4.(25points)The probability of obtaining heads in a singleflip of a certain coin is itself a random variable,denoted by Q, which is uniformly distributed in[0,1].Let X=1if the coinflip results in heads,and X=0if the coinflip results in tails.(a)(i)(5points)Find the mean of X.(Quiz Ϯ | Fall 2009) (ii)(5points)Find the variance of X.(b)(7points)Find the covariance of X and Q.(Quiz Ϯ | Fall 2009) (c)(8points)Find the conditional PDF of Q given that X=1.(Quiz Ϯ | Fall 2009)Problem5.(21points)Let X and Y be independent continuous random variables with marginal PDFs f X and f Y,and marginal CDFs F X and F Y,respectively.LetS=min{X,Y},L=max{X,Y}.(a)(7points)If X and Y are standard normal,find the probability that S≥1.Massachusetts Institute of TechnologyDepartment of Electrical Engineering&Computer Science6.041/6.431:Probabilistic Systems Analysis(Quiz Ϯ | Fall 2009)(b)(7points)Fix some s and£with s≤£.Give a formula forP(s≤S and L≤£)involving F X and F Y,and no integrals.(c)(7points)Assume that s≤s+δ≤£.Give a formula forP(s≤S≤s+δ,£≤L≤£+δ), as an integral involving f X and f Y.12MIT OpenCourseWare6.041SC Probabilistic Systems Analysis and Applied ProbabilityFall 2013For information about citing these materials or our Terms of Use, visit: /terms.。
