Genetic structure and evolution of orchids
Evolution through either natural selection or genetic drift is dependent on variation at the genetic and morphological levels. Processes that influence the genetic structure of populations include mating systems, effective population size, mutation rates and gene flow among populations. We investigated the patterns of population genetic structure of orchids and evaluated if evolutionary processes are more likely at the individual population level than at the multipopulation/species level. We hypothesized that because orchid populations are frequently small and reproductive success is often skewed, we should observe many orchids with high population genetic substructure suggesting limited gene flow among populations. If limited gene flow among populations is a common pattern in orchids, then it may well be an important component that affects the likelihood of genetic drift and selection at the local population level. Such changes may lead to differentiation and evolutionary diversification.

A main component in evolutionary processes is the necessary condition of isolation. The amount of gene flow among local populations will determine whether or not individual populations (demes) can evolve independently, which may lead to cladogenesis. Usually one migrant per generation is sufficient to prevent populations from evolving independently from other populations when effective population sizes are large. Theoretically, if the gene flow rate, Nm (the effective number of migrants per generation; N = effective population size, m = migration rate), is larger than two individuals per generation, then it is sufficient to prevent local adaptation, while gene flow less than one per generation will likely result in population differentiation by selection or genetic drift (Merrell 1981, Roughgarden 1996). If Nm lies between one and two, there will be considerable variation in gene frequencies among populations (Merrell 1981).
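A worked example of these thresholds (the numbers are hypothetical, chosen only to illustrate the arithmetic of Nm = N × m, and are not drawn from any study cited here):

```latex
\begin{aligned}
Nm &= N \cdot m = 200 \times 0.020\;\, = 4.0 > 2 &&\Rightarrow \text{gene flow prevents local adaptation,}\\
Nm &= 200 \times 0.0075 = 1.5 \in (1,2) &&\Rightarrow \text{considerable variation in gene frequencies,}\\
Nm &= 200 \times 0.004\;\, = 0.8 < 1 &&\Rightarrow \text{differentiation by drift or selection is likely.}
\end{aligned}
```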
Consequently, populations will have similar genetic structure, as if mating were panmictic (Nm > 2). Alternatively, if gene flow is low (Nm < 1), populations will have different genetic structures that may result in evolutionary change through either adaptation to the local environments via natural selection or through random effects such as genetic drift.

Gene flow can be observed directly through mark-and-recapture studies (for mobile organisms, or stained pollen) or by tracking marker alleles (paternity analysis) over a short number of generations. Few orchid studies have attempted to directly observe gene flow, and thus far only staining or micro-tagging of pollinaria has been used (Peakall 1989, Nilsson et al. 1992, Folsom 1994, Tremblay 1994, Salguero-Faría & Ackerman 1999). All these studies examined gene flow only within populations. Indirect estimates of gene flow are obtained from allele frequencies and measure the average long-term effect of genetic differentiation by genetic drift. The alleles are assumed to be neutral, so that genetic differentiation based on these markers would be a consequence of drift rather than natural selection. Bohonak (1999) concluded that simple population genetic statistics are robust for inferring gene flow among groups of individuals.

The most common approach is to measure the degree of population differentiation at the genetic level using Wright's F statistics on data obtained through protein electrophoresis or various PCR-type approaches.
The F statistics partition the amount of genetic variation that can be attributed to inbreeding among closely related individuals in a population: FIS is the inbreeding coefficient within individuals; FIT is the result of nonrandom mating within a population and the effect of population subdivision; and a third statistic, FST, is the fixation index due to random genetic drift and the lack of panmixia among populations (Wright 1978).

THE GENETIC STRUCTURE OF ORCHID POPULATIONS AND ITS EVOLUTIONARY IMPORTANCE

Raymond L. Tremblay 1,3 & James D. Ackerman 2

1 University of Puerto Rico – Humacao, Department of Biology, Humacao, Puerto Rico, 00791, U.S.A.
2 University of Puerto Rico – Río Piedras, Department of Biology, P.O. Box 23360, San Juan, Puerto Rico, 00931-3360, U.S.A.
3 Author for correspondence: raymond@

LANKESTERIANA 7: 87-92. Mayo 2003.

Table 1. Estimates of gene flow in orchids. Nm(W) = gene flow estimate based on Wright's statistics; Gst = coefficient of genic differentiation among populations. ^1 Nm calculated by the present authors from Gst or Fst using the formula on p. 320 of Hartl & Clark (1989). ^2 Recalculated using the previous formula; original Nm value 3.70. ^3 Calculated from RAPD markers. ^4 Calculated from cpDNA. ^5 No genetic differentiation found among populations. ^6 Calculated according to Weir and Cockerham's statistics. ^7 Estimated using RAPDs and AMOVA.

Species | Reference | Nm(W) | Gst
Calypso bulbosa (L.) Oakes | Alexandersson & Ågren 2000 | 3.20 | 0.072
Caladenia tentaculata Tate | Peakall & Beattie 1996 | 7.10^1 | 0.034^6
Cephalanthera damasonium (Mill.) Druce | Scacchi, De Angelis & Corbo 1991 | --^5 | --^5
Cephalanthera longifolia (L.) Fritsch | Scacchi, De Angelis & Corbo 1991 | 2.15^1 | 0.104
Cephalanthera rubra (L.) Rich. | Scacchi, De Angelis & Corbo 1991 | 0.76^1 | 0.247
Cymbidium goeringii Rchb. f. | Chung & Chung 1999 | 2.30 | 0.098
Cypripedium acaule Ait. | Case 1994 | 1.27^1 | 0.164
Cypripedium calceolus L. | Case 1993, 1994 | 1.63^1 | 0.196
Cypripedium candidum Muhl. ex Willd. | Case 1994 | 3.37^1 | 0.069
Cypripedium fasciculatum Kellogg ex S. Watson | Aagaard, Harrod & Shea 1999 | 6.00 | 0.04
Cypripedium kentuckiense C. F. Reed | Case et al. 1998 | 1.12^1 | 0.182
Cypripedium parviflorum Salisb. var. pubescens (Willd.) O. W. Knight | Case et al. 1998 | 1.28^1 | 0.163
  Southern populations | Wallace & Case 2000 | 0.94 | 0.209
  Northern populations | Wallace & Case 2000 | 1.57 | 0.137
  var. makasin (Farw.) Sheviak | Wallace & Case 2000 | 1.00 | 0.199
  var. parviflorum | Wallace & Case 2000 | 1.43 | 0.149
  species level | Wallace & Case 2000 | 0.83 | 0.232
Cypripedium reginae Walter | Case 1994 | 0.47^1 | 0.349
Dactylorhiza romana (Sebastiani) Soó | Bullini et al. 2001 | 3.32^1 | 0.07
Dactylorhiza sambucina (L.) Soó | Bullini et al. 2001 | 1.31^1 | 0.16
Epidendrum conopseum R. Br. | Bush, Kutz & Anderton 1999 | 1.43^3 | 0.149
Epipactis helleborine (L.) Crantz | Scacchi, Lanzara & De Angelis 1987 | 7.3^1 | 0.033
  European populations | Squirrell et al. 2001 | 1.00^1 | 0.200
  European populations | Squirrell et al. 2001 | 0.24^{1,4} | 0.506^4
  North American populations | Hollingsworth & Dickson 1997 | 2.53^1 | 0.090^4
  North American populations | Hollingsworth & Dickson 1997 | 0.79^1 | 0.240
Epipactis youngiana Richards & Porter | Harris & Abbott 1997 | 2.43^1 | 0.093
Eulophia sinensis Miq. | Sun & Wong 2001 | --- | 0.0
Eulophia sinensis Miq. | Sun & Wong 2001 | 0.133^{1,3} | 0.653^3
Goodyera procera Ker-Gawl. | Wong & Sun 1999 | 0.221^1 | 0.523
Goodyera procera Ker-Gawl. | Wong & Sun 1999 | 0.397^{1,3} | 0.386^3
Gymnadenia conopsea (L.) R. Br. | Scacchi & De Angelis 1990 | 0.280^1 | 0.471
Gymnadenia conopsea (L.) R. Br. subsp. conopsea | Soliva & Widmer 1999 | 2.96 | 0.078
Gymnadenia conopsea (L.) R. Br. subsp. densiflora (Wahl.) E.G. Camus & A. Camus | Soliva & Widmer 1999 | 0.39 | 0.391
Lepanthes caritensis Tremblay & Ackerman | Carromero, Tremblay & Ackerman (unpublished) | 1.30 | 0.167
Lepanthes rupestris Stimson | Tremblay & Ackerman 2001 | 1.84 | 0.170
Lepanthes rubripetala Stimson | Tremblay & Ackerman 2001 | 0.62 | 0.270
Lepanthes eltoroensis Stimson | Tremblay & Ackerman 2001 | 0.89 | 0.220
Lepanthes sanguinea Hook. | Carromero, Tremblay & Ackerman (unpublished) | 1.45 | 0.144

Consequently, if we make the assumption that the genetic markers sampled are neutral or nearly neutral and that the observed level of FST is a measure of the current gene flow among populations (rather than a historical remnant), then we can evaluate the likelihood that populations are effectively isolated.
The scale of FST is from 0 (no population subdivision) to 1.0 (complete genetic differentiation among populations).

We gathered population genetic data for 58 species of terrestrial and epiphytic orchids from temperate and tropical regions. The data are biased toward terrestrial/temperate species (N = 44). We found only three studies of terrestrial/tropical species and ten of epiphytic/tropical species. There is also a bias toward certain taxa: Orchis, Cypripedium, Pterostylis and Lepanthes account for nearly half (30) of the 61 records (Table 1): 10 species of Orchis, 7 species each of Cypripedium and Pterostylis, 6 species of Lepanthes, 3 species each of Spiranthes, Epipactis, Cephalanthera and Gymnadenia, 2 species each of Dactylorhiza, Epipactis, Vanilla and Zeuxine, and one species each of Caladenia, Calypso, Cymbidium, Epidendrum, Eulophia, Goodyera, Nigritella, Paphiopedilum, Platanthera, Tipularia, and Tolumnia.

Gene flow among populations varies among species, ranging from a high of 12 effective migrants per generation in Orchis longicornu (Corrias et al. 1991) to lows of less than 0.2 in Zeuxine strateumatica (Sun & Wong 2001). Assembling the species in groups based on their estimates of gene flow, we note that 18 species have less than one migrant per generation, 19 species have more than two migrants per generation, and 17 species have migration rates between one and two. No genetic differentiation was found among populations of Cephalanthera damasonium (Scacchi, De Angelis & Corbo 1991) and Spiranthes hongkongensis (Sun 1996); consequently, these two species are excluded from further analysis.

Orchis species typically have high estimates of gene flow among populations (Scacchi, De Angelis & Lanzara 1990, Corrias et al. 1991, Rossi et al.
1992), whereas Lepanthes and Pterostylis species have much lower gene flow estimates (Tremblay & Ackerman 2001, Sharma, Clements & Jones 2000, Sharma et al. 2001). However, even within a genus, variation in gene flow can be extensive (Table 1).

Are there phylogenetic associations with gene flow? The data for Orchis (mean Nm = 5.7), Lepanthes (mean Nm = 2.1) and Pterostylis (mean Nm = 1.0) are suggestive, but much more extensive sampling is needed for both temperate and tropical species. Curiously, Lepanthes and Orchis have very different population genetic parameters, yet both are species-rich genera and are likely in a state of evolutionary flux. It seems to us that orchids have taken more than one expressway to diversification.

For the group of species with more than 2 migrants per generation, local populations will not evolve independently but as a group; consequently, local morphological and genetic differences among groups will be wiped out, and populations will become homogeneous if gene flow continues at that level. When gene flow is high, selection studies from different populations should be evaluated together (Fig. 1).

Figure 1: Distribution of mean (s.e.) gene flow (Nm) among genera of orchids. Bars without error bars are single data points.

For populations that have less than one migrant per generation, local populations can evolve independently, and evolutionary studies should be done at the local level. In small populations we may expect genetic drift to be present, and selection coefficients would have to be high to counteract the effects of drift.

For species with intermediate gene flow it is probably wise to evaluate evolutionary processes at both the local and multi-population/species level. We expect variance in migration rates to be large because of the skewed reproductive success among individuals, time periods and populations.
Consequently, the outcome of the evolutionary process will likely depend on the amount and variation of migration events and the consistency of migration rates in time. If variance in gene flow through space and time is small, then the genetic differentiation will be more or less stable. But, for example, if variance in gene flow is high, with some periods having high gene flow followed by little or no gene flow for an extended period of time, it is possible that through natural selection and genetic drift local populations might differentiate sufficiently for cladogenesis during the period of reduced immigration.

Species with less than one migrant per generation are basically unique evolutionary units evolving independently from other local populations. In populations with large Ne (> 50), it is likely that natural selection will dominate evolutionary processes, while if Ne is small (< 50), genetic drift and selection can both be responsible for evolution. Consequently, for these species, local adaptation to specific environmental conditions is possible.

This survey of population genetics studies of orchids shows that multiple evolutionary processes have likely been responsible for the remarkable diversification in orchids.

LITERATURE CITED

Aagaard, J.E., R.J. Harrod & K.L. Shea. 1999. Genetic variation among populations of the rare clustered lady's-slipper orchid (Cypripedium fasciculatum) from Washington State, USA. Nat. Areas J. 19: 234-238.

Ackerman, J.D. & S. Ward. 1999. Genetic variation in a widespread epiphytic orchid: where is the evolutionary potential? Syst. Bot. 24: 282-291.

Alexandersson, R. & J. Ågren. 2000. Genetic structure of the nonrewarding, bumblebee-pollinated Calypso bulbosa. Heredity 85: 401-409.

Arduino, P., F. Verra, R. Cianchi, W. Rossi, B. Corrias & L. Bullini. 1996. Genetic variation and natural hybridization between Orchis laxiflora and Orchis palustris (Orchidaceae). Pl. Syst. Evol. 202: 87-109.

Arft, A.M. & T.A. Ranker. 1998.
Allopolyploid origin and population genetics of the rare orchid Spiranthes diluvialis. Am. J. Bot. 85: 110-122.

Bohonak, A.J. 1999. Dispersal, gene flow, and population structure. Quart. Rev. Biol. 74: 21-45.

Bullini, L., R. Cianchi, P. Arduino, L. De Bonis, M.C. Mosco, A. Verdi, D. Porretta, B. Corrias & W. Rossi. 2001. Molecular evidence for allopolyploid speciation and a single origin of the western Mediterranean orchid Dactylorhiza insularis (Orchidaceae). Biol. J. Linn. Soc. 72: 193-201.

Bush, S.T., W.E. Kutz & J.M. Anderton. 1999. RAPD variation in temperate populations of the epiphytic orchid Epidendrum conopseum and the epiphytic fern Pleopeltis polypodioides. Selbyana 20: 120-124.

Case, M.A. 1993. High levels of allozyme variation within Cypripedium calceolus (Orchidaceae) and low levels of divergence among its varieties. Syst. Bot. 18: 663-677.

Case, M.A. 1994. Extensive variation in the levels of genetic diversity and degree of relatedness among five species of Cypripedium (Orchidaceae). Amer. J. Bot. 81: 175-184.

Case, M.A., H.T. Mlodozeniec, L.E. Wallace & T.W. Weldy. 1998. Conservation genetics and taxonomic status of the rare Kentucky lady's slipper: Cypripedium kentuckiense (Orchidaceae). Amer. J. Bot. 85: 1779-1779.

Chung, M.Y. & M.G. Chung. 1999. Allozyme diversity and population structure in Korean populations of Cymbidium goeringii (Orchidaceae). J. Pl. Res. 112: 139-144.

Corrias, B., W. Rossi, P. Arduino, R. Cianchi & L. Bullini. 1991. Orchis longicornu Poiret in Sardinia: genetic, morphological and chronological data. Webbia 45: 71-101.

Folsom, J.P. 1994. Pollination of a fragrant orchid. Orch. Dig. 58: 83-99.

Harris, S.A. & R.J. Abbott. 1997. Isozyme analysis of the reported origin of a new hybrid orchid species, Epipactis youngiana (Young's helleborine), in the British Isles. Heredity 79: 402-407.

Hedrén, M., E. Klein & H. Teppner. 2000.
Evolution of polyploids in the European orchid genus Nigritella: Evidence from allozyme data. Phyton 40: 239-275.

Hollingsworth, P.M. & J.H. Dickson. 1997. Genetic variation in rural and urban populations of Epipactis helleborine (L.) Crantz (Orchidaceae) in Britain. Bot. J. Linn. Soc. 123: 321-331.

Li, A., Y.B. Luo & S. Ge. 2002. A preliminary study on conservation genetics of an endangered orchid (Paphiopedilum micranthum) from Southwestern China. Bioch. Gen. 40: 195-201.

Merrell, D.J. 1981. Ecological Genetics. University of Minnesota Press, Minneapolis, Minnesota.

Nielsen, L.R. & H.R. Siegismund. 2000. Interspecific differentiation and hybridization in Vanilla species (Orchidaceae). Heredity 83: 560-567.

Nilsson, L.A., E. Rabakonandrianina & B. Pettersson. 1992. Exact tracking of pollen transfer and mating in plants. Nature 360: 666-667.

Peakall, R. 1989. A new technique for monitoring pollen flow in orchids. Oecologia 79: 361-365.

Peakall, R. & A.J. Beattie. 1996. Ecological and genetic consequences of pollination by sexual deception in the orchid Caladenia tentaculata. Evolution 50: 2207-2220.

Rossi, W., B. Corrias, P. Arduino, R. Cianchi & L. Bullini. 1992. Gene variation and gene flow in Orchis morio (Orchidaceae) from Italy. Pl. Syst. Evol. 179: 43-58.

Roughgarden, J. 1996. Theory of Population Genetics and Evolutionary Ecology: An Introduction. Prentice Hall, Upper Saddle River, NJ, USA.

Salguero-Faría, J.A. & J.D. Ackerman. 1999. A nectar reward: is more better? Biotropica 31: 303-311.

Scacchi, R., G. De Angelis & R.M. Corbo. 1991. Effect of the breeding system on the genetic structure in Cephalanthera spp. (Orchidaceae). Pl. Syst. Evol. 176: 53-61.

Scacchi, R., G. De Angelis & P. Lanzara. 1990. Allozyme variation among and within eleven Orchis species (fam. Orchidaceae), with special reference to hybridizing aptitude. Genetica 81: 143-150.

Scacchi, R. & G. De Angelis. 1990.
Isoenzyme polymorphisms in Gymnaedenia [sic] conopsea and its inferences for systematics within this species. Bioch. Syst. Ecol. 17: 25-33.

Scacchi, R., P. Lanzara & G. De Angelis. 1987. Study of electrophoretic variability in Epipactis helleborine (L.) Crantz, E. palustris (L.) Crantz and E. microphylla (Ehrh.) Swartz (fam. Orchidaceae). Genetica 72: 217-224.

Sharma, I.K., M.A. Clements & D.L. Jones. 2000. Observations of high genetic variability in the endangered Australian terrestrial orchid Pterostylis gibbosa R. Br. (Orchidaceae). Bioch. Syst. Ecol. 28: 651-663.

Sharma, I.K., D.L. Jones, A.G. Young & C.J. French. 2001. Genetic diversity and phylogenetic relatedness among six endemic Pterostylis species (Orchidaceae; series Grandiflorae) of Western Australia, as revealed by allozyme polymorphisms. Bioch. Syst. Ecol. 29: 697-710.

Smith, J.L., K.L. Hunter & R.B. Hunter. 2002. Genetic variation in the terrestrial orchid Tipularia discolor. Southeastern Nat. 1: 17-26.

Soliva, M. & A. Widmer. 1999. Genetic and floral divergence among sympatric populations of Gymnadenia conopsea s.l. (Orchidaceae) with different flowering phenology. Int. J. Pl. Sci. 160: 897-905.

Squirrell, J., P.M. Hollingsworth, R.M. Bateman, J.H. Dickson, M.H.S. Light, M. MacConaill & M.C. Tebbitt. 2001. Partitioning and diversity of nuclear and organelle markers in native and introduced populations of Epipactis helleborine (Orchidaceae). Amer. J. Bot. 88: 1409-1418.

Sun, M. 1996. Effects of population size, mating system, and evolutionary origin on genetic diversity in Spiranthes sinensis and S. hongkongensis. Cons. Biol. 10: 785-795.

Sun, M. & K.C. Wong. 2001. Genetic structure of three orchid species with contrasting breeding systems using RAPD and allozyme markers. Amer. J. Bot. 88: 2180-2188.

Tremblay, R.L. 1994. Frequency and consequences of multi-parental pollinations in a population of Cypripedium calceolus L. var. pubescens (Orchidaceae).
Lindleyana 9: 161-167.

Tremblay, R.L. & J.D. Ackerman. 2001. Gene flow and effective population size in Lepanthes (Orchidaceae): a case for genetic drift. Biol. J. Linn. Soc. 72: 47-62.

Wallace, L.A. 2002. Examining the effects of fragmentation on genetic variation in Platanthera leucophaea (Orchidaceae): Inferences from allozyme and random amplified polymorphic DNA markers. Pl. Sp. Biol. 17: 37-39.

Wallace, L.A. & M.A. Case. 2000. Contrasting diversity between Northern and Southern populations of Cypripedium parviflorum (Orchidaceae): Implications for Pleistocene refugia and taxonomic boundaries. Syst. Bot. 25: 281-296.

Wong, K.C. & M. Sun. 1999. Reproductive biology and conservation genetics of Goodyera procera (Orchidaceae). Amer. J. Bot. 86: 1406-1413.

Wright, S. 1978. Evolution and the Genetics of Populations. Vol. 4. Variability Within and Among Natural Populations. The University of Chicago Press, Chicago.

Raymond L. Tremblay is an associate professor at the University of Puerto Rico in Humacao and graduate faculty at UPR-Río Piedras. He obtained his B.Sc. with Honours at Carleton University, Ottawa, Canada in 1990 and his Ph.D. at the University of Puerto Rico in Río Piedras in 1996. He is presently the chairman of the In Situ Orchid Conservation Committee of the Orchid Specialist Group. He is interested in the evolutionary and conservation biology of small populations. Presently his interest revolves around determining the life history characters that limit population growth rate in orchids and evaluating the probability of extinction of small orchid populations.

James D. Ackerman, Ph.D., is Senior Professor of Biology at the University of Puerto Rico, Río Piedras. He is an orchidologist, studying pollination and systematics.
N-adic Summation-Shrinking Generator: Basic Properties and Empirical Evidence

Zhaneta Tasheva, Assistant Prof. Eng. PhD., NMU "V. Levski", Faculty of Artillery and Air Defense, Shoumen, Bulgaria. Phone: +359 54 5 23 71, e-mail: tashevi86@
Borislav Bedzhev, Assoc. Prof. Eng. DSc., NMU "V. Levski", Faculty of Artillery and Air Defense, Shoumen, Bulgaria. Phone: +359 54 4 64 38, e-mail: bedzhev@mail.pv-ma.bg
Borislav Stoyanov, Assistant Prof. Mag. PhD. Student, Shoumen University, Faculty of Computer Informatics, Shoumen, Bulgaria. Phone: +359 54 4 78 48, e-mail: bpstoyanov@abv.bg

ABSTRACT
The need for software-flexible stream ciphers has led to several alternative proposals in the last few years. One of them is a new Pseudo Random Number Generator (PRNG), named the N-adic Summation-Shrinking Generator (NSumSG), whose architecture is described in this paper. It uses N-1 parallel working slave summation generators and one N-adic summation generator, which controls the nonlinearity in the generator. The implementation, some properties and statistical tests of NSumSG are given. The results of the statistical analysis show that the sequence generated by NSumSG is uniform, scalable, incompressible, with a large period, consistent and unpredictable. This gives reason to consider NSumSG suitable for particular cryptographic applications.

KEY WORDS
Cryptography, Encryption Algorithm, Shrinking Generator, Summation Generator, Stream Ciphers, PRNG, FCSRs.

SECTION 1. Introduction

The proliferation of computers and communications systems in the 1960s brought with it a demand from the private sector for means to protect information in digital form and to provide security services. Stream ciphers are an important tool for solving this problem. Despite their wide application, it is very hard, and may be impossible, to describe all factors which influence the performance quality of stream ciphers. In any case, it surely depends on their cryptographic resistance, speed and effectiveness of hardware implementation.
Mostly, the crypto resistance of a stream cipher is connected with its ability to generate a pseudo random sequence (PRS or gamma) with the following properties:
(1) it should have an enormous period;
(2) it should demonstrate uniform distribution of d-tuples (for a large range of d);
(3) it should exhibit a good structure (usually a lattice structure) in high dimensions.

Unfortunately, the mentioned factors are in contradiction, because if the structure of the stream cipher is simple in order to provide high speed and a cost-effective hardware implementation, then the cryptographic reliability is low. For instance, the classical fast and cheap Linear Feedback Shift Registers (LFSRs) are vulnerable to the so-called "Berlekamp-Massey attack" [4], [5], [8]. This attack recovers all bits of an LFSR output sequence if 2n of its consecutive bits are known, where n is the number of cells connected in the LFSR. Having in mind the advantages of stream ciphers with simple structure, some theorists [3], [4], [6] have recently proposed a new approach to stream cipher design. The basic idea of this approach is to build devices with high cryptographic reliability by combining, in some appropriate way, cryptographically vulnerable but fast and cheap elements (including LFSRs). This view of stream cipher design has led to the introduction of a few new architectures; the so-called summation generator, shrinking generator and N-adic Feedback with Carry Shift Register (N-FCSR) [2], [3], [13] should be mentioned. They are promising candidates for high-speed encryption applications due to their simplicity and provable properties. With regard to the positive features of the summation generator, shrinking generator and N-FCSRs, our paper is focused on the problem of synthesis of a derivative structure, named the summation-shrinking generator.

The paper is organized as follows. First, the basics of the summation generator and shrinking generator are recalled.
Second, their derivative structure, called the N-adic Summation-Shrinking Generator (NSumSG), is presented. Then, the implementation and a statistical analysis of NSumSG properties are given. Finally, the advantages and possible areas of application of our algorithm are discussed.

SECTION 2. Basic theory of the summation and shrinking generators

In principle, the crypto resistance of a stream cipher based on LFSRs can be enhanced by two alternative methods. The first method uses an appropriate combining of the outputs of several LFSRs, as shown in Fig. 1a. These gamma generators are called "combination generators". The other alternative is to generate the gamma as a non-linear function of the states of a single LFSR's triggers (Fig. 1b). In this case the gamma generators are named "filter generators".

Fig. 1a: Combination generator. Fig. 1b: Filter generator.

Having in mind that:
- filter generators can be studied as a particular case of combination generators with S = 1 in Fig. 1a;
- combination generators are still being applied in some real communication and information systems [5], [7];
in the rest of this paper our attention shall be focused on the derivative structures of the combination generator.

As mentioned, the basic idea of the combination generator method is to create a hard-to-crack gamma sequence by appropriately combining several PRSs whose technical realization is simple and cost-effective. The scheme shown in Fig. 1a has been an object of intensive research since 1984, because it is easy to generate PRSs with LFSRs. As a result of these efforts [6], the cryptologist Rueppel proved that combination generators have maximal linear complexity L(x) if:
- all LFSRs have a feedback loop described by a primitive irreducible polynomial (i.e.
the created PRSs are maximum length sequences (shortly, m-sequences));
- the periods Ti, i = 1, 2, ..., s of the PRSs generated by the LFSRs are different.

Here linear complexity L(x) means the length of the binary LFSR which can be constructed as the result of the Berlekamp-Massey attack. The Rueppel conditions are easy to realize with an s-bit adder. This means that f from Fig. 1a must be a full adder, which has ⌈log2 s⌉ + 1 triggers. In order to simplify the explanation, we shall suppose that there are only two LFSRs. In this case, during the time interval from j·τ0 to (j+1)·τ0 (here τ0 is the period of the LFSRs' clock pulses), the sequences A = a_{j+r-1}, ..., a_{j+1}, a_j and B = b_{j+r-1}, ..., b_{j+1}, b_j are placed in the LFSR triggers. In the adder, the numbers corresponding to the sequences A and B:

a = a_{j+r-1}·2^{r-1} + ... + a_{j+1}·2 + a_j,   b = b_{j+r-1}·2^{r-1} + ... + b_{j+1}·2 + b_j   (1)

are summed with carry. Then at the outputs of the adder the total sum z = a + b is obtained. Here:

Z = z_{j+r-1}, ..., z_{j+1}, z_j,   z = z_{j+r-1}·2^{r-1} + ... + z_{j+1}·2 + z_j,
z_i = a_i ⊕ b_i ⊕ σ_i,   σ_{i+1} = a_i·b_i ⊕ a_i·σ_i ⊕ b_i·σ_i,   i = j, j+1, ..., j+r-1,   (2)

where:
- z_j is the j-th element of the combination generator output sequence;
- σ_i is the carry from the (i-1)-th digit.

The basic idea of the combination generator can also be realized as a shrinking generator. In the shrinking generator, a control LFSR R0 is used to select a portion of the output sequence of a second LFSR R1. Therefore, the produced gamma (or the keystream) is a shrunken version (also known as an irregularly decimated subsequence) of the output sequence of R1, as depicted in Fig. 2.

The algorithm of the shrinking generator consists of the following steps:
(1) Registers R0 and R1 are clocked.
(2) If the output of R0 is 1, the output bit of R1 forms a part of the keystream.
(3) If the output of R0 is 0, the output bit of R1 is discarded.

Let R0 and R1 be maximum-length LFSRs of lengths L0 and L1, respectively, and let z be an output sequence of the shrinking generator formed by R0 and R1.
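The three clocking steps above can be sketched in C++ (a minimal illustration with short, hypothetical register lengths and tap masks; a real cipher would use much longer, secret connection polynomials):

```cpp
#include <cstdint>
#include <vector>

// Minimal Fibonacci LFSR: 'taps' is a bit mask over the 'len'-bit state.
struct LFSR {
    uint32_t state, taps, len;
    int step() {                              // clock once, return output bit
        int out = static_cast<int>(state & 1u);
        uint32_t fb = 0, t = state & taps;
        while (t) { fb ^= t & 1u; t >>= 1; }  // parity of the tapped bits
        state = (state >> 1) | (fb << (len - 1));
        return out;
    }
};

// Shrinking generator: control register R0 irregularly decimates R1.
std::vector<int> shrink(LFSR r0, LFSR r1, std::size_t nbits) {
    std::vector<int> z;
    while (z.size() < nbits) {
        int c = r0.step();                    // step (1): clock both registers
        int b = r1.step();
        if (c == 1) z.push_back(b);           // step (2): keep R1's bit
        // step (3): when c == 0, the bit b is discarded
    }
    return z;
}
```

For example, the primitive polynomials x^5 + x^2 + 1 (mask 0b00101, L0 = 5) and x^6 + x + 1 (mask 0b000011, L1 = 6) give maximum-length registers with gcd(L0, L1) = 1, so the resulting keystream period is (2^6 − 1)·2^{5−1} = 1008 bits.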
If gcd(L0, L1) = 1, then z has period (2^{L1} − 1)·2^{L0−1} [7]. The linear complexity L(z) of z satisfies Eq. (3) [7]:

L1·2^{L0−2} < L(z) ≤ L1·2^{L0−1}.   (3)

Suppose that the connection polynomials of R0 and R1 are chosen uniformly at random from the set of all primitive polynomials of degrees L0 and L1 over Z2. Then the distribution of patterns in z is almost uniform [7].

For maximum security, R0 and R1 should be maximum-length LFSRs, and their lengths should satisfy the condition gcd(L0, L1) = 1. Moreover, secret connections should be used. Subject to these constraints, if L0 ≈ m and L1 ≈ m, the shrinking generator has a security level approximately equal to 2^{2m}. Thus, if L0 ≈ 64 and L1 ≈ 64, the generator appears to be secure against all presently known attacks [5], [7].

Fig. 2: Shrinking generator.

SECTION 3. N-adic Summation-Shrinking Generator Architecture

In this section the basic architecture of the new N-adic Summation-Shrinking Generator (NSumSG) and some of its basic properties are presented. The NSumSG architecture, proposed recently in [12], uses an increased number of slave registers in comparison with the shrinking generator, as in the Shrinking-Multiplexing Generator [11]. The control and slave registers of the shrinking-multiplexing generator are replaced in NSumSG with an N-adic and 2-adic summation generators, respectively (Fig. 3). The use of an N-adic control summation generator enhances the number of 2-adic slave summation generators from 1 in the shrinking generator to N − 1 in NSumSG. Every summation generator consists of two FCSRs, denoted Rj1 and Rj2 (j = 0, 1, ..., N − 1). It ought to be underlined that the slave FCSRs Rj1, Rj2 (j = 1, 2, ..., N − 1) are 2-FCSRs, and hence the corresponding adders mj consist of one bit for mj and one bit for the sign. The control FCSRs R01 and R02 are N-FCSRs, and their adder m0 has ind_N(m0) + 1 bits for |m0| and an extra bit for the sign.
As shown, a summation generator selects a portion of the output sequences of several summation generators.

Definition 1. The algorithm of the N-adic Summation-Shrinking Generator consists of the following steps:
(1) All FCSRs, from R01, R02 to R(N−1)1, R(N−1)2, are clocked with a clock sequence with period τ0.
(2) If the N-adic output b_i = j of the control summation generator is not equal to 0, the output bit of the j-th slave summation generator forms a part of the keystream. Otherwise, if the output b_i = 0, all output bits of the slave summation generators are discarded (Fig. 3).

Fig. 3: N-adic Summation-Shrinking Generator.

Therefore, the produced keystream is a shrunken and mixed version of the output sequences a_{ij}, i = 1, 2, ..., N − 1, of the N − 1 slave summation generators. It is straightforward that the N-adic Summation-Shrinking Generator inherits all positive features of the summation generator, shrinking generator and N-adic FCSR. The proposed new pseudorandom number generator architecture takes advantage of feedback with carry shift registers over Z/(N) for any integer N > 1 (N-FCSRs) (see Fig. 4).

Definition 2 [13]. Let N > 1 be an integer and S = {a : 0 ≤ a ≤ N − 1}. For any integer r ≥ 1, the state of a feedback with carry shift register over Z/(N) consists of r integers a_0, a_1, ..., a_{r−1} ∈ S and an arbitrary integer M_{r−1}, the memory. The state change is determined by r + 1 integers g, d_1, d_2, ..., d_r ∈ S, such that gcd(g, N) = 1 and d_r ≠ 0, as follows (Fig. 4):
(1) Compute the integer sum σ_r = d_1·a_{r−1} + d_2·a_{r−2} + ... + d_r·a_0 + M_{r−1};
(2) Compute a_r ∈ S and M_r ∈ Z such that σ_r = g·a_r + N·M_r;
(3) Replace the memory M_{r−1} with M_r;
(4) Output the cell a_0 and use the cell a_r to shift the register, replacing the loading cells (a_{r−1}, ..., a_0) by (a_r, ..., a_1).

For n ≥ r, a_n is defined by both the memory and the running register cells. During the entire operation, (g, d_1, d_2, ..., d_r) are fixed.
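Steps (1)–(4) of Definition 2 can be sketched as follows (a toy model for small N, not the authors' p_adic class; the brute-force modular inverse is only fit for small moduli):

```cpp
#include <vector>

// Toy N-adic FCSR: cells a[0..r-1] in {0,...,N-1}, integer memory M,
// taps d[1..r] and multiplier g with gcd(g, N) = 1.
struct NFCSR {
    long long N, g, M;
    std::vector<long long> a;   // a[0] is the output cell
    std::vector<long long> d;   // d[0] unused, taps are d[1..r]

    long long step() {
        const std::size_t r = a.size();
        // (1) integer sum: sigma = d1*a_{r-1} + ... + dr*a_0 + M_{r-1}
        long long sigma = M;
        for (std::size_t i = 1; i <= r; ++i) sigma += d[i] * a[r - i];
        // (2) a_r in {0..N-1} with sigma = g*a_r + N*M_r, i.e.
        //     a_r = sigma * g^{-1} (mod N); the inverse exists, gcd(g,N)=1
        long long ginv = 1;
        while ((g * ginv) % N != 1) ++ginv;   // brute force, small N only
        long long ar = ((sigma % N) * ginv % N + N) % N;
        // (3) replace the memory by M_r (exact division by construction)
        M = (sigma - g * ar) / N;
        // (4) output a_0 and shift, loading a_r at the top
        long long out = a.front();
        a.erase(a.begin());
        a.push_back(ar);
        return out;
    }
};
```

With the illustrative parameters N = 2, g = 1, taps d_1 = d_2 = 1 and initial state a = (1, 0), M = 0, the register settles into a short periodic output, as a tiny 2-FCSR should.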
The integer d = -g + d_1·N + d_2·N^2 + ... + d_r·N^r is called the connection number. Consequently, d_0 = -g and d = sum_{i=0}^{r} d_i·N^i.

Fig. 4: N-adic Feedback with Carry Shift Register

For maximum security one must choose triples of integers (d, p, N) satisfying the following conditions:
(1) d is prime;
(2) d = 2p + 1 and p is an odd prime;
(3) N is prime;
(4) N is primitive modulo d and primitive modulo p.

In the particular case N = 2, the 2SumSG consists of only one slave 2-adic summation generator. Let the connection numbers of the two 2-FCSRs R_01 and R_02 of the control summation generator be d_01 and d_02, and let the slave summation generator combine two 2-FCSRs R_11 and R_12 with connection numbers d_11 and d_12. The period of the control summation generator is

T_0 = (d_01 - 1)(d_02 - 1) / gcd(d_01 - 1, d_02 - 1) = lcm(d_01 - 1, d_02 - 1),   (4)

and the period of the slave summation generator is

T_1 = (d_11 - 1)(d_12 - 1) / gcd(d_11 - 1, d_12 - 1) = lcm(d_11 - 1, d_12 - 1),   (5)

according to [6] and the use of triples (d, p, N) with the properties mentioned above. Then the period S_2 of the 2-adic Summation-Shrinking Generator is

S_2 = T_0*·T_1 / gcd(T_0, T_1),   (6)

where T_0* denotes the total number of ones in one period of the control summation generator. According to [6], the linear complexities L_0 and L_1 of the summation generators are close to their periods, i.e.

L_0 ~ (d_01 - 1)(d_02 - 1) / gcd(d_01 - 1, d_02 - 1),
L_1 ~ (d_11 - 1)(d_12 - 1) / gcd(d_11 - 1, d_12 - 1).

Then, from [1], the linear complexity L of the 2SumSG is at most

L <= T_0*·T_1.   (7)

As equations (4)-(7) show, even with N = 2 the proposed pseudorandom number generator architecture produces PRSs whose period and linear complexity are larger than the respective parameters of the PRSs formed by a classic shrinking generator [1].

SECTION 4
Implementation and output files generation

The NSumSG is implemented in software in the Visual C++ 6.0 environment for Windows/32 bits. The class p_adic is used to produce the output NSumSG sequence.
The application and the NSumSG statistical tests were executed on a PC AMD Athlon™ XP 2200+ with 256 MB RAM. Two different setups were applied, each generating 1 000 sequences of 1 000 000 bits, to test the N-adic Summation-Shrinking Generator:

(1) N = 2. Here the NSumSG consists of one controlling 2-adic summation generator with connection numbers d01 = 10 000 139 and d02 = 10 000 189. The slave 2-adic summation generator has first connection number d11 = 10 000 229. The second connection number d12 is changed every 1 000 000 bits, consecutively taking 1 000 values which are strong 2-primes [9] in the range [81 467, 2 283 803]. Thus the seed of the constructed NSumSG differs every 1 000 000 bits. The size of the generated NSumSG output file is 983 Mbytes.

(2) N = 3. In this configuration the controlling 3-adic summation generator has two connection numbers, d01 = 5 000 011 and d02 = 5 000 201. The first slave 2-adic summation generator has a seed comprising the numbers d11 = 10 000 139 and d12 = 10 000 189. The second summation generator has first connection number d21 = 10 000 229. The second connection number d22 is changed every 1 000 000 bits, consecutively taking 1 000 values which are strong 2-primes in the range [981 467, 2 283 803]. In this way 1 000 sequences of 1 000 000 bits were generated, with the seed changed every 1 000 000 bits. The size of the generated NSumSG output file is 983 Mbytes.

The FCSR connection numbers were chosen randomly in the two setups mentioned above.

SECTION 5
Statistical analysis and interpretation of empirical results

To test the randomness of the binary sequences generated by the NSumSG, the NIST suite proposed by the National Institute of Standards and Technology is used. The NIST suite [7], [10] includes sixteen tests. The tests focus on a variety of different types of non-randomness that could exist in a sequence.
These tests are: frequency (monobit); frequency within a block; runs; longest run of ones in a block; binary matrix rank; discrete Fourier transform (spectral); non-overlapping template matching; overlapping template matching; Maurer's "Universal statistical"; Lempel-Ziv compression; linear complexity; serial; approximate entropy; cumulative sums; random excursions; random excursions variant.

The testing process consists of the following steps [7], [10]:
(1) State the null hypothesis: assume that the binary sequence is random.
(2) Compute the sequence test statistic. Testing is carried out at the bit level.
(3) Compute the p-value, p-value in [0, 1].
(4) Compare the p-value to alpha, where alpha in (0.0001, 0.01]. Success is declared whenever p-value >= alpha; otherwise, failure is declared.

Given the empirical results for a particular statistical test, the NIST suite computes the proportion of sequences that pass. The range of acceptable proportions is determined using the confidence interval defined as

p^ ± 3·sqrt(p^(1 - p^)/m),

where p^ = 1 - alpha and m is the number of tested binary sequences. In our two setups m = 1000. Thus the confidence interval is

0.99 ± 3·sqrt(0.99·0.01/1000) = 0.99 ± 0.0094392,

and the proportion should lie above 0.9805607.

The distribution of p-values is examined to ensure uniformity. The interval between 0 and 1 is divided into 10 sub-intervals, and the p-values that lie within each sub-interval are counted. Uniformity may also be examined through a chi-square test and the determination of a p-value corresponding to the goodness-of-fit distributional test on the p-values obtained for an arbitrary statistical test (a p-value of the p-values). This is implemented by computing

chi^2 = sum_{i=1}^{10} (F_i - m/10)^2 / (m/10),

where F_i is the number of p-values in sub-interval i and m is the number of tested sequences. A p-value is then calculated such that p-value_T = igamc(9/2, chi^2/2).
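The acceptance interval and the uniformity statistic above can be sketched as follows (the function names are ours; `scipy.special.gammaincc(4.5, chi2/2)` would supply the final igamc step):

```python
import math

def proportion_interval(alpha, m):
    """NIST acceptance interval p^ +/- 3*sqrt(p^(1 - p^)/m), with p^ = 1 - alpha."""
    p = 1.0 - alpha
    half = 3.0 * math.sqrt(p * (1.0 - p) / m)
    return p - half, p + half

def uniformity_chi_square(counts):
    """chi^2 = sum_i (F_i - m/10)^2 / (m/10) over the 10 p-value sub-intervals."""
    m = sum(counts)
    expected = m / 10.0
    return sum((f - expected) ** 2 / expected for f in counts)
```

With alpha = 0.01 and m = 1000 the lower bound reproduces the 0.9805607 threshold quoted in the text.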
If p-value_T >= 0.0001, the p-values can be regarded as uniformly distributed.

Table 1 lists the results from the NIST test suite with the input file from the first setup (N = 2). The detailed results of the non-overlapping template matching, random excursions and random excursions variant tests, together with the numbers of p-values in the sub-intervals for N = 2, can be found in Appendix 1.

Table 1: The results from NSumSG statistical tests, N = 2

Statistical Test                        Result  Proportion  P-value_T  Comment
Frequency (monobit)                     Pass    0.9920      0.260930
Frequency within a block                Pass    0.9810      0.896345
Cumulative sums                         Pass    0.9870      0.524101
Cumulative sums                         Pass    0.9910      0.832561
Runs                                    Pass    0.9830      0.326749
Longest run of ones in a block          Pass    0.9850      0.465415
Binary matrix rank                      Pass    0.9890      0.757790
Discrete Fourier transform (spectral)   Pass    0.9970      0.186566
Non-overlapping template matching       Pass    0.9894      0.531028   Avg. values
Overlapping template matching           Pass    0.9940      0.618385
Maurer's "Universal statistical"        Pass    0.9880      0.086634
Approximate entropy                     Pass    0.9890      0.476911
Random excursions                       Pass    0.9870      0.598233   Avg. values
Random excursions variant               Pass    0.9901      0.431378   Avg. values
Serial                                  Pass    0.9930      0.227180
Serial                                  Pass    0.9910      0.849708
Lempel-Ziv compression                  Pass    0.9960      0.037320
Linear complexity                       Pass    0.9960      0.355364

The minimum pass rate for the random excursions (variant) tests is approximately 0.977854. The minimum pass rate for each statistical test, with the exception of the random excursions (variant) tests, is approximately 0.980561.

Table 2 lists the results from the NIST test suite with the input file from the second setup (N = 3).
The detailed results of the non-overlapping template matching, random excursions and random excursions variant tests, together with the numbers of p-values in the sub-intervals for N = 3, can be found in Appendix 2.

Table 2: The results from NSumSG statistical tests, N = 3

Statistical Test                        Result  Proportion  P-value_T  Comment
Frequency (monobit)                     Pass    0.9890      0.881662
Frequency within a block                Pass    0.9880      0.254411
Cumulative sums                         Pass    0.9820      0.534146
Cumulative sums                         Pass    0.9850      0.827279
Runs                                    Pass    0.9930      0.428095
Longest run of ones in a block          Pass    0.9870      0.187581
Binary matrix rank                      Pass    0.9860      0.618385
Discrete Fourier transform (spectral)   Pass    0.9910      0.647530
Non-overlapping template matching       Pass    0.9899      0.476221   Avg. values
Overlapping template matching           Pass    0.9900      0.045088
Maurer's "Universal statistical"        Pass    0.9850      0.662091
Approximate entropy                     Pass    0.9950      0.508172
Random excursions                       Pass    0.9907      0.476154   Avg. values
Random excursions variant               Pass    0.9895      0.461205   Avg. values
Serial                                  Pass    0.9880      0.672470
Serial                                  Pass    0.9940      0.159910
Lempel-Ziv compression                  Pass    0.9820      0.532132
Linear complexity                       Pass    0.9900      0.869278

The minimum pass rate for the random excursions (variant) tests is approximately 0.978117. The minimum pass rate for each statistical test, with the exception of the random excursions (variant) tests, is approximately 0.980561.

CONCLUSIONS AND FUTURE WORK

The results of the statistical analysis show that the sequences generated by the NSumSG are uniform, scalable, incompressible, consistent and unpredictable, with a large period. This gives reason to consider the NSumSG a very interesting pseudorandom generator that can be useful as a part of stream ciphers. We would be glad to thank everyone who helps us perform a strong cryptanalysis of the NSumSG.

References:
[1] D. Coppersmith, H. Krawczyk, Y. Mansour, "The Shrinking Generator", Proceedings of Crypto '93, Springer-Verlag, 1994, pp. 22-39.
[2] A. Klapper, M. Goresky, "2-adic Shift Registers", Fast Software Encryption, Second International Workshop.
(Lecture Notes in Computer Science, vol. 950, Springer-Verlag, N.Y., 1994), pp. 174-178.
[3] A. Klapper, J. Xu, "Algebraic Feedback Shift Registers" (submitted to Elsevier Preprint), 2003.
[4] R. Lidl, H. Niederreiter, "Finite Fields", Addison-Wesley Publishing Company, London, England, 1983.
[5] P. van Oorschot, A. Menezes, S. Vanstone, "Handbook of Applied Cryptography", CRC Press, 1997.
[6] R. Rueppel, "Analysis and Design of Stream Ciphers", Springer-Verlag, N.Y., 1986.
[7] A. Rukhin, J. Soto, J. Nechvatal, M. Smid, E. Barker, S. Leigh, M. Levenson, M. Vangel, D. Banks, A. Heckert, J. Dray, S. Vo, "A Statistical Test Suite for Random and Pseudo-Random Number Generators for Cryptographic Applications", NIST Special Publication 800-22 (with revision May 15, 2001), /rng/.
[8] B. Schneier, "Applied Cryptography", John Wiley & Sons, New York, 1996.
[9] Ch. Seo, S. Lee, Y. Sung, K. Han, S. Kim, "A Lower Bound on the Linear Span of an FCSR", IEEE Transactions on Information Theory, Vol. 46, No. 2, March 2000.
[10] J. Soto, "Statistical Testing of Random Number Generators", /rng/.
[11] Zh. N. Tasheva, B. Y. Bedzhev, V. A. Mutkov, "A Shrinking Data Encryption Algorithm with p-adic Feedback with Carry Shift Register", XII International Symposium of Theoretical Electrical Engineering ISTET 03, Warsaw, Poland, 6-9 July 2003, Conference Proceedings, vol. II, pp. 397-400.
[12] Zh. N. Tasheva, B. Y. Bedzhev, B. P.
Stoyanov, "Summation-Shrinking Generator", Conference Proceedings of the International Conference "Information Technology and Security ITS - 2004", June 22-26, 2004, Partenit, Crimea, Ukraine, pp. 119-127.
[13] J. Xu, "Stream Cipher Analysis Based on FCSRs", PhD Dissertation, University of Kentucky, 2000.

APPENDIX 1
Results from setup 1

The uniformity of p-values and the proportion of passing sequences. For each test, columns C1-C10 give the number of p-values falling into each of the ten sub-intervals of [0, 1]; they are followed by the uniformity p-value_T and by the proportion of the 1 000 sequences that pass the test.

Test              C1   C2   C3   C4   C5   C6   C7   C8   C9   C10  p-value_T  Proportion
frequency         113  117  91   111  85   97   86   100  96   104  0.260930   0.9920
block-frequency   91   98   112  101  111  96   101  98   93   99   0.896345   0.9810
cumulative-sums   114  105  96   91   82   95   107  97   106  107  0.524101   0.9870
cumulative-sums   106  104  109  97   88   92   96   94   108  106  0.832561   0.9910
runs              122  90   108  104  99   108  86   92   96   95   0.326749   0.9830
longest-run       108  95   94   96   118  94   84   110  103  98   0.465415   0.9850
rank              109  106  97   95   99   86   102  114  95   97   0.757790   0.9890
fft               94   107  109  109  121  100  93   99   84   84   0.186566   0.9970

The nonperiodic-templates rows follow the same layout; their uniformity p-values range from 0.016374 to 0.980341 and their proportions from 0.9810 to 0.9960, all within the acceptance region.
Lecture 10
EM Algorithm, 1st expectation step: calculations
• Assume that seq1 is 20 bases long and that the length of the site is 20 bases.
• Suppose that the site starts in column 1 and that the first two positions are A and T.
eMOTIF
True positives
eMOTIF: searching the database for sequences that contain a given emotif
Expectation Maximization (EM) Algorithm
• This algorithm is used to identify conserved regions in unaligned DNA and protein sequences.
• Assume that a set of sequences is expected to share a common sequence pattern.
Bioinformatics
• Finding signals and motifs in DNA and proteins
• Expectation Maximization Algorithm
• MEME
• The Gibbs sampler
Finding signals and motifs in DNA and proteins
• An alignment of sequences is intrinsically connected with another essential task: finding certain signals and motifs (highly conserved ungapped blocks) shared by some sequences.
• A motif is a sequence pattern that occurs repeatedly in a group of related protein or DNA sequences. Motifs are represented as position-dependent scoring matrices that describe the score of each possible letter at each position in the pattern.
• Another related task is searching biological databases for sequences that contain one or more known motifs.
• These objectives are critical in the analysis of genes and proteins, as any gene or protein contains a set of different motifs and signals. Complete knowledge of the locations and structure of such motifs and signals leads to a comprehensive description of a gene or protein and indicates a potential function.
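Following the position-dependent scoring matrix idea above, here is a minimal log-odds PWM sketch; the toy sites, the uniform background and the pseudocount are illustrative assumptions, not part of the lecture:

```python
import math

def pwm_from_sites(sites, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds position weight matrix from aligned site occurrences,
    assuming a uniform background distribution."""
    width = len(sites[0])
    background = 1.0 / len(alphabet)
    pwm = []
    for j in range(width):
        column = [site[j] for site in sites]
        scores = {}
        for letter in alphabet:
            p = (column.count(letter) + pseudocount) / (len(sites) + pseudocount * len(alphabet))
            scores[letter] = math.log2(p / background)
        pwm.append(scores)
    return pwm

def best_hit(seq, pwm):
    """Slide the PWM along seq and return (best score, best offset)."""
    width = len(pwm)
    return max(
        (sum(pwm[j][seq[i + j]] for j in range(width)), i)
        for i in range(len(seq) - width + 1)
    )
```

Scanning a sequence with the matrix and reporting the best-scoring window is the basic operation that EM, MEME and the Gibbs sampler repeat while refining the motif model.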
Adaptive tracking control of uncertain MIMO nonlinear systems with input constraints
Abstract
In this paper, adaptive tracking control is proposed for a class of uncertain multi-input and multi-output nonlinear systems with non-symmetric input constraints. An auxiliary design system is introduced to analyze the effect of the input constraints, and its states are used in the adaptive tracking control design. The spectral radius of the control coefficient matrix is used to relax the nonsingularity assumption on the control coefficient matrix. Subsequently, the constrained adaptive control is presented, where command filters are adopted to emulate the actuator's physical constraints on the control law and the virtual control laws, and to avoid the tedious analytic computation of time derivatives of the virtual control laws in the backstepping procedure. Under the proposed control techniques, semi-globally uniformly ultimately bounded stability of the closed-loop system is achieved via Lyapunov synthesis. Finally, simulation studies are presented to illustrate the effectiveness of the proposed adaptive tracking control. © 2011 Elsevier Ltd. All rights reserved.
Normalized Cuts and Image Segmentation (translation)
Abstract: We propose a novel approach for solving the perceptual grouping problem in vision. Rather than focusing on local features and their consistencies in the image data, our approach aims at extracting the global impression of an image. We treat image segmentation as a graph partitioning problem and propose a novel global criterion, the normalized cut, for segmenting the graph. This criterion measures both the total dissimilarity between the different groups and the total similarity within the groups. We show that an efficient computational technique based on a generalized eigenvalue problem can be used to optimize the criterion. We have applied this approach to segmenting static images and motion sequences, and found the results encouraging.
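The abstract's computational claim can be made concrete with a small sketch: assuming NumPy, the relaxed normalized cut is obtained from the generalized eigensystem (D - W)y = lambda·D·y via the equivalent symmetric normalized Laplacian; the four-node weight matrix used below is a toy example, not data from the paper:

```python
import numpy as np

def ncut_bipartition(W):
    """Bipartition a weighted graph by the normalized-cut relaxation:
    solve (D - W) y = lambda D y and threshold the second-smallest
    eigenvector at zero."""
    d = W.sum(axis=1)
    d_isqrt = np.diag(1.0 / np.sqrt(d))
    # symmetric form: D^{-1/2} (D - W) D^{-1/2}
    L_sym = d_isqrt @ (np.diag(d) - W) @ d_isqrt
    _, vecs = np.linalg.eigh(L_sym)          # eigenvalues in ascending order
    y = d_isqrt @ vecs[:, 1]                 # map back to the generalized eigenvector
    return y >= 0
```

On a graph of two tightly connected pairs joined by weak cross edges, the sign pattern of the second eigenvector recovers the two groups.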
1 Introduction

Nearly 75 years ago, Wertheimer introduced the "Gestalt" approach, which laid out the importance of perceptual grouping and the organization of visual perception. For our purposes, the grouping problem can be made concrete by considering the set of points shown in Figure 1.

Figure 1: A point set with several plausible groupings.

A human observer will typically see four objects in this figure: a circular ring with an interior cloud of points, and two loose clusters of points on the right. However, this is not the only possible partition. Some may argue that there are three objects, the two clusters on the right forming one dumbbell-shaped object. Or there may be only two objects, a dumbbell-shaped object on the right and a circular-galaxy-like structure on the left. Taken to the extreme, one could argue that in fact every point is a distinct object.

This may seem like an artificial example, but every attempt at image segmentation faces a similar problem: there are many possible ways of partitioning an image domain D into subsets D_i (including the extreme one of regarding every pixel as a separate entity). How do we pick the "most correct" one? We believe the Bayesian view is appropriate: one wants to find the most probable interpretation in the context of prior world knowledge. The difficulty, of course, lies in specifying that prior world knowledge: some of it is low level, such as coherence of brightness, color, texture, or motion, but equally important is mid- or high-level knowledge about symmetries of objects or object models.

This suggests that image segmentation based on low-level cues cannot and should not aim to produce a complete, final, correct segmentation. The objective should instead be to use the low-level coherence of brightness, color, texture, or motion attributes to sequentially propose hierarchical partitions. Mid- and high-level knowledge can be used either to confirm these groups or to select some of them for further attention. This attention could in turn lead to further repartitioning or regrouping. The key point is that image segmentation proceeds from the big picture downward, rather like a painter who first marks out the major areas and then fills in the details.
Finding community structure in networks using the eigenvectors of matrices
M. E. J. Newman
Department of Physics and Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI 48109–1040
We consider the problem of detecting communities or modules in networks: groups of vertices with a higher-than-average density of edges connecting them. Previous work indicates that a robust approach to this problem is the maximization of the benefit function known as "modularity" over possible divisions of a network. Here we show that this maximization process can be written in terms of the eigenspectrum of a matrix we call the modularity matrix, which plays a role in community detection similar to that played by the graph Laplacian in graph partitioning calculations. This result leads us to a number of possible algorithms for detecting community structure, as well as several other results, including a spectral measure of bipartite structure in networks and a centrality measure that identifies those vertices that occupy central positions within the communities to which they belong. The algorithms and measures proposed are illustrated with applications to a variety of real-world complex networks.
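A minimal sketch of the modularity-matrix approach described above, assuming NumPy; the graph of two triangles joined by a single edge is our toy example, not one of the paper's networks:

```python
import numpy as np

def modularity_bisection(A):
    """Spectral bisection: build the modularity matrix B = A - k k^T / (2m)
    and split the vertices on the sign of its leading eigenvector."""
    k = A.sum(axis=1)
    m2 = k.sum()                      # 2m = total degree
    B = A - np.outer(k, k) / m2
    _, vecs = np.linalg.eigh(B)
    v = vecs[:, -1]                   # eigenvector of the largest eigenvalue
    return v >= 0

def modularity(A, labels):
    """Q = (1/2m) sum_ij (A_ij - k_i k_j / 2m) delta(c_i, c_j)."""
    k = A.sum(axis=1)
    m2 = k.sum()
    B = A - np.outer(k, k) / m2
    same = np.equal.outer(labels, labels)
    return (B * same).sum() / m2
```

For two triangles joined by one edge, the split recovered by the leading eigenvector separates the triangles and yields Q = 5/14 ≈ 0.357.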
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean
Google Inc., Mountain View

Abstract

The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling.

An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.

1 Introduction

Distributed representations of words in a vector space help learning algorithms to achieve better performance in natural language processing tasks by grouping similar words. One of the earliest uses of word representations dates back to 1986 due to Rumelhart, Hinton, and Williams [13]. This idea has since been applied to statistical language modeling with considerable success [1]. The follow-up work includes applications to automatic speech recognition and machine translation [14, 7], and a wide range of NLP tasks [2, 20, 15, 3, 18, 19, 9].

Recently, Mikolov et al. [8] introduced the Skip-gram model, an efficient method for learning high-quality vector representations of words from large amounts of unstructured text data. Unlike most of the previously used neural network architectures for learning word vectors, training of the Skip-gram model (see
Figure 1) does not involve dense matrix multiplications. This makes the training extremely efficient: an optimized single-machine implementation can train on more than 100 billion words in one day.

The word representations computed using neural networks are very interesting because the learned vectors explicitly encode many linguistic regularities and patterns. Somewhat surprisingly, many of these patterns can be represented as linear translations. For example, the result of the vector calculation vec("Madrid") - vec("Spain") + vec("France") is closer to vec("Paris") than to any other word vector [9, 8].

Figure 1: The Skip-gram model architecture. The training objective is to learn word vector representations that are good at predicting the nearby words.

In this paper we present several extensions of the original Skip-gram model. We show that subsampling of frequent words during training results in a significant speedup (around 2x-10x), and improves accuracy of the representations of less frequent words. In addition, we present a simplified variant of Noise Contrastive Estimation (NCE) for training the Skip-gram model that results in faster training and better vector representations for frequent words, compared to the more complex hierarchical softmax that was used in the prior work [8].

Word representations are limited by their inability to represent idiomatic phrases that are not compositions of the individual words. For example, "Boston Globe" is a newspaper, and so it is not a natural combination of the meanings of "Boston" and "Globe". Therefore, using vectors to represent the whole phrases makes the Skip-gram model considerably more expressive. Other techniques that aim to represent meaning of sentences by composing the word vectors, such as the recursive autoencoders [15], would also benefit from using phrase vectors instead of the word vectors.

The extension from word based to phrase based models is relatively simple. First we identify a large number of phrases using a data-driven approach, and then we treat the phrases as individual tokens during the training. To evaluate the quality of the phrase vectors, we developed a test set of analogical reasoning tasks that contains both words and phrases. A typical analogy pair from our test set is "Montreal" : "Montreal Canadiens" :: "Toronto" : "Toronto Maple
Leafs". It is considered to have been answered correctly if the nearest representation to vec("Montreal Canadiens") - vec("Montreal") + vec("Toronto") is vec("Toronto Maple Leafs").

Finally, we describe another interesting property of the Skip-gram model. We found that simple vector addition can often produce meaningful results. For example, vec("Russia") + vec("river") is close to vec("Volga River"), and vec("Germany") + vec("capital") is close to vec("Berlin"). This compositionality suggests that a non-obvious degree of language understanding can be obtained by using basic mathematical operations on the word vector representations.

2 The Skip-gram Model

The training objective of the Skip-gram model is to find word representations that are useful for predicting the surrounding words in a sentence or a document. More formally, given a sequence of training words w_1, w_2, w_3, ..., w_T, the objective of the Skip-gram model is to maximize the average log probability

(1/T) sum_{t=1}^{T} sum_{-c<=j<=c, j!=0} log p(w_{t+j} | w_t).   (1)

The basic Skip-gram formulation defines p(w_{t+j} | w_t) using the softmax function:

p(w_O | w_I) = exp(v'_{w_O}^T v_{w_I}) / sum_{w=1}^{W} exp(v'_w^T v_{w_I}).

Figure 2: Two-dimensional PCA projection of the 1000-dimensional Skip-gram vectors of countries and their capital cities. The figure illustrates the ability of the model to automatically organize concepts and learn implicitly the relationships between them, as during the training we did not provide any supervised information about what a capital city means.

Negative sampling (NEG) replaces the full softmax during training with the objective

log sigma(v'_{w_O}^T v_{w_I}) + sum_{i=1}^{k} E_{w_i ~ P_n(w)} [log sigma(-v'_{w_i}^T v_{w_I})],

which is used to replace every log P(w_O | w_I) term in the Skip-gram objective. Thus the task is to distinguish the target word w_O from draws from the noise distribution P_n(w) using logistic regression, where there are k negative samples for each data sample. Our experiments indicate that values of k in the range 5-20 are useful for small
training datasets, while for large datasets the k can be as small as 2-5. The main difference between negative sampling and NCE is that NCE needs both samples and the numerical probabilities of the noise distribution, while negative sampling uses only samples. And while NCE approximately maximizes the log probability of the softmax, this property is not important for our application.

Both NCE and NEG have the noise distribution P_n(w) as a free parameter. We investigated a number of choices for P_n(w) and found that the unigram distribution U(w) raised to the 3/4rd power (i.e., U(w)^{3/4}/Z) significantly outperformed the unigram and the uniform distributions, for both NCE and NEG, on every task we tried including language modeling (not reported here).

2.3 Subsampling of Frequent Words

In very large corpora, the most frequent words can easily occur hundreds of millions of times (e.g., "in", "the", and "a"). Such words usually provide less information value than the rare words. For example, while the Skip-gram model benefits from observing the co-occurrences of "France" and "Paris", it benefits much less from observing the frequent co-occurrences of "France" and "the", as nearly every word co-occurs frequently within a sentence with "the". This idea can also be applied in the opposite direction; the vector representations of frequent words do not change significantly after training on several million examples.

To counter the imbalance between the rare and frequent words, we used a simple subsampling approach: each word w_i in the training set is discarded with probability computed by the formula

P(w_i) = 1 - sqrt(t / f(w_i)),   (5)

where f(w_i) is the frequency of word w_i and t is a chosen threshold, typically around 10^-5.

Table 1: Accuracies of the NEG-5, NEG-15 and HS-Huffman models on the syntactic and semantic analogy tasks (/p/word2vec/source/browse/trunk/questions-words.txt).
Table 2: The five categories of the phrase analogy task: newspapers, NHL teams, NBA teams, airlines, company executives.

To identify phrases, the bigrams are scored as

score(w_i, w_j) = (count(w_i w_j) - delta) / (count(w_i) × count(w_j)),   (6)

where delta is used as a discounting coefficient that prevents too many phrases consisting of very infrequent words from being formed. The bigrams with score above the
chosen threshold are then used as phrases. Typically, we run 2-4 passes over the training data with decreasing threshold value, allowing longer phrases that consist of several words to be formed. We evaluate the quality of the phrase representations using a new analogical reasoning task that involves phrases. Table 2 shows examples of the five categories of analogies used in this task. This dataset is publicly available on the web.

4.1 Phrase Skip-Gram Results

Starting with the same news data as in the previous experiments, we first constructed the phrase based training corpus and then we trained several Skip-gram models using different hyper-parameters. As before, we used vector dimensionality 300 and context size 5. This setting already achieves good performance on the phrase dataset, and allowed us to quickly compare negative sampling and the hierarchical softmax, both with and without subsampling of the frequent tokens. The results are summarized in Table 3.

The results show that while negative sampling achieves a respectable accuracy even with k = 5, using k = 15 achieves considerably better performance. Surprisingly, while we found the hierarchical softmax to achieve lower performance when trained without subsampling, it became the best performing method when we downsampled the frequent words. This shows that the subsampling can result in faster training and can also improve accuracy, at least in some cases.

Table 3: Accuracies of the Skip-gram models on the phrase analogy dataset. The models were trained on approximately one billion words from the news dataset.

Table 4: Examples of the nearest neighbours of infrequent phrases, for models trained with and without subsampling.

Table 5: Vector compositionality using element-wise addition. Four closest tokens to the sum of two vectors
are shown, using the best Skip-gram model.

To maximize the accuracy on the phrase analogy task, we increased the amount of the training data by using a dataset with about 33 billion words. We used the hierarchical softmax, dimensionality of 1000, and the entire sentence for the context. This resulted in a model that reached an accuracy of 72%. We achieved a lower accuracy of 66% when we reduced the size of the training dataset to 6B words, which suggests that the large amount of training data is crucial.

To gain further insight into how different the representations learned by different models are, we manually inspected the nearest neighbours of infrequent phrases using various models. In Table 4, we show a sample of such a comparison. Consistently with the previous results, it seems that the best representations of phrases are learned by a model with the hierarchical softmax and subsampling.

5 Additive Compositionality

We demonstrated that the word and phrase representations learned by the Skip-gram model exhibit a linear structure that makes it possible to perform precise analogical reasoning using simple vector arithmetic. Interestingly, we found that the Skip-gram representations exhibit another kind of linear structure that makes it possible to meaningfully combine words by an element-wise addition of their vector representations. This phenomenon is illustrated in Table 5.

The additive property of the vectors can be explained by inspecting the training objective. The word vectors are in a linear relationship with the inputs to the softmax nonlinearity. As the word vectors are trained to predict the surrounding words in the sentence, the vectors can be seen as representing the distribution of the context in which a word appears. These values are related logarithmically to the probabilities computed by the output layer, so the sum of two word vectors is related to the product of the two context distributions. The product works here as the AND function: words that are assigned high probabilities by
both word vectors will have high probability,and the other words will have low probability.Thus,if“V olga River”appears frequently in the same sentence together with the words“Russian”and“river”,the sum of these two word vectors will result in such a feature vector that is close to the vector of“V olga River”.6Comparison to Published Word RepresentationsMany authors who previously worked on the neural network based representations of words have published their resulting models for further use and comparison:amongst the most well known au-thors are Collobert and Weston[2],Turian et al.[17],and Mnih and Hinton[10].We downloaded their word vectors from the web3.Mikolov et al.[8]have already evaluated these word representa-tions on the word analogy task,where the Skip-gram models achieved the best performance with a huge margin.Model Redmond ninjutsu capitulate (training time)Collobert(50d)conyers reiki abdicate (2months)lubbock kohona accedekeene karate rearmJewell gunfireArzu emotionOvitz impunityMnih(100d)Podhurst-Mavericks (7days)Harlang-planning Agarwal-hesitatedVaclav Havel spray paintpresident Vaclav Havel grafittiVelvet Revolution taggers/p/word2vecReferences[1]Yoshua Bengio,R´e jean Ducharme,Pascal Vincent,and Christian Janvin.A neural probabilistic languagemodel.The Journal of Machine Learning Research,3:1137–1155,2003.[2]Ronan Collobert and Jason Weston.A unified architecture for natural language processing:deep neu-ral networks with multitask learning.In Proceedings of the25th international conference on Machine learning,pages160–167.ACM,2008.[3]Xavier Glorot,Antoine Bordes,and Yoshua Bengio.Domain adaptation for large-scale sentiment classi-fication:A deep learning approach.In ICML,513–520,2011.[4]Michael U Gutmann and Aapo Hyv¨a rinen.Noise-contrastive estimation of unnormalized statistical mod-els,with applications to natural image statistics.The Journal of Machine Learning Research,13:307–361, 2012.[5]Tomas Mikolov,Stefan Kombrink,Lukas Burget,Jan 
Cernocky,and Sanjeev Khudanpur.Extensions ofrecurrent neural network language model.In Acoustics,Speech and Signal Processing(ICASSP),2011 IEEE International Conference on,pages5528–5531.IEEE,2011.[6]Tomas Mikolov,Anoop Deoras,Daniel Povey,Lukas Burget and Jan Cernocky.Strategies for TrainingLarge Scale Neural Network Language Models.In Proc.Automatic Speech Recognition and Understand-ing,2011.[7]Tomas Mikolov.Statistical Language Models Based on Neural Networks.PhD thesis,PhD Thesis,BrnoUniversity of Technology,2012.[8]Tomas Mikolov,Kai Chen,Greg Corrado,and Jeffrey Dean.Efficient estimation of word representationsin vector space.ICLR Workshop,2013.[9]Tomas Mikolov,Wen-tau Yih and Geoffrey Zweig.Linguistic Regularities in Continuous Space WordRepresentations.In Proceedings of NAACL HLT,2013.[10]Andriy Mnih and Geoffrey E Hinton.A scalable hierarchical distributed language model.Advances inneural information processing systems,21:1081–1088,2009.[11]Andriy Mnih and Yee Whye Teh.A fast and simple algorithm for training neural probabilistic languagemodels.arXiv preprint arXiv:1206.6426,2012.[12]Frederic Morin and Yoshua Bengio.Hierarchical probabilistic neural network language model.In Pro-ceedings of the international workshop on artificial intelligence and statistics,pages246–252,2005. 
[13]David E Rumelhart,Geoffrey E Hintont,and Ronald J Williams.Learning representations by back-propagating errors.Nature,323(6088):533–536,1986.[14]Holger Schwenk.Continuous space language puter Speech and Language,vol.21,2007.[15]Richard Socher,Cliff C.Lin,Andrew Y.Ng,and Christopher D.Manning.Parsing natural scenes andnatural language with recursive neural networks.In Proceedings of the26th International Conference on Machine Learning(ICML),volume2,2011.[16]Richard Socher,Brody Huval,Christopher D.Manning,and Andrew Y.Ng.Semantic CompositionalityThrough Recursive Matrix-Vector Spaces.In Proceedings of the2012Conference on Empirical Methods in Natural Language Processing(EMNLP),2012.[17]Joseph Turian,Lev Ratinov,and Yoshua Bengio.Word representations:a simple and general method forsemi-supervised learning.In Proceedings of the48th Annual Meeting of the Association for Computa-tional Linguistics,pages384–394.Association for Computational Linguistics,2010.[18]Peter D.Turney and Patrick Pantel.From frequency to meaning:Vector space models of semantics.InJournal of Artificial Intelligence Research,37:141-188,2010.[19]Peter D.Turney.Distributional semantics beyond words:Supervised learning of analogy and paraphrase.In Transactions of the Association for Computational Linguistics(TACL),353–366,2013.[20]Jason Weston,Samy Bengio,and Nicolas Usunier.Wsabie:Scaling up to large vocabulary image annota-tion.In Proceedings of the Twenty-Second international joint conference on Artificial Intelligence-Volume Volume Three,pages2764–2770.AAAI Press,2011.。
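The additive compositionality described in Section 5 can be illustrated in a few lines of code. The vectors below are invented toy embeddings, not output of a trained Skip-gram model; the point is only that the element-wise sum of two word vectors can land, under cosine similarity, near the vector of a related phrase:

```python
import numpy as np

# Toy 3-dimensional "embeddings" (made up for illustration).
vocab = {
    "russian":     np.array([0.9, 0.1, 0.0]),
    "river":       np.array([0.1, 0.9, 0.0]),
    "volga_river": np.array([0.7, 0.7, 0.1]),
    "pasta":       np.array([0.0, 0.1, 0.9]),
}

def nearest(query, vocab, exclude=()):
    """Return the vocabulary entry most cosine-similar to `query`."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vocab if w not in exclude),
               key=lambda w: cos(query, vocab[w]))

# Element-wise addition combines the two context distributions (the
# "AND" effect described in the text).
combined = vocab["russian"] + vocab["river"]
print(nearest(combined, vocab, exclude=("russian", "river")))  # volga_river
```

With real Skip-gram vectors the same query would typically rank phrase vectors such as "Volga River" above unrelated words, for the log-probability reason given above.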
Inference of Population Structure Using Multilocus Genotype Data
Copyright © 2000 by the Genetics Society of America

Jonathan K. Pritchard, Matthew Stephens and Peter Donnelly

Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom

Manuscript received September 23, 1999. Accepted for publication February 18, 2000.

ABSTRACT

We describe a model-based clustering method for using multilocus genotype data to infer population structure and assign individuals to populations. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Individuals in the sample are assigned (probabilistically) to populations, or jointly to two or more populations if their genotypes indicate that they are admixed. Our model does not assume a particular mutation process, and it can be applied to most of the commonly used genetic markers, provided that they are not closely linked. Applications of our method include demonstrating the presence of population structure, assigning individuals to populations, studying hybrid zones, and identifying migrants and admixed individuals. We show that the method can produce highly accurate assignments using modest numbers of loci—e.g., seven microsatellite loci in an example using genotype data from an endangered bird species. The software used for this article is available from /~pritch/home.html.

Corresponding author: Jonathan Pritchard, Department of Statistics, University of Oxford, 1 S. Parks Rd., Oxford OX1 3TG, United Kingdom. E-mail: pritch@
Genetics 155:945–959 (June 2000)

IN applications of population genetics, it is often useful to classify individuals in a sample into populations. In one scenario, the investigator begins with a sample of individuals and wants to say something about the properties of populations. For example, in studies of human evolution, the population is often considered to be the unit of interest, and a great deal of work has focused on learning about the evolutionary relationships of modern populations (e.g., Cavalli-Sforza et al. 1994). In a second scenario, the investigator begins with a set of predefined populations and wishes to classify individuals of unknown origin. This type of problem arises in many contexts (reviewed by Davies et al. 1999). A standard approach involves sampling DNA from members of a number of potential source populations and using these samples to estimate allele frequencies in each population at a series of unlinked loci. Using the estimated allele frequencies, it is then possible to compute the likelihood that a given genotype originated in each population. Individuals of unknown origin can be assigned to populations according to these likelihoods (Paetkau et al. 1995; Rannala and Mountain 1997).

In both situations described above, a crucial first step is to define a set of populations. The definition of populations is typically subjective, based, for example, on linguistic, cultural, or physical characters, as well as the geographic location of sampled individuals. This subjective approach is usually a sensible way of incorporating diverse types of information. However, it may be difficult to know whether a given assignment of individuals to populations based on these subjective criteria represents a natural assignment in genetic terms, and it would be useful to be able to confirm that subjective classifications are consistent with genetic information and hence appropriate for studying the questions of interest. Further, there are situations where one is interested in "cryptic" population structure—i.e., population structure that is difficult to detect using visible characters, but may be significant in genetic terms. For example, when association mapping is used to find disease genes, the presence of undetected population structure can lead to spurious associations and thus invalidate standard tests (Ewens and Spielman 1995). The problem of cryptic population structure also arises in the context of DNA fingerprinting for forensics, where it is important to assess the degree of population structure to estimate the probability of false matches (Balding and Nichols 1994, 1995; Foreman et al. 1997; Roeder et al. 1998).

Pritchard and Rosenberg (1999) considered how genetic information might be used to detect the presence of cryptic population structure in the association mapping context. More generally, one would like to be able to identify the actual subpopulations and assign individuals (probabilistically) to these populations. In this article we use a Bayesian clustering approach to tackle this problem. We assume a model in which there are K populations (where K may be unknown), each of which is characterized by a set of allele frequencies at each locus. Our method attempts to assign individuals to populations on the basis of their genotypes, while simultaneously estimating population allele frequencies. The method can be applied to various types of markers [e.g., microsatellites, restriction fragment length polymorphisms (RFLPs), or single nucleotide polymorphisms (SNPs)], but it assumes that the marker loci are unlinked and at linkage equilibrium with one another within populations. It also assumes Hardy-Weinberg equilibrium within populations. (We discuss these assumptions further in background on clustering methods and the discussion.)

Our approach is reminiscent of that taken by Smouse et al. (1990), who used the EM algorithm to learn about the contribution of different breeding populations to a sample of salmon collected in the open ocean. It is also closely related to the methods of Foreman et al. (1997) and Roeder et al. (1998), who were concerned with estimating the degree of cryptic population structure to assess the probability of obtaining a false match at DNA fingerprint loci. Consequently they focused on estimating the amount of genetic differentiation among the unobserved populations. In contrast, our primary interest lies in the assignment of individuals to populations. Our approach also differs in that it allows for the presence of admixed individuals in the sample, whose genetic makeup is drawn from more than one of the K populations.

In the next section we provide a brief description of clustering methods in general and describe some advantages of the model-based approach we take. The details of the models and algorithms used are given in models and methods. We illustrate our method with several examples in applications to data: both on simulated data and on sets of genotype data from an endangered bird species and from humans. incorporating population information describes how our method can be extended to incorporate geographic information into the inference process. This may be useful for testing whether particular individuals are migrants or to assist in classifying individuals of unknown origin (as in Rannala and Mountain 1997, for example). Background on the computational methods used in this article is provided in the appendix.

BACKGROUND ON CLUSTERING METHODS

Consider a situation where we have genetic data from a sample of individuals, each of whom is assumed to have originated from a single unknown population (no admixture). Suppose we wish to cluster together individuals who are genetically similar, identify distinct clusters, and perhaps see how these clusters relate to geographical or phenotypic data on the individuals. There are broadly two types of clustering methods we might use:

1. Distance-based methods. These proceed by calculating a pairwise distance matrix, whose entries give the distance (suitably defined) between every pair of individuals. This matrix may then be represented using some convenient graphical representation (such as a tree or a multidimensional scaling plot) and clusters may be identified by eye.

2. Model-based methods. These proceed by assuming that observations from each cluster are random draws from some parametric model. Inference for the parameters corresponding to each cluster is then done jointly with inference for the cluster membership of each individual, using standard statistical methods (for example, maximum-likelihood or Bayesian methods).

Distance-based methods are usually easy to apply and are often visually appealing. In the genetics literature, it has been common to adapt distance-based phylogenetic algorithms, such as neighbor-joining, to clustering multilocus genotype data (e.g., Bowcock et al. 1994). However, these methods suffer from many disadvantages: the clusters identified may be heavily dependent on both the distance measure and graphical representation chosen; it is difficult to assess how confident we should be that the clusters obtained in this way are meaningful; and it is difficult to incorporate additional information such as the geographic sampling locations of individuals. Distance-based methods are thus more suited to exploratory data analysis than to fine statistical inference, and we have chosen to take a model-based approach here.

The first challenge when applying model-based methods is to specify a suitable model for observations from each cluster. To make our discussion more concrete we introduce very briefly some of our model and notation here; a fuller treatment is given later. Assume that each cluster (population) is modeled by a characteristic set of allele frequencies. Let X denote the genotypes of the sampled individuals, Z denote the (unknown) populations of origin of the individuals, and P denote the (unknown) allele frequencies in all populations. (Note that X, Z, and P actually represent multidimensional vectors.) Our main modeling assumptions are Hardy-Weinberg equilibrium within populations and complete linkage equilibrium between loci within populations. Under these assumptions each allele at each locus in each genotype is an independent draw from the appropriate frequency distribution, and this completely specifies the probability distribution Pr(X | Z, P) (given later in Equation 2). Loosely speaking, the idea here is that the model accounts for the presence of Hardy-Weinberg or linkage disequilibrium by introducing population structure and attempts to find population groupings that (as far as possible) are not in disequilibrium. While inference may depend heavily on these modeling assumptions, we feel that it is easier to assess the validity of explicit modeling assumptions than to compare the relative merits of more abstract quantities such as distance measures and graphical representations. In situations where these assumptions are deemed unreasonable then alternative models should be built.

Having specified our model, we must decide how to perform inference for the quantities of interest (Z and P). Here, we have chosen to adopt a Bayesian approach, by specifying models (priors) Pr(Z) and Pr(P), for both Z and P. The Bayesian approach provides a coherent framework for incorporating the inherent uncertainty of parameter estimates into the inference procedure and for evaluating the strength of evidence for the inferred clustering. It also eases the incorporation of various sorts of prior information that may be available, such as information about the geographic sampling location of individuals.

Having observed the genotypes, X, our knowledge about Z and P is then given by the posterior distribution

Pr(Z, P | X) ∝ Pr(Z) Pr(P) Pr(X | Z, P).   (1)

While it is not usually possible to compute this distribution exactly, it is possible to obtain an approximate sample (Z(1), P(1)), (Z(2), P(2)), ..., (Z(M), P(M)) from Pr(Z, P | X) using Markov chain Monte Carlo (MCMC) methods described below (see Gilks et al. 1996b, for more general background). Inference for Z and P may then be based on summary statistics obtained from this sample (see Inference for Z, P, and Q below). A brief introduction to MCMC methods and Gibbs sampling may be found in the appendix.

MODELS AND METHODS

We now provide a more detailed description of our modeling assumptions and the algorithms used to perform inference, beginning with the simpler case where each individual is assumed to have originated in a single population (no admixture).

The model without admixture: Suppose we genotype N diploid individuals at L loci. In the case without admixture, each individual is assumed to originate in one of K populations, each with its own characteristic set of allele frequencies. Let the vector X denote the observed genotypes, Z the (unknown) populations of origin of the individuals, and P the (unknown) allele frequencies in the populations. These vectors consist of the following elements:

(x_l^(i,1), x_l^(i,2)) = genotype of the ith individual at the lth locus, where i = 1, 2, ..., N and l = 1, 2, ..., L;
z^(i) = population from which individual i originated;
p_klj = frequency of allele j at locus l in population k, where k = 1, 2, ..., K and j = 1, 2, ..., J_l,

where J_l is the number of distinct alleles observed at locus l, and these alleles are labeled 1, 2, ..., J_l.

Given the population of origin of each individual, the genotypes are assumed to be generated by drawing alleles independently from the appropriate population frequency distributions,

Pr(x_l^(i,a) = j | Z, P) = p_{z^(i) l j}   (2)

independently for each x_l^(i,a). (Note that p_{z^(i) l j} is the frequency of allele j at locus l in the population of origin of individual i.)

Assume that before observing the genotypes we have no information about the population of origin of each individual and that the probability that individual i originated in population k is the same for all k,

Pr(z^(i) = k) = 1/K,   (3)

independently for all individuals. (In cases where some populations may be more heavily represented in the sample than others, this assumption is inappropriate; it would be straightforward to extend our model to deal with such situations.)

We follow the suggestion of Balding and Nichols (1995) (see also Foreman et al. 1997 and Rannala and Mountain 1997) in using the Dirichlet distribution to model the allele frequencies at each locus within each population. The Dirichlet distribution D(λ_1, λ_2, ..., λ_J) is a distribution on allele frequencies p = (p_1, p_2, ..., p_J) with the property that these frequencies sum to 1. We use this distribution to specify the probability of a particular set of allele frequencies p_kl· for population k at locus l,

p_kl· ~ D(λ_1, λ_2, ..., λ_{J_l}),   (4)

independently for each k, l. The expected frequency of allele j is proportional to λ_j, and the variance of this frequency decreases as the sum of the λ_j increases. We take λ_1 = λ_2 = ··· = λ_{J_l} = 1.0, which gives a uniform distribution on the allele frequencies; alternatives are discussed in the discussion.

MCMC algorithm (without admixture): Equations 2, 3, and 4 define the quantities Pr(X | Z, P), Pr(Z), and Pr(P), respectively. By setting θ = (θ_1, θ_2) = (Z, P) and letting π(Z, P) = Pr(Z, P | X) we can use the approach outlined in Algorithm A1 to construct a Markov chain with stationary distribution Pr(Z, P | X) as follows:

Algorithm 1: Starting with initial values Z(0) for Z (by drawing Z(0) at random using (3) for example), iterate the following steps for m = 1, 2, ....

Step 1. Sample P(m) from Pr(P | X, Z(m−1)).
Step 2. Sample Z(m) from Pr(Z | X, P(m)).

Informally, step 1 corresponds to estimating the allele frequencies for each population assuming that the population of origin of each individual is known; step 2 corresponds to estimating the population of origin of each individual, assuming that the population allele frequencies are known. For sufficiently large m and c, (Z(m), P(m)), (Z(m+c), P(m+c)), (Z(m+2c), P(m+2c)), ... will be approximately independent random samples from Pr(Z, P | X). The distributions required to perform each step are given in the appendix.
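Algorithm 1 alternates two conditional draws whose closed forms (a Dirichlet posterior for P, a categorical posterior for Z) follow from Equations 2-4. A minimal sketch in Python/NumPy, assuming 0-based allele labels, a common number of alleles J at every locus, and the uniform λ = 1 prior; the variable names and data layout are ours, not the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_no_admixture(X, K, J, sweeps=100, lam=1.0):
    """Sketch of Algorithm 1. X holds allele labels in {0, ..., J-1},
    shape (N individuals, L loci, 2 allele copies). Returns last (Z, P) draw."""
    N, L, _ = X.shape
    z = rng.integers(K, size=N)              # Z(0) drawn uniformly, Eq. (3)
    P = np.full((K, L, J), 1.0 / J)
    for _ in range(sweeps):
        # Step 1: P | X, Z is Dirichlet(lam + allele counts), locus by locus.
        for k in range(K):
            for l in range(L):
                counts = np.bincount(X[z == k, l].ravel(), minlength=J)
                P[k, l] = rng.dirichlet(lam + counts)
        # Step 2: Z | X, P -- Eq. (2) gives each individual's likelihood
        # under each population; normalize and draw a new label.
        for i in range(N):
            loglik = np.array([
                np.log(P[k, np.arange(L)[:, None], X[i]]).sum()
                for k in range(K)
            ])
            w = np.exp(loglik - loglik.max())
            z[i] = rng.choice(K, p=w / w.sum())
    return z, P
```

With well-separated populations this sampler typically locks onto the correct partition (up to relabeling) within a few sweeps; the burn-in and thinning interval c discussed in the text matter more in the harder, admixed settings.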
The model with admixture: We now expand our model to allow for admixed individuals by introducing a vector Q to denote the admixture proportions for each individual. The elements of Q are

q_k^(i) = proportion of individual i's genome that originated from population k.

It is also necessary to modify the vector Z to replace the assumption that each individual i originated in some unknown population z^(i) with the assumption that each observed allele copy x_l^(i,a) originated in some unknown population z_l^(i,a):

z_l^(i,a) = population of origin of allele copy x_l^(i,a).

We use the term "allele copy" to refer to an allele carried at a particular locus by a particular individual.

Our primary interest now lies in estimating Q. We proceed in a manner similar to the case without admixture, beginning by specifying a probability model for (X, Z, P, Q). Analogues of (2) and (3) are

Pr(x_l^(i,a) = j | Z, P, Q) = p_{z_l^(i,a) l j}   (5)

and

Pr(z_l^(i,a) = k | P, Q) = q_k^(i),   (6)

with (4) being used to model P as before. To complete our model we need to specify a distribution for Q, which in general will depend on the type and amount of admixture we expect to see. Here we model the admixture proportions q^(i) = (q_1^(i), ..., q_K^(i)) of individual i using the Dirichlet distribution

q^(i) ~ D(α, α, ..., α)   (7)

independently for each individual. For large values of α (≫ 1), this models each individual as having allele copies originating from all K populations in equal proportions. For very small values of α (≪ 1), it models each individual as originating mostly from a single population, with each population being equally likely. As α → 0 this model becomes the same as our model without admixture (although the implementation of the MCMC algorithm is somewhat different). We allow α to range from 0.0 to 10.0 and attempt to learn about α from the data (specifically we put a uniform prior on α ∈ [0, 10] and use a Metropolis-Hastings update step to integrate out our uncertainty in α). This model may be considered suitable for situations where little is known about admixture; alternatives are discussed in the discussion.

MCMC algorithm (with admixture): The following algorithm may be used to sample from Pr(Z, P, Q | X).

Algorithm 2: Starting with initial values Z(0) for Z (by drawing Z(0) at random using (3) for example), iterate the following steps for m = 1, 2, ....

Step 1. Sample P(m), Q(m) from Pr(P, Q | X, Z(m−1)).
Step 2. Sample Z(m) from Pr(Z | X, P(m), Q(m)).
Step 3. Update α using a Metropolis-Hastings step.

Informally, step 1 corresponds to estimating the allele frequencies for each population and the admixture proportions of each individual, assuming that the population of origin of each allele copy in each individual is known; step 2 corresponds to estimating the population of origin of each allele copy, assuming that the population allele frequencies and the admixture proportions are known. As before, for sufficiently large m and c, (Z(m), P(m), Q(m)), (Z(m+c), P(m+c), Q(m+c)), (Z(m+2c), P(m+2c), Q(m+2c)), ... will be approximately independent random samples from Pr(Z, P, Q | X). The distributions required to perform each step are given in the appendix.

Inference: Inference for Z, P, and Q: We now discuss how the MCMC output can be used to perform inference on Z, P, and Q. For simplicity, we focus our attention on Q; inference for Z or P is similar.

Having obtained a sample Q(1), ..., Q(M) (using suitably large burn-in m and thinning interval c) from the posterior distribution of Q = (q_1, ..., q_N) given X using the MCMC method, it is desirable to summarize the information contained, perhaps by a point estimate of Q. A seemingly obvious estimate is the posterior mean

E(q_i | X) ≈ (1/M) Σ_{m=1}^{M} q_i^(m).   (8)

However, the symmetry of our model implies that the posterior mean of q_i is (1/K, 1/K, ..., 1/K) for all i, whatever the value of X. For example, suppose that there are just two populations and 10 individuals and that the genotypes of these individuals contain strong information that the first 5 are in one population and the second 5 are in the other population. Then either

q_1 ... q_5 ≈ (1, 0) and q_6 ... q_10 ≈ (0, 1)   (9)

or

q_1 ... q_5 ≈ (0, 1) and q_6 ... q_10 ≈ (1, 0),   (10)

with these two "symmetric modes" being equally likely, leading to the expectation of any given q_i being (0.5, 0.5). This is essentially a problem of nonidentifiability caused by the symmetry of the model [see Stephens (2000b) for more discussion].

In general, if there are K populations then there will be K! sets of symmetric modes. Typically, MCMC schemes find it rather difficult to move between such modes, and the algorithms we describe will usually explore only one of the symmetric modes, even when run for a very large number of iterations. Fortunately this does not bother us greatly, since from the point of view of clustering all the symmetric modes are the same [compare the clusterings corresponding to (9) and (10)]. If our sampler explores only one symmetric mode then the sample means (8) will be very poor estimates of the posterior means for the q_i, but will be much better estimates of the modes of the q_i, which in this case turn out to be a much better summary of the information in the data. Ironically then, the poor mixing of the MCMC sampler between the symmetric modes gives the asymptotically useless estimator (8) some practical value. Where the MCMC sampler succeeds in moving between symmetric modes, or where it is desired to combine results from samples obtained using different starting points (which may involve combining results corresponding to different modes), more sophisticated methods [such as those described by Stephens (2000b)] may be required.
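The symmetry argument around Equations (8)-(10) is easy to see numerically. A small sketch with made-up q values (not MCMC output): averaging across both symmetric modes collapses every posterior mean to (0.5, 0.5), while a sampler stuck in one mode preserves the clustering:

```python
import numpy as np

# Hypothetical admixture proportions for 10 individuals in one symmetric
# mode, as in Eq. (9): the first five belong to population 1, the rest
# to population 2. The other mode, Eq. (10), is the same with the two
# population labels swapped.
mode_1 = np.array([[0.95, 0.05]] * 5 + [[0.05, 0.95]] * 5)
mode_2 = mode_1[:, ::-1]

# A sampler visiting both modes equally averages them, Eq. (8):
full_posterior_mean = (mode_1 + mode_2) / 2
print(full_posterior_mean[0])      # [0.5 0.5] for every individual

# A sampler stuck in one mode keeps the clustering intact:
print(mode_1[0], mode_1[9])        # [0.95 0.05] [0.05 0.95]
```

This is why, as the text notes, the poor between-mode mixing of the sampler makes the within-mode sample mean a useful summary in practice.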
as-starting points (which may involve combining results corresponding to different modes),more sophisticated sumed that sampled individuals were genotyped at a series of unlinked microsatellite loci.Data were simu-methods [such as those described by Stephens (2000b)]may be required.lated under the following models.Inference for the number of populations:The problem of Model 1:A single random-mating population of con-inferring the number of clusters,K ,present in a data stant size.set is notoriously difficult.In the Bayesian paradigm the Model 2:Two random-mating populations of constant way to proceed is theoretically straightforward:place a effective population size 2N.These were assumed to prior distribution on K and base inference for K on the have split from a single ancestral population,also of posterior distributionsize 2N at a time N generations in the past,with no subsequent migration.Pr(K |X )ϰPr(X |K )Pr(K ).(11)Model 3:Admixture of populations.Two discrete popu-However,this posterior distribution can be peculiarly lations of equal size,related as in model 2,were fused dependent on the modeling assumptions made,even to produce a single random-mating population.Sam-where the posterior distributions of other quantities (Q ,ples were collected after two generations of random Z ,and P ,say)are relatively robust to these assumptions.mating in the merged population.Thus,individuals Moreover,there are typically severe computational chal-have i grandparents from population 1,and 4Ϫi lenges in estimating Pr(X |K ).We therefore describe an grandparents from population 2with probability alternative approach,which is motivated by approximat-(4i )/16,where i {0,4}.All loci were simulated inde-ing (11)in an ad hoc and computationally convenient pendently.way.We present results from analyzing data sets simulated Arguments given in the appendix (Inference on K,the under each model.Data set 1was simulated under number of populations )suggest estimating Pr(X |K )usingmodel 
Pr(X | K) ≈ exp(−μ̂/2 − σ̂²/8),    (12)

where

μ̂ = (1/M) Σ_{m=1}^{M} [−2 log Pr(X | Z^(m), P^(m), Q^(m))]    (13)

and

σ̂² = (1/M) Σ_{m=1}^{M} [−2 log Pr(X | Z^(m), P^(m), Q^(m)) − μ̂]².    (14)

We use (12) to estimate Pr(X | K) for each K and substitute these estimates into (11) to approximate the posterior distribution Pr(K | X).

In fact, the assumptions underlying (12) are dubious at best, and we do not claim (or believe) that our procedure provides a quantitatively accurate estimate of the posterior distribution of K. We see it merely as an ad hoc guide to which models are most consistent with the data, with the main justification being that it seems to give sensible answers in practice (see the next section for examples). Notwithstanding this, for convenience we continue to refer to "estimating" Pr(K | X) and Pr(X | K).

APPLICATIONS TO DATA

We now illustrate the performance of our method on both simulated data and real data (from an endangered bird species and from humans). The analyses make use of the methods described in The model with admixture.

Data set 1 was simulated under model 1, with 5 microsatellite loci. Data sets 2A and 2B were simulated under model 2, with 5 and 15 microsatellite loci, respectively. Data set 3 was simulated under model 3, with 60 loci (preliminary analyses with fewer loci showed this to be a much harder problem than models 1 and 2). Microsatellite mutation was modeled by a simple stepwise mutation process, with the mutation parameter 4Nμ set at 16.0 per locus (i.e., the expected variance in repeat scores within populations was 8.0). We did not make use of the assumed mutation model in analyzing the simulated data.

Our analysis consists of two phases. First, we consider the issue of model choice—i.e., how many populations are most appropriate for interpreting the data. Then, we examine the clustering of individuals for the inferred number of populations.

Choice of K for simulated data: For each model, we ran a series of independent runs of the Gibbs sampler for each value of K (the number of populations) between 1 and 5. The results presented are based on runs of 10^6 iterations or more, following a burn-in period of at least 30,000 iterations. To choose the length of the burn-in period, we printed out log Pr(X | P^(m), Q^(m)) and several other summary statistics during the course of a series of trial runs, to estimate how long it took to reach (approximate) stationarity. To check for possible problems with mixing, we compared the estimates of Pr(X | K) and other summary statistics obtained over several independent runs of the Gibbs sampler, starting from different initial points. In general, substantial differences between runs can indicate either that the runs should be longer to obtain more accurate estimates or that independent runs are getting stuck in different modes in the parameter space. (Here, we consider the K! modes that arise from the nonidentifiability of the K populations to be equivalent, since they arise from permuting the K population labels.)

We found that in most cases we obtained consistent estimates of Pr(X | K) across independent runs. However, when analyzing data set 2A with K = 3, the Gibbs sampler found two different modes. This data set actually contains two populations, and when K is set to 3, one of the populations expands to fill two of the three clusters. It is somewhat arbitrary which of the two populations expands to fill the extra cluster: this leads to two modes of slightly different heights. The Gibbs sampler did not manage to move between the two modes in any of our runs.

950 J. K. Pritchard, M. Stephens and P. Donnelly

TABLE 1
Estimated posterior probabilities of K, for simulated data sets 1, 2A, 2B, and 3 (denoted X1, X2A, X2B, and X3, respectively)

K    Pr(K | X1)    Pr(K | X2A)    Pr(K | X2B)    Pr(K | X3)
1    ≈1.0          ≈0.0           ≈0.0           ≈0.0
2    ≈0.0          0.21           0.999          ≈1.0
3    ≈0.0          0.58           0.0009         ≈0.0
4    ≈0.0          0.21           ≈0.0           ≈0.0
5    ≈0.0          ≈0.0           ≈0.0           ≈0.0

The numbers should be regarded as a rough guide to which models are consistent with the data, rather than accurate estimates of posterior probabilities.

Figure 1.—Summary of the clustering results for simulated data sets 2A and 2B, respectively. For each individual, we computed the mean value of q^(i)_1 (the proportion of ancestry in population 1) over a single run of the Gibbs sampler. The dashed line is a histogram of mean values of q^(i)_1 for individuals from population 0; the solid line is for individuals from population 1.

In Table 1 we report estimates of the posterior probabilities of values of K, assuming a uniform prior on K between 1 and 5, obtained as described in Inference for the number of populations. We repeat the warning given there that these numbers should be regarded as rough guides to which models are consistent with the data, rather than accurate estimates of the posterior probabilities. In the case where we found two modes (data set 2A, K = 3), we present results based on the mode that gave the higher estimate of Pr(X | K).

With all four simulated data sets we were able to correctly infer whether or not there was population structure (K = 1 for data set 1 and K > 1 otherwise). In the case of data set 2A, which consisted of just 5 loci, there is not a clear estimate of K, as the posterior probability is consistent with both the correct value, K = 2, and also with K = 3 or 4. However, when the number of loci was increased to 15 (data set 2B), virtually all of the posterior probability was on the correct number of populations, K = 2.

Data set 3 was simulated under a more complicated model, where most individuals have mixed ancestry. In this case, the population was formed by admixture of two populations, so the "true" clustering is with K = 2, with Q estimating the number of grandparents from each of the two original populations, for each individual. Intuitively it seems that another plausible clustering would be with K = 5, individuals being assigned to clusters according to how many grandparents they have from each population. In biological terms, the solution with K = 2 is more natural and is indeed the inferred value of K for this data set using our ad hoc guide [the estimated value of Pr(X | K) was higher for K = 5 than for K = 3, 4, or 6, but much lower than for K = 2]. However, this raises an important point: the inferred value of K may not always have a clear biological interpretation (an issue that we return to in the discussion).

Clustering of simulated data: Having considered the problem of estimating the number of populations, we now examine the performance of the clustering algorithm in assigning particular individuals to the appropriate populations. In the case where the populations are discrete, the clustering performs very well (Figure 1), even with just 5 loci (data set 2A), and essentially perfectly with 15 loci (data set 2B). The case with admixture (Figure 2) appears to be more difficult, even using many more loci. However, the clustering algorithm did manage to identify the population structure appropriately and estimated the ancestry of individuals with reasonable accuracy. Part of the reason that this problem is difficult is that it is hard to estimate the original allele frequencies (before admixture) when almost all the individuals (7/8) are admixed. A more fundamental problem is that it is difficult to get accurate estimates of q^(i) for particular individuals because (as can be seen from the y-axis of Figure 2) for any given individual, the variance of how many of its alleles are actually derived from each population
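The ad hoc procedure of Equations (12)–(14) is straightforward to sketch in code. The following is a minimal illustration, not the authors' implementation: it assumes the per-iteration log-likelihoods log Pr(X | Z^(m), P^(m), Q^(m)) have been saved from a Gibbs run at each value of K, and the function names and the toy numbers in the test are hypothetical.

```python
import numpy as np

def log_prob_data_given_K(loglik_draws):
    """Ad hoc estimate of log Pr(X | K) from Eqs. (12)-(14):
    Pr(X | K) ~= exp(-mu/2 - sigma^2/8), where mu and sigma^2 are the mean
    and variance of the deviance D_m = -2 log Pr(X | Z(m), P(m), Q(m))
    over the M sampled states of a single Gibbs run."""
    D = -2.0 * np.asarray(loglik_draws, dtype=float)
    mu = D.mean()                      # Eq. (13)
    sigma2 = ((D - mu) ** 2).mean()    # Eq. (14)
    return -mu / 2.0 - sigma2 / 8.0    # log of Eq. (12)

def posterior_over_K(loglik_draws_by_K):
    """Substitute the estimates of Pr(X | K) into Eq. (11) under a uniform
    prior on K, giving a (rough) posterior distribution Pr(K | X)."""
    Ks = sorted(loglik_draws_by_K)
    logp = np.array([log_prob_data_given_K(loglik_draws_by_K[K]) for K in Ks])
    w = np.exp(logp - logp.max())      # shift by the max for numerical stability
    return dict(zip(Ks, w / w.sum()))
```

If the saved log-likelihoods under one K are consistently much higher than under another, `posterior_over_K` places essentially all mass on the former, mirroring the behaviour reported for data set 2B above.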
Eur J Appl Physiol (2013) 113:3069–3077
DOI 10.1007/s00421-013-2736-2

Recruitment order of quadriceps motor units: femoral nerve vs. direct quadriceps stimulation

Javier Rodriguez-Falces · Nicolas Place

Received: 4 July 2013 / Accepted: 23 September 2013 / Published online: 6 October 2013
© Springer-Verlag Berlin Heidelberg 2013

Abstract
Introduction: To investigate potential differences in the recruitment order of motor units (MUs) in the quadriceps femoris when electrical stimulation is applied over the quadriceps belly versus the femoral nerve.
Methods: M-waves and mechanical twitches were evoked using femoral nerve stimulation and direct quadriceps stimulation of gradually increasing intensity in 20 young, healthy subjects. Recruitment order was investigated by analysing the time-to-peak twitch and the time interval from the stimulus artefact to the M-wave positive peak (M-wave latency) for the vastus medialis (VM) and vastus lateralis (VL) muscles.
Results: During femoral nerve stimulation, time-to-peak twitch and M-wave latency decreased consistently (P < 0.05) with increasing stimulus intensity, whereas, during graded direct quadriceps stimulation, time-to-peak twitch and VL M-wave latency did not show a clear trend (P > 0.05). For the VM muscle, M-wave latency decreased with increasing stimulation level for both femoral nerve and direct quadriceps stimulation, whereas, for the VL muscle, the variation of M-wave latency with stimulus intensity was different for the two stimulation geometries (P < 0.05).
Conclusions: Femoral nerve stimulation activated MUs according to the size principle, whereas the recruitment order during direct quadriceps stimulation was more complex, depending ultimately on the architecture of the peripheral nerve and its terminal branches below the stimulating electrodes for each muscle. For the VM, MUs were orderly recruited for both stimulation geometries, whereas, for the VL muscle, MUs were orderly recruited for femoral nerve stimulation but followed no particular order for direct quadriceps stimulation.

Keywords: Transcutaneous electrical stimulation · Femoral nerve stimulation · Direct quadriceps stimulation · Motor unit recruitment order · Time-to-peak twitch · M-wave latency

Communicated by Toshio Moritani.

J. Rodriguez-Falces (*)
Department of Electrical and Electronic Engineering, Universidad Pública de Navarra, Campus de Arrosadía s/n, 31006 Pamplona, Spain
e-mail: javier.rodriguez.falces@

N. Place
Institute of Movement Sciences and Sport Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland

Introduction

Transcutaneous electrical stimulation of skeletal muscles has been largely adopted for the treatment of paralysed patients (Gerrits et al. 1999) and for rehabilitation purposes (Glinsky et al. 2007; Newsam and Baker 2004), and it has become an invaluable technique for the diagnosis of neuromuscular disorders (Henderson et al. 2006; Blok et al. 2007) and for the evaluation of neuromuscular fatigue (Millet et al. 2011). In the quadriceps muscle, transcutaneous stimulation can be applied using large electrodes placed over the muscle belly, so that electrical pulses excite both the parent axons and their terminal branches (hereafter referred to as direct quadriceps stimulation), while the stimulating electrodes can also be positioned over a major peripheral nerve, thereby exciting only the parent axons (from now on referred to as femoral nerve stimulation) (Rodriguez-Falces et al. 2013).

In the last decades, much research has been directed towards understanding how the motor units (MUs) within a muscle are spatially recruited, and in what order they are recruited during electrical stimulation of increasing intensity. The order of activation of MUs is a more critical issue than their spatial activation when the main goal of electrical stimulation is to restore function of paralysed muscles.
This is so because, when stimulated, paretic muscles fatigue very rapidly, and a proposed method to reduce this exaggerated fatigue is to selectively recruit the more fatigue-resistant MUs (Godfrey et al. 2002; Thomas et al. 2002). However, during transcutaneous stimulation, determination of the MU activation order is difficult, since the likelihood that an axon (or axonal branch) is depolarized depends not only on the diameter of the axon (McNeal 1976; Rattay 1990), but also, and especially, on geometrical factors, such as its orientation with respect to the current field and its distance from the stimulation electrode (Knaflitz et al. 1990; Grill and Mortimer 1995; Feiereisen et al. 1997; Nilsson et al. 1997). Not surprisingly, experiments aimed at elucidating the MU recruitment order in humans have yielded conflicting results. For example, when stimulating at the level of the nerve trunk, some studies support the view that MUs are recruited according to the size principle (Godfrey et al. 2002; Thomas et al. 2002), whereas others indicate a reverse order (Heyters et al. 1994). Similarly, when stimulating at the level of the muscle (mainly at the motor point), some reports show that MUs tend to be activated from slow to fast (Knaflitz et al. 1990; Farina et al. 2004), others suggest a reverse order (Heyters et al. 1994; Feiereisen et al. 1997), whereas the most recent studies agree that MUs are activated with no specific order, i.e., randomly (Gregory and Bickel 2005; Jubeau et al. 2007; Bickel et al. 2011). The first objective of the present study is to investigate possible differences in the recruitment order of MUs in the quadriceps femoris between femoral nerve and direct quadriceps stimulation.

In the present study, the recruitment order of MUs was examined for the vastus medialis (VM) and vastus lateralis (VL) muscles. These muscles differ in their anatomical and structural specificities (Lieber and Fridén 2000; Blazevich et al.
2006) as well as in the spatial organization of the axon terminal branches (Sung et al. 2003; Becker et al. 2010; Botter et al. 2011). In fact, our recent study shows important differences in the spatial recruitment of MUs in the VM and VL muscles during direct quadriceps stimulation (Rodriguez-Falces et al. 2013). Thus, our second objective is to examine possible differences in the MU recruitment order between these muscles.

The order of MU activation was investigated by studying the relationship between time-to-peak twitch and the intensity of the electrical current during gradually increased contractions, as done by Trimble and Enoka (1991) and Heyters et al. (1994). Activation order in the VM and VL muscles was also analysed by calculating the time interval from the stimulus artefact to the positive peak of the M-wave (from now on referred to as M-wave latency, see Fig. 1); M-wave latency thus reflects propagation along both the nerve axon and the muscle fibres. Since large MUs are innervated by large axons (which have a greater propagation velocity than smaller ones), M-wave latency should be related to the size of the recruited MUs. In addition, the M-wave latency parameter is expected to be highly correlated with time-to-peak twitch and so to provide further insight into MU recruitment order.

Materials and methods

Subjects

Twenty-two healthy subjects [5 women; age (mean ± SD): 29 ± 2 years; height: 173 ± 6 cm; weight: 71 ± 7 kg] volunteered to participate in the study. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of the University Hospitals of Geneva (protocol 11-062). Written informed consent was obtained from all participants before inclusion. Subjects were asked not to take part in vigorous physical activity for 2 days prior to their test date.

Experimental setup

All the assessments were completed on the dominant quadriceps muscle under isometric conditions.
Subjects were seated on a custom-built chair with a knee angle of 90° and a trunk–thigh angle of 100°. Extraneous movements of the upper body were limited by two crossover shoulder harnesses and a belt across the lower abdomen. Quadriceps force was recorded using a strain gauge (STS, SWJ, China; linear range 0–2,452 N; sensitivity 2 mV/V and 0.0017 V/N) that was attached to the chair and securely strapped to the ankle with a custom-made mould. The force signal was recorded at 1 kHz using an A/D conversion system (MP150; BIOPAC, Goleta, CA, USA). The experimental procedure (electrode positioning and stimulation protocol) has been described in detail in a previous work (Rodriguez-Falces et al. 2013) and is summarized briefly below.

Electromyography

M-waves were recorded from the VL and VM muscles in response to femoral nerve stimulation and direct quadriceps stimulation (see the next subsections for details). Pairs of silver chloride, circular (recording diameter 20 mm) surface electrodes (Kendall Meditrace 100, Tyco, Canada) were positioned lengthwise over the respective muscle bellies, with an interelectrode distance (centre-to-centre) of 20 mm. For the VL, the distal electrode was placed ~5 cm above the lateral side of the patella, and for the VM, the distal electrode was placed ~3 cm above the medial side of the patella. The ground electrode was fixed over the ipsilateral patella. These recording sites were slightly adjusted by eliciting the largest M-wave via femoral nerve stimulation at the beginning of each test (Place et al. 2006). Light skin abrasion followed by skin cleansing kept electrical impedance below 10 kΩ. EMG signals were amplified with a pass-band of 10 Hz–1 kHz and digitized online at a sampling frequency of 5 kHz.

Stimulation protocol

Stimulus pulses (single shocks) of increasing intensity were delivered approximately every 10 s for both stimulation geometries (femoral nerve and direct quadriceps stimulation).
Specifically, current intensity was progressively increased from 0 mA, in steps of 10 mA, up to the value beyond which there was no further increase in M-wave amplitude (Imax). Since the Imax of the VM and VL muscles did not coincide in most cases, stimulus intensity had to be increased until the M-waves of both muscles reached a plateau. For both stimulation geometries, rectangular pulses of 1 ms were produced via a constant-current stimulator (Model DS7AH; Digitimer, Hertfordshire, UK).

Fig. 1 Representative examples of M-waves and of twitches elicited by femoral nerve stimulation (first row) and direct quadriceps stimulation (second row) in one subject at intensities of 20, 40, 60, 80, and 100 % Imax. M-waves were recorded from the VM (first column) and VL (second column) muscles. The time courses of the evoked responses have been truncated in order to show more clearly the latency of the M-waves and the time-to-peak of the twitch. For the same reason, a black dot has been drawn at the M-wave first peak and also at the twitch peak.

Femoral nerve stimulation

The femoral nerve was stimulated using a circular (diameter 5.08 cm) self-adhesive electrode (Dermatrode, American Imex, Irvine, CA, USA) positioned in the femoral triangle, 3–5 cm below the inguinal ligament. A large (5 × 10 cm) rectangular self-adhesive electrode (Compex, Ecublens, Switzerland) was fixed over the gluteal fold to close the stimulation current loop.

Direct quadriceps stimulation

The quadriceps muscle was stimulated using two large (15 × 9 cm) self-adhesive electrodes (Dermatrode, American Imex, Irvine, CA, USA) placed 5–10 cm below the inguinal crease (proximal electrode) and 10–15 cm above the superior border of the patella (distal electrode). For subjects with short limbs, the short distance between the distal stimulation electrode and the EMG recording electrodes increased the likelihood of overlap between the stimulus artefact and the M-wave.
To avoid this, M-waves were carefully inspected and, if necessary, electrodes were slightly relocated. For two subjects, the problem of overlap could not be solved, and these subjects were therefore discarded.

Data analysis

Data were first analysed with commercially available software (AcqKnowledge; Biopac Systems, Goleta, CA, USA) to monitor for any abnormality in the M-wave and twitch traces. Subsequently, data were exported to Matlab (version R2009b; The MathWorks, MA, USA) for quantitative analysis using a number of ad hoc scripts.

To investigate the order of MU activation during graded electrical stimulation, several authors have analysed how time-to-peak twitch varies with increasing stimulus current (Trimble and Enoka 1991; Heyters et al. 1994; Jubeau et al. 2007). The present study followed this approach: for each twitch evoked by femoral nerve and direct quadriceps stimulation, time-to-peak was calculated and related to its corresponding current intensity (Fig. 1). Time-to-peak twitch was defined as the time from the resting response level to the peak force value. To improve the accuracy of time-to-peak determination, the measurement was obtained from the smoothed twitch (smoothing over ten points), as done by Heyters et al. (1994).

Further insight into MU recruitment order during electrical stimulation was obtained by studying the relationship between M-wave latency (Fig. 1) and the level of stimulation current. The M-wave latency represents the time taken for the excitation to propagate from the stimulation site to the position along the muscle fibres under the vertical of the first (proximal) EMG electrode (Farina et al. 2004). The advantage of this measurement is that it is more reliable (and easier) to determine than other parameters such as the onset of the M-wave (Nodera et al. 2006).
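The two measures just described can be sketched as follows. This is a minimal Python illustration, not the ad hoc Matlab scripts used in the study; the blanking interval used to step over the stimulus artefact, the function names, and the assumption that the force trace begins at twitch onset are ours.

```python
import numpy as np

FS_EMG = 5000    # EMG sampling rate (Hz), as described above
FS_FORCE = 1000  # force sampling rate (Hz)

def m_wave_latency_ms(emg, stim_idx, fs=FS_EMG, blank=10):
    """Time (ms) from the stimulus artefact to the M-wave positive peak.
    `stim_idx` marks the artefact sample; the first `blank` samples after it
    are skipped so the artefact itself is not picked as the peak (an
    illustrative choice, 2 ms at 5 kHz)."""
    seg = np.asarray(emg, dtype=float)[stim_idx + blank:]
    return 1000.0 * (blank + int(np.argmax(seg))) / fs

def time_to_peak_twitch_ms(force, fs=FS_FORCE, win=10):
    """Time (ms) from the resting response level to peak force, measured on
    the ten-point smoothed twitch (cf. Heyters et al. 1994). The trace is
    assumed to start at twitch onset."""
    f = np.asarray(force, dtype=float)
    smoothed = np.convolve(f, np.ones(win) / win, mode="same")  # moving average
    return 1000.0 * int(np.argmax(smoothed)) / fs
```

In practice the onset of the twitch and the artefact sample would themselves be detected from the recordings; here they are taken as given to keep the sketch short.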
The hypothesis that M-wave latency is closely related to the size of the activated MUs is grounded on the good correlation between the size of the stimulated axon, its conduction velocity, and the size of the innervated MU (Burke et al. 1973). Confirmation of this hypothesis can be obtained from Fig. 2 of Thomas et al. (2002), where M-wave latency can be seen to decrease with increasing size of the MUs. Further support for the validity of M-wave latency as an indicator of MU size comes from the simulation study of Farina et al. (2004).

In order to obtain the average variation of time-to-peak twitch and M-wave latency with increasing stimulus intensity for the whole study group, we first normalized, for each subject, the current intensity to the range [0, 100] (where 100 is Imax), and then calculated the average variation curve. For the sake of clarity, only the stimulation intensities of 20, 40, 60, 80, and 100 % Imax were considered for statistical analysis (Place et al. 2010). Data for time-to-peak twitch and M-wave latency were expressed both in absolute terms and relative to the value obtained at 20 % Imax.

Peak twitch force and M-wave amplitude were calculated at all the stimulation levels described above for both femoral nerve and direct quadriceps stimulation. These data were needed in order to check that full MU recruitment was achieved at 100 % Imax for both stimulation geometries.

Statistics

Statistical analyses were performed on group data using SPSS software. The Kolmogorov–Smirnov test showed that all data were normally distributed. Differences in absolute values of M-wave amplitude and peak twitch force evoked at 100 % Imax between femoral nerve and direct quadriceps stimulation were examined using paired t-tests. Pearson's correlation (r) was used to determine the relationship between changes in M-wave latency and time-to-peak twitch. A three-way repeated-measures ANOVA [stimulation geometry (femoral nerve vs. direct quadriceps stimulation) × stimulation level (20, 40, 60, 80 vs. 100 % Imax) × muscle (VL vs. VM)] was applied to M-wave latency. A two-way repeated-measures ANOVA [stimulation geometry (femoral nerve vs. direct quadriceps stimulation) × stimulation level (20, 40, 60, 80 vs. 100 % Imax)] was applied to time-to-peak twitch. When main effects or interactions were significant, Tukey's Honestly Significant Difference post hoc test for pairwise comparisons was applied. Significance was set at P < 0.05. Data are presented as mean ± SD in the tables and text and as mean ± SEM in the figures.

Fig. 2 Average variation of VM M-wave latency (a), VL M-wave latency (b), and time-to-peak twitch (c) at different levels of stimulus intensity. Data are expressed as mean ± SEM (N = 20). *P < 0.05 indicates a difference between muscle and nerve stimulation at the same intensity. #P < 0.05 indicates a difference from the value at 20 % Imax, where Imax is the maximal stimulation intensity.

Results

When stimulus current was gradually increased via femoral nerve and direct quadriceps stimulation, a plateau in M-wave amplitude and peak twitch force was obtained at 100 % Imax for all participants. No significant differences in peak twitch force were observed at 100 % Imax between nerve and direct quadriceps stimulation (58.1 ± 6.2 vs. 59.8 ± 7.1 N, respectively; P = 0.38). Similarly, for both VM and VL muscles, the M-waves evoked at 100 % Imax using the two stimulation geometries had comparable amplitudes (for the VM, 8.9 ± 1.3 vs. 9.1 ± 1.1 mV, respectively, P = 0.28; for the VL, 5.9 ± 1.5 vs. 5.6 ± 1.4 mV, respectively, P = 0.32). More details about how M-wave amplitude and peak twitch force changed with increasing stimulation level for both stimulation geometries can be found elsewhere (Rodriguez-Falces et al.
2013).

Figure 1 provides representative traces of M-waves and twitches recorded in one subject during femoral nerve and direct quadriceps stimulation (first and second rows, respectively) at stimulation intensities of 20, 40, 60, 80, and 100 % Imax (different levels of grey). As can be seen in Fig. 1a, b, for the VM and VL M-waves evoked by femoral nerve stimulation, M-wave latency decreased monotonically with increasing stimulus intensity. Also, for the twitches elicited via nerve stimulation (Fig. 1c), time-to-peak decreased as the stimulation level was raised from 20 to 100 % Imax. However, when direct quadriceps stimulation was used to evoke the M-waves in the VL (Fig. 1e), M-wave latency remained largely unchanged between consecutive stimulation levels. Similarly, for the twitches evoked via direct quadriceps stimulation (Fig. 1f), time-to-peak did not show a clear trend when the stimulation level was raised stepwise from 20 to 100 % Imax. Noteworthy, for the M-waves evoked in the VM muscle using this stimulation geometry (Fig. 1d), M-wave latency decreased consistently with increasing stimulus intensity.

Figure 2a, b shows, for the VM and VL muscles, respectively, the average variation of M-wave latency with stimulation intensity during muscle and nerve stimulation. The average variation of time-to-peak twitch with stimulus current for the aggregate data is shown in Fig. 2c. As can be seen in Fig. 2a, for the VM muscle, M-wave latency decreased monotonically with increasing stimulus intensity for both femoral nerve and direct quadriceps stimulation. Note that, for both stimulation geometries, the M-wave latencies at 60, 80, and 100 % Imax were significantly lower than the latency at 20 % Imax. For this muscle, no stimulation geometry × stimulation level interaction (P > 0.05) was found for M-wave latency.
With regard to the absolute values, M-wave latency at 100 % Imax was significantly longer for femoral nerve than for direct quadriceps stimulation (see Table 1).

In the case of the VL muscle (Fig. 2b), the latency of the M-waves evoked via direct quadriceps stimulation first increased slightly from 20 to 40 % Imax and then underwent a steady decrease over the rest of the stimulation levels, but none of these changes reached statistical significance. In contrast, the latency of the M-waves evoked by femoral nerve stimulation decreased monotonically across all stimulation levels, and the values at 60, 80, and 100 % Imax were significantly lower than that at 20 % Imax. For this muscle, a significant stimulation geometry × stimulation level interaction was found (P < 0.001). As for the absolute values, M-wave latency at 100 % Imax was significantly longer for femoral nerve than for direct quadriceps stimulation (see Table 1).

Table 1 Mean (SD) values of M-wave latency (in ms) for the VM and VL muscles and of time-to-peak twitch (in ms) at different stimulation levels

                M-wave latency (VM)              M-wave latency (VL)              Time-to-peak twitch (quadriceps)
% Imax   Nerve        Muscle       P-value   Nerve        Muscle       P-value   Nerve        Muscle       P-value
20       11.8 (2.6)   8.6 (3.1)    <0.001    13.3 (2.2)   9.8 (3.8)    <0.001    92.5 (7.0)   85.5 (7.1)   0.002
40       11.4 (2.6)   8.2 (3.0)    <0.001    12.6 (2.1)   10.2 (3.4)   <0.001    88.2 (6.8)   87.6 (6.9)   0.19
60       11.1 (2.7)   7.7 (3.1)    <0.001    11.8 (2.1)   9.6 (3.9)    <0.001    86.2 (6.6)   85.2 (6.9)   0.17
80       10.7 (2.7)   7.4 (3.0)    <0.001    11.4 (2.0)   9.3 (3.7)    <0.001    83.1 (6.2)   82.8 (6.6)   0.41
100      10.3 (2.8)   7.3 (3.1)    <0.001    11.1 (2.9)   9.0 (3.8)    <0.001    81.9 (6.0)   83.0 (6.1)   0.32

Nerve and muscle stimulation refer to femoral nerve and direct quadriceps stimulation, respectively. P-values refer to the differences between the stimulation geometries at the same intensity.

With regard to the twitch analysis (Fig.
2c), the time-to-peak of the twitches elicited via femoral nerve stimulation decreased monotonically across all stimulation levels, and this decrease was statistically significant (the time-to-peak values at 40, 60, 80, and 100 % Imax were significantly lower than that at 20 % Imax). In contrast, for direct quadriceps stimulation, time-to-peak twitch did not show a clear trend: it first increased from 20 to 40 % Imax, then underwent a mild decrease from 40 to 80 % Imax, and finally increased again from 80 to 100 % Imax (none of these changes reached statistical significance). For time-to-peak twitch, there was a significant stimulation geometry × stimulation level interaction (P < 0.001). In absolute terms, time-to-peak twitch at 100 % Imax was rather similar for femoral nerve and direct quadriceps stimulation (see Table 1).

For femoral nerve stimulation, the correlation between M-wave latency and time-to-peak twitch was significant for both the VM (r = 0.98, P = 0.0015) and VL (r = 0.98, P = 0.0013) muscles. The high correlation for this stimulation geometry can be visually appreciated in the scatter plot of Fig. 3a. For direct quadriceps stimulation, the correlation between M-wave latency and time-to-peak twitch was significant for the VL (r = 0.96, P = 0.008), but not for the VM (r = 0.74, P = 0.154). This is in visual conformity with the scatter plot of Fig.
3b.

Discussion

The main findings of the present study were: (1) with femoral nerve stimulation, time-to-peak twitch and M-wave latency decreased consistently with increasing stimulus intensity, whereas, during graded direct quadriceps stimulation, time-to-peak twitch and VL M-wave latency did not show a clear trend; and (2) for the VM muscle, M-wave latency showed the same behaviour (i.e., it decreased) with increasing stimulation level for both femoral nerve and direct quadriceps stimulation, whereas, for the VL muscle, the variation of M-wave latency with stimulus intensity differed between the two stimulation geometries.

Factors affecting the MU recruitment order

Theory establishes that mainly two factors (the excitability threshold of motoneuron axons and the location of the axons with respect to the current field) determine MU activation order during electrical stimulation (Blair and Erlanger 1933; Rattay 1990; Grill and Mortimer 1995). When the stimulation is applied via the skin surface (as in the present study), the current density field decreases rapidly with depth; for this reason, the location and orientation of the axon in relation to the applied current are more decisive than its excitability threshold in determining whether an axon is depolarized (Knaflitz et al. 1990; Grill and Mortimer 1995). In these circumstances, the stimulation geometry adopted becomes a critical aspect. In the monopolar configuration (as in our femoral nerve stimulation), the current lines traverse from the cathode to the anode across the muscle, whereas in the bipolar arrangement (as in our direct quadriceps stimulation), the current lines run along the longitudinal direction of the muscle (Merletti et al. 1992). Alternative configurations have been used by other authors such as Heyters et al. (1994) and Trimble and Enoka (1991).
This variety of stimulation geometries, together with the narrow range of axon diameters and conduction velocities in some nerves (Westling et al. 1990; Thomas and Westling 1995), would partly explain the conflicting results reported in the literature regarding the MU recruitment order.

Fig. 3 Scatter plots of the mean absolute values of M-wave latency vs. the mean absolute values of time-to-peak twitch for femoral nerve stimulation (a) and direct quadriceps stimulation (b) at different levels of stimulus intensity. The solid lines are the linear regression lines of best fit. For the sake of clarity, SEM data are not reported in the figure; these values are displayed in Fig. 2.

Recruitment order for femoral nerve and direct quadriceps stimulation

Our results showed that time-to-peak twitch exhibited a steady, significant decrease with increasing stimulus intensity during femoral nerve stimulation, which strongly indicates that MUs were recruited according to the size principle (i.e., from slow-twitch to fast-twitch). It could be argued that, besides the characteristics of the recruited MUs, changes in muscle compliance could also have contributed to the time-to-peak twitch reduction in these subjects (Heyters et al. 1994). However, we also found a fairly consistent decrease in both VM and VL M-wave latency with increasing stimulation level, which, on the basis of simulations (Farina et al. 2004), reinforces our hypothesis of an orderly recruitment of MUs with nerve stimulation. Remarkably, the investigations of Thomas et al. (2002) and Godfrey et al. (2002) involving monopolar stimulation of the median nerve revealed that a slow-twitch to fast-twitch activation order generally occurred for the entire MU population in the thenar muscles, which is in concordance with our results. In addition, our results are very close to those obtained by Knaflitz et al.
(1990), who, like us, used a monopolar configuration in their experiments (although they stimulated the motor point, not the nerve trunk). Our findings are, however, in disagreement with the twitch analysis performed by Heyters et al. (1994) on the adductor pollicis. These discrepancies are probably related to the fact that those authors did not use a monopolar arrangement to stimulate the ulnar nerve at the wrist.

When using direct quadriceps stimulation with the bipolar configuration, there was little evidence that MUs were recruited from slow to fast: only for two subjects did time-to-peak twitch and M-wave latency decrease across all intensity levels. A reverse order of MU recruitment could not be proved either. Our data revealed that time-to-peak twitch and VL M-wave latency did not show a consistent behaviour with increasing stimulus intensity, which leads us to regard the MU activation during direct quadriceps stimulation as "random". Thus, our results confirm the conclusions reported in recent publications (Gregory and Bickel 2005; Jubeau et al. 2007; Bickel et al. 2011).

Inter-muscle similarities and differences

For the VM, it was found that the variation of M-wave latency with increasing stimulus current was similar for femoral nerve and direct quadriceps stimulation, which strongly suggests that MUs were recruited in roughly the same order with the two stimulation geometries across all intensity levels (from 20 to 100 % Imax). The comparable response of VM M-wave latency to femoral nerve and direct quadriceps stimulation is consistent with our recent study (Rodriguez-Falces et al. 2013), which showed that, for this particular muscle, the spatial distribution of the MUs activated at a given stimulation amplitude was similar for both stimulation geometries.
The most plausible explanation for these similarities is that both femoral nerve stimulation and direct quadriceps stimulation depolarize the nerve trunk innervating the VM. This is based on the observation that the motor branch of the VM out of the femoral nerve does not have many subdivisions (Botter et al. 2011), so stimulation over the quadriceps belly would depolarize this nerve trunk merely more distally. As a result, stimulating the VM at the level of the muscle belly would activate MUs in essentially the same order (and with the same spatial distribution) as when the stimulation is applied at the level of the femoral nerve.

For the VL muscle, there was strong evidence suggesting that MU recruitment occurred in a different order for the two stimulation geometries tested. For femoral nerve stimulation, M-wave latency decreased consistently with increasing stimulation level, and this decrease was statistically significant, indicating that MUs were activated according to the size principle (Farina et al. 2004). On the contrary, during stimulation over the quadriceps belly, M-wave latency did not show a consistent behaviour with stimulation current, supporting the view that MUs were recruited randomly. The different recruitment order of MUs found in the VL with femoral nerve and direct quadriceps stimulation is in good concordance with the different spatial recruitment of MUs encountered in this muscle with these stimulation geometries (Rodriguez-Falces et al. 2013). The most likely explanation for the above differences is that the motor branch of the VL out of the femoral nerve trunk has a complex architecture, including many subdivisions (Sung et al. 2003; Becker et al. 2010; Botter et al. 2011), which facilitates the scattering of the motor nerve endings of the VL throughout the cross section of this muscle.
As a result of this branching, during direct quadriceps stimulation of increasing intensity, the current density injected into the VL would excite the nerve endings from the superficial to the deeper regions. Such spatial activation of the axonal branches is believed to lead to a random recruitment of MUs (Gregory and Bickel 2005; Jubeau et al. 2007; Bickel et al. 2011). From the above, two conclusions can be derived. First, during direct quadriceps stimulation, MU recruitment order (as well as MU spatial recruitment) ultimately depends on the architecture of the peripheral nerve and its terminal branches below the stimulating electrodes. Second, the fact that, during both graded femoral nerve and direct quadriceps stimulation, time-to-peak twitch and VL M-wave latency exhibit the same behaviour further reinforces the
Anomaly Detection: A Survey
A modified version of this technical report will appear in ACM Computing Surveys, September 2009.

Anomaly Detection: A Survey

VARUN CHANDOLA, University of Minnesota
ARINDAM BANERJEE, University of Minnesota
VIPIN KUMAR, University of Minnesota

Anomaly detection is an important problem that has been researched within diverse research areas and application domains. Many anomaly detection techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We have grouped existing techniques into different categories based on the underlying approach adopted by each technique. For each category we have identified key assumptions, which are used by the techniques to differentiate between normal and anomalous behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category, we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications—Data Mining
General Terms: Algorithms
Additional Key Words and Phrases: Anomaly Detection, Outlier Detection

1. INTRODUCTION

Anomaly detection
refers to the problem of finding patterns in data that do not conform to expected behavior. These non-conforming patterns are often referred to as anomalies, outliers, discordant observations, exceptions, aberrations, surprises, peculiarities, or contaminants in different application domains. Of these, anomalies and outliers are the two terms used most commonly in the context of anomaly detection, sometimes interchangeably. Anomaly detection finds extensive use in a wide variety of applications such as fraud detection for credit cards, insurance, or health care, intrusion detection for cyber-security, fault detection in safety-critical systems, and military surveillance for enemy activities. The importance of anomaly detection is due to the fact that anomalies in data translate to significant (and often critical) actionable information in a wide variety of application domains. For example, an anomalous traffic pattern in a computer network could mean that a hacked computer is sending out sensitive data to an unauthorized destination [Kumar 2005]. An anomalous MRI image may indicate the presence of malignant tumors [Spence et al. 2001]. Anomalies in credit card transaction data could indicate credit card or identity theft [Aleskerov et al. 1997], or anomalous readings from a spacecraft sensor could signify a fault in some component of the spacecraft [Fujimaki et al. 2005]. Detecting outliers or anomalies in data has been studied in the statistics community as early as the 19th century [Edgeworth 1887]. Over time, a variety of anomaly detection techniques have been developed in several research communities. Many of these techniques have been specifically developed for certain application domains, while others are more generic. This survey tries to provide a structured and comprehensive overview of the research on anomaly detection. We hope that it facilitates a better understanding of the different directions in which research has
been done on this topic, and how techniques developed in one area can be applied in domains for which they were not intended to begin with.

1.1 What are anomalies?

Anomalies are patterns in data that do not conform to a well-defined notion of normal behavior. Figure 1 illustrates anomalies in a simple 2-dimensional data set. The data has two normal regions, N1 and N2, since most observations lie in these two regions. Points that are sufficiently far away from these regions, e.g., points o1 and o2, and points in region O3, are anomalies.

Fig. 1. A simple example of anomalies in a 2-dimensional data set.

Anomalies might be induced in the data for a variety of reasons, such as malicious activity, e.g., credit card fraud, cyber-intrusion, terrorist activity, or breakdown of a system, but all of these reasons have a common characteristic: they are interesting to the analyst. The "interestingness" or real-life relevance of anomalies is a key feature of anomaly detection. Anomaly detection is related to, but distinct from, noise removal [Teng et al. 1990] and noise accommodation [Rousseeuw and Leroy 1987], both of which deal with unwanted noise in the data. Noise can be defined as a phenomenon in data which is not of interest to the analyst, but acts as a hindrance to data analysis. Noise removal is driven by the need to remove the unwanted objects before any data analysis is performed on the data. Noise accommodation refers to immunizing a statistical model estimation against anomalous observations [Huber 1974].
Another topic related to anomaly detection is novelty detection [Markou and Singh 2003a; 2003b; Saunders and Gero 2000], which aims at detecting previously unobserved (emergent, novel) patterns in the data, e.g., a new topic of discussion in a news group. The distinction between novel patterns and anomalies is that novel patterns are typically incorporated into the normal model after being detected. It should be noted that solutions for the above-mentioned related problems are often used for anomaly detection and vice versa, and hence are discussed in this review as well.

1.2 Challenges

At an abstract level, an anomaly is defined as a pattern that does not conform to expected normal behavior. A straightforward anomaly detection approach, therefore, is to define a region representing normal behavior and declare any observation in the data which does not belong to this normal region as an anomaly. But several factors make this apparently simple approach very challenging:

—Defining a normal region which encompasses every possible normal behavior is very difficult. In addition, the boundary between normal and anomalous behavior is often not precise. Thus an anomalous observation which lies close to the boundary can actually be normal, and vice versa.
—When anomalies are the result of malicious actions, the malicious adversaries often adapt themselves to make the anomalous observations appear normal, thereby making the task of defining normal behavior more difficult.
—In many domains normal behavior keeps evolving, and a current notion of normal behavior might not be sufficiently representative in the future.
—The exact notion of an anomaly is different for different application domains. For example, in the medical domain a small deviation from normal (e.g., fluctuations in body temperature) might be an anomaly, while a similar deviation in the stock market domain (e.g., fluctuations in the value of a stock) might be considered normal. Thus applying a technique developed in one domain to another is not
straightforward.
—Availability of labeled data for training/validation of models used by anomaly detection techniques is usually a major issue.
—Often the data contains noise which tends to be similar to the actual anomalies and hence is difficult to distinguish and remove.

Due to the above challenges, the anomaly detection problem, in its most general form, is not easy to solve. In fact, most of the existing anomaly detection techniques solve a specific formulation of the problem. The formulation is induced by various factors such as the nature of the data, the availability of labeled data, the type of anomalies to be detected, etc. Often, these factors are determined by the application domain in which the anomalies need to be detected. Researchers have adopted concepts from diverse disciplines such as statistics, machine learning, data mining, information theory, and spectral theory, and have applied them to specific problem formulations. Figure 2 shows the above-mentioned key components associated with any anomaly detection technique.

Fig. 2. Key components associated with an anomaly detection technique: research areas (statistics, machine learning, data mining, information theory, spectral theory), problem characteristics (labels, anomaly type, nature of data, output), and application domains (medical informatics, intrusion detection, fault/damage detection, fraud detection).

1.3 Related Work

Anomaly detection has been the topic of a number of surveys and review articles, as well as books. Hodge and Austin [2004] provide an extensive survey of anomaly detection techniques developed in machine learning and statistical domains. A broad review of anomaly detection techniques for numeric as well as symbolic data is presented by Agyemang et al. [2006]. An extensive review of novelty detection techniques using neural networks and statistical approaches has been presented in Markou and Singh [2003a] and Markou and Singh [2003b], respectively. Patcha
and Park [2007] and Snyder [2001] present surveys of anomaly detection techniques used specifically for cyber-intrusion detection. A substantial amount of research on outlier detection has been done in statistics and has been reviewed in several books [Rousseeuw and Leroy 1987; Barnett and Lewis 1994; Hawkins 1980] as well as other survey articles [Beckman and Cook 1983; Bakar et al. 2006]. Table I shows the set of techniques and application domains covered by our survey and the various related survey articles mentioned above.

Table I. Comparison of our survey to other related survey articles. Columns: 1 - our survey, 2 - Hodge and Austin [2004], 3 - Agyemang et al. [2006], 4 - Markou and Singh [2003a], 5 - Markou and Singh [2003b], 6 - Patcha and Park [2007], 7 - Beckman and Cook [1983], 8 - Bakar et al. [2006]. Rows: techniques (classification based, clustering based, nearest neighbor based, statistical, information theoretic, spectral) and applications (cyber-intrusion detection, fraud detection, medical anomaly detection, industrial damage detection, image processing, textual anomaly detection, sensor networks).

1.4 Our Contributions

This survey is an attempt to provide a structured and broad overview of extensive research on anomaly detection techniques spanning multiple research areas and application domains. Most of the existing surveys on anomaly detection either focus on a particular application domain or on a single research area. [Agyemang et al. 2006] and [Hodge and Austin 2004] are two related works that group anomaly detection into multiple categories and discuss techniques under each category. This survey builds upon these two works by significantly expanding the discussion in several directions.
We add two more categories of anomaly detection techniques, viz., information-theoretic and spectral techniques, to the four categories discussed in [Agyemang et al. 2006] and [Hodge and Austin 2004]. For each of the six categories, we not only discuss the techniques, but also identify unique assumptions regarding the nature of anomalies made by the techniques in that category. These assumptions are critical for determining when the techniques in a category would be able to detect anomalies, and when they would fail. For each category, we provide a basic anomaly detection technique, and then show how the different existing techniques in that category are variants of the basic technique. This template provides an easier and succinct understanding of the techniques belonging to each category. Further, for each category we identify the advantages and disadvantages of the techniques in that category. We also provide a discussion on the computational complexity of the techniques since it is an important issue in real application domains. While some of the existing surveys mention the different applications of anomaly detection, we provide a detailed discussion of the application domains where anomaly detection techniques have been used. For each domain we discuss the notion of an anomaly, the different aspects of the anomaly detection problem, and the challenges faced by anomaly detection techniques. We also provide a list of techniques that have been applied in each application domain. The existing surveys discuss anomaly detection techniques that detect the simplest form of anomalies. We distinguish simple anomalies from complex anomalies. The discussion of applications of anomaly detection reveals that for most application domains, the interesting anomalies are complex in nature, while most of the algorithmic research has focused on simple anomalies.

1.5 Organization

This survey is organized into three parts and
its structure closely follows Figure 2. In Section 2 we identify the various aspects that determine the formulation of the problem and highlight the richness and complexity associated with anomaly detection. We distinguish simple anomalies from complex anomalies and define two types of complex anomalies, viz., contextual and collective anomalies. In Section 3 we briefly describe the different application domains where anomaly detection has been applied. In subsequent sections we provide a categorization of anomaly detection techniques based on the research area to which they belong. The majority of the techniques can be categorized into classification-based (Section 4), nearest-neighbor-based (Section 5), clustering-based (Section 6), and statistical techniques (Section 7). Some techniques belong to research areas such as information theory (Section 8) and spectral theory (Section 9). For each category of techniques we also discuss their computational complexity for the training and testing phases. In Section 10 we discuss various contextual anomaly detection techniques. We discuss various collective anomaly detection techniques in Section 11. We present some discussion on the limitations and relative performance of various existing techniques in Section 12. Section 13 contains concluding remarks.

2. DIFFERENT ASPECTS OF AN ANOMALY DETECTION PROBLEM

This section identifies and discusses the different aspects of anomaly detection. As mentioned earlier, a specific formulation of the problem is determined by several different factors such as the nature of the input data, the availability (or unavailability) of labels, as well as the constraints and requirements induced by the application domain. This section brings forth the richness in the problem domain and justifies the need for the broad spectrum of anomaly detection techniques.

2.1 Nature of Input Data

A key aspect of any anomaly detection technique is the nature of the input data.
Input is generally a collection of data instances (also referred to as objects, records, points, vectors, patterns, events, cases, samples, observations, or entities) [Tan et al. 2005, Chapter 2]. Each data instance can be described using a set of attributes (also referred to as variables, characteristics, features, fields, or dimensions). The attributes can be of different types such as binary, categorical, or continuous. Each data instance might consist of only one attribute (univariate) or multiple attributes (multivariate). In the case of multivariate data instances, all attributes might be of the same type or might be a mixture of different data types. The nature of the attributes determines the applicability of anomaly detection techniques. For example, for statistical techniques different statistical models have to be used for continuous and categorical data. Similarly, for nearest-neighbor-based techniques, the nature of the attributes would determine the distance measure to be used. Often, instead of the actual data, the pairwise distances between instances might be provided in the form of a distance (or similarity) matrix. In such cases, techniques that require original data instances are not applicable, e.g., many statistical and classification-based techniques. Input data can also be categorized based on the relationships present among data instances [Tan et al. 2005]. Most of the existing anomaly detection techniques deal with record data (or point data), in which no relationship is assumed among the data instances. In general, data instances can be related to each other. Some examples are sequence data, spatial data, and graph data. In sequence data, the data instances are linearly ordered, e.g., time-series data, genome sequences, protein sequences. In spatial data, each data instance is related to its neighboring instances, e.g., vehicular traffic data, ecological data. When spatial data has a temporal (sequential) component it is referred to as spatio-temporal
data, e.g., climate data. In graph data, data instances are represented as vertices in a graph and are connected to other vertices with edges. Later in this section we will discuss situations where such relationships among data instances become relevant for anomaly detection.

2.2 Type of Anomaly

An important aspect of an anomaly detection technique is the nature of the desired anomaly. Anomalies can be classified into the following three categories:

2.2.1 Point Anomalies. If an individual data instance can be considered as anomalous with respect to the rest of the data, then the instance is termed a point anomaly. This is the simplest type of anomaly and is the focus of the majority of research on anomaly detection. For example, in Figure 1, points o1 and o2 as well as points in region O3 lie outside the boundary of the normal regions, and hence are point anomalies since they are different from normal data points. As a real-life example, consider credit card fraud detection. Let the data set correspond to an individual's credit card transactions. For the sake of simplicity, let us assume that the data is defined using only one feature: amount spent. A transaction for which the amount spent is very high compared to the normal range of expenditure for that person will be a point anomaly.

2.2.2 Contextual Anomalies. If a data instance is anomalous in a specific context (but not otherwise), then it is termed a contextual anomaly (also referred to as a conditional anomaly [Song et al. 2007]). The notion of a context is induced by the structure in the data set and has to be specified as a part of the problem formulation. Each data instance is defined using the following two sets of attributes:

(1) Contextual attributes. The contextual attributes are used to determine the context (or neighborhood) for that instance. For example, in spatial data sets, the longitude and latitude of a location are the contextual attributes. In time-series data, time is a contextual
attribute which determines the position of an instance in the entire sequence.

(2) Behavioral attributes. The behavioral attributes define the non-contextual characteristics of an instance. For example, in a spatial data set describing the average rainfall of the entire world, the amount of rainfall at any location is a behavioral attribute.

The anomalous behavior is determined using the values of the behavioral attributes within a specific context. A data instance might be a contextual anomaly in a given context, but an identical data instance (in terms of behavioral attributes) could be considered normal in a different context. This property is key in identifying contextual and behavioral attributes for a contextual anomaly detection technique.

Fig. 3. Contextual anomaly t2 in a temperature time series. Note that the temperature at time t1 is the same as that at time t2 but occurs in a different context and hence is not considered an anomaly.

Contextual anomalies have been most commonly explored in time-series data [Weigend et al. 1995; Salvador and Chan 2003] and spatial data [Kou et al. 2006; Shekhar et al. 2001]. Figure 3 shows one such example for a temperature time series showing the monthly temperature of an area over the last few years. A temperature of 35 F might be normal during the winter (at time t1) at that place, but the same value during the summer (at time t2) would be an anomaly. A similar example can be found in the credit card fraud detection domain. A contextual attribute in the credit card domain can be the time of purchase. Suppose an individual usually has a weekly shopping bill of $100 except during the Christmas week, when it reaches $1000. A new purchase of $1000 in a week in July will be considered a contextual anomaly, since it does not conform to the normal behavior of the individual in the context of time (even though the same amount spent during Christmas week would be considered normal). The choice of applying a contextual anomaly detection technique is determined by the meaningfulness
of the contextual anomalies in the target application domain. Another key factor is the availability of contextual attributes. In several cases defining a context is straightforward, and hence applying a contextual anomaly detection technique makes sense. In other cases, defining a context is not easy, making it difficult to apply such techniques.

2.2.3 Collective Anomalies. If a collection of related data instances is anomalous with respect to the entire data set, it is termed a collective anomaly. The individual data instances in a collective anomaly may not be anomalies by themselves, but their occurrence together as a collection is anomalous. Figure 4 illustrates an example showing a human electrocardiogram output [Goldberger et al. 2000]. The highlighted region denotes an anomaly because the same low value persists for an abnormally long time (corresponding to an Atrial Premature Contraction). Note that the low value by itself is not an anomaly.

Fig. 4. Collective anomaly corresponding to an Atrial Premature Contraction in a human electrocardiogram output.

As another illustrative example, consider a sequence of actions occurring in a computer as shown below:

... http-web, buffer-overflow, http-web, http-web, smtp-mail, ftp, http-web, ssh, smtp-mail, http-web, ssh, buffer-overflow, ftp, http-web, ftp, smtp-mail, http-web ...

The highlighted sequence of events (buffer-overflow, ssh, ftp) corresponds to a typical web-based attack by a remote machine followed by copying of data from the host computer to a remote destination via ftp. It should be noted that this collection of events is an anomaly, but the individual events are not anomalies when they occur in other locations in the sequence. Collective anomalies have been explored for sequence data [Forrest et al. 1999; Sun et al. 2006], graph data [Noble and Cook 2003], and spatial data [Shekhar et al.
2001]. It should be noted that while point anomalies can occur in any data set, collective anomalies can occur only in data sets in which data instances are related. In contrast, the occurrence of contextual anomalies depends on the availability of context attributes in the data. A point anomaly or a collective anomaly can also be a contextual anomaly if analyzed with respect to a context. Thus a point anomaly detection problem or a collective anomaly detection problem can be transformed into a contextual anomaly detection problem by incorporating the context information.

2.3 Data Labels

The labels associated with a data instance denote whether that instance is normal or anomalous¹. It should be noted that obtaining labeled data which is accurate as well as representative of all types of behaviors is often prohibitively expensive. Labeling is often done manually by a human expert and hence requires substantial effort to obtain the labeled training data set. Typically, getting a labeled set of anomalous data instances which covers all possible types of anomalous behavior is more difficult than getting labels for normal behavior. Moreover, anomalous behavior is often dynamic in nature, e.g., new types of anomalies might arise for which there is no labeled training data. In certain cases, such as air traffic safety, anomalous instances would translate to catastrophic events, and hence will be very rare. Based on the extent to which labels are available, anomaly detection techniques can operate in one of the following three modes:

2.3.1 Supervised anomaly detection. Techniques trained in supervised mode assume the availability of a training data set which has labeled instances for the normal as well as the anomaly class. The typical approach in such cases is to build a predictive model for the normal vs. anomaly classes. Any unseen data instance is compared against the model to determine which class it belongs to. There are two major issues that
arise in supervised anomaly detection. First, the anomalous instances are far fewer compared to the normal instances in the training data. Issues that arise due to imbalanced class distributions have been addressed in the data mining and machine learning literature [Joshi et al. 2001; 2002; Chawla et al. 2004; Phua et al. 2004; Weiss and Hirsh 1998; Vilalta and Ma 2002]. Second, obtaining accurate and representative labels, especially for the anomaly class, is usually challenging. A number of techniques have been proposed that inject artificial anomalies into a normal data set to obtain a labeled training data set [Theiler and Cai 2003; Abe et al. 2006; Steinwart et al. 2005]. Other than these two issues, the supervised anomaly detection problem is similar to building predictive models. Hence we will not address this category of techniques in this survey.

2.3.2 Semi-supervised anomaly detection. Techniques that operate in a semi-supervised mode assume that the training data has labeled instances for only the normal class. Since they do not require labels for the anomaly class, they are more widely applicable than supervised techniques. For example, in spacecraft fault detection [Fujimaki et al. 2005], an anomaly scenario would signify an accident, which is not easy to model. The typical approach used in such techniques is to build a model for the class corresponding to normal behavior, and use the model to identify anomalies in the test data. A limited set of anomaly detection techniques exists that assumes the availability of only the anomaly instances for training [Dasgupta and Nino 2000; Dasgupta and Majumdar 2002; Forrest et al. 1996]. Such techniques are not commonly used, primarily because it is difficult to obtain a training data set which covers every possible anomalous behavior that can occur in the data.

¹ Also referred to as the normal and anomalous classes.

2.3.3 Unsupervised anomaly detection. Techniques that operate in unsupervised mode do
not require training data, and thus are most widely applicable. The techniques in this category make the implicit assumption that normal instances are far more frequent than anomalies in the test data. If this assumption is not true then such techniques suffer from a high false alarm rate. Many semi-supervised techniques can be adapted to operate in an unsupervised mode by using a sample of the unlabeled data set as training data. Such adaptation assumes that the test data contains very few anomalies and that the model learnt during training is robust to these few anomalies.

2.4 Output of Anomaly Detection

An important aspect of any anomaly detection technique is the manner in which the anomalies are reported. Typically, the outputs produced by anomaly detection techniques are of one of the following two types:

2.4.1 Scores. Scoring techniques assign an anomaly score to each instance in the test data depending on the degree to which that instance is considered an anomaly. Thus the output of such techniques is a ranked list of anomalies. An analyst may choose to either analyze the top few anomalies or use a cut-off threshold to select the anomalies.

2.4.2 Labels. Techniques in this category assign a label (normal or anomalous) to each test instance. Scoring-based anomaly detection techniques allow the analyst to use a domain-specific threshold to select the most relevant anomalies. Techniques that provide binary labels to the test instances do not directly allow the analysts to make such a choice, though this can be controlled indirectly through parameter choices within each technique.

3. APPLICATIONS OF ANOMALY DETECTION

In this section we discuss several applications of anomaly detection. For each application domain we discuss the following four aspects:
—The notion of anomaly.
—The nature of the data.
—Challenges associated with detecting anomalies.
—Existing anomaly detection techniques.
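The two output types of Section 2.4 can be made concrete with a minimal scoring-based point-anomaly detector applied to the one-feature credit card example of Section 2.2.1. The absolute z-score used here is an illustrative scoring rule, not one prescribed by the survey, and the data values are hypothetical:

```python
import statistics

def anomaly_scores(values):
    """Score each instance by its absolute z-score relative to the data;
    higher scores mean 'more anomalous' (an illustrative choice of score)."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [abs(v - mu) / sigma for v in values]

def to_labels(scores, threshold=2.0):
    """Turn scores into binary labels with a cut-off threshold (Section 2.4.2)."""
    return ["anomalous" if s > threshold else "normal" for s in scores]

# Weekly spending amounts with one very large transaction.
amounts = [100, 102, 98, 101, 99, 100, 103, 97, 100, 100, 1000]
scores = anomaly_scores(amounts)
# Ranked-list output (Section 2.4.1): indices sorted by decreasing score.
ranked = sorted(range(len(amounts)), key=lambda i: -scores[i])
# Label output (Section 2.4.2): apply a threshold to the same scores.
labels = to_labels(scores)
print(amounts[ranked[0]], labels[ranked[0]])  # → 1000 anomalous
```

Note how the scoring output defers the normal/anomalous decision to the analyst (via the ranking), while the labeling output bakes the threshold into the technique.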
On Sequential Monte Carlo Sampling Methods for Bayesian Filtering
methods, see (Akashi et al., 1975), (Handschin et al., 1969), (Handschin, 1970), (Zaritskii et al., 1975). Possibly owing to the severe computational limitations of the time, these Monte Carlo algorithms were largely neglected until recently. In the late 1980s, massive increases in computational power allowed the rebirth of numerical integration methods for Bayesian filtering (Kitagawa, 1987). Current research has now focused on MC integration methods, which have the great advantage of not being subject to the assumption of linearity or Gaussianity in the model; relevant work includes (Müller, 1992), (West, 1993), (Gordon et al., 1993), (Kong et al., 1994), (Liu et al., 1998). The main objective of this article is to include in a unified framework many old and more recent algorithms proposed independently in a number of applied science areas. Both (Liu et al., 1998) and (Doucet, 1997; 1998) underline the central role of sequential importance sampling in Bayesian filtering. However, contrary to (Liu et al., 1998), which emphasizes the use of hybrid schemes combining elements of importance sampling with Markov chain Monte Carlo (MCMC), we focus here on computationally cheaper alternatives. We also describe how it is possible to improve currently existing methods via Rao-Blackwellisation for a useful class of dynamic models. Finally, we show how to extend these methods to compute the prediction and fixed-interval smoothing distributions as well as the likelihood. The paper is organised as follows. In Section 2, we briefly review the Bayesian filtering problem, and classical Bayesian importance sampling is proposed for its solution. We then present a sequential version of this method which allows us to obtain a general recursive MC filter: the sequential importance sampling (SIS) filter. Under a criterion of minimum conditional variance of the importance weights, we obtain the optimal importance function for this method.
Unfortunately, for numerous models of applied interest the optimal importance function leads to non-analytic importance weights, and hence we propose several suboptimal distributions and show how to obtain as special cases many of the algorithms presented in the literature. Firstly, we consider local linearisation methods of either the state space model
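As a concrete illustration of the SIS filter outlined above, the following sketch runs sequential importance sampling on a toy linear-Gaussian state-space model, using the prior as the (suboptimal) importance function and omitting resampling. All model parameters and names here are illustrative choices, not taken from the paper:

```python
import math
import random

random.seed(0)

# Toy linear-Gaussian state-space model (illustrative parameters):
#   x_t = 0.9 * x_{t-1} + v_t,  v_t ~ N(0, 1)        (state transition)
#   y_t = x_t + w_t,            w_t ~ N(0, 0.5^2)    (observation)
A, SIG_V, SIG_W = 0.9, 1.0, 0.5

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def sis_filter(ys, n_particles=500):
    """SIS with the prior as importance function: propagate particles through
    the transition, reweight by the observation likelihood, and return the
    filtered means E[x_t | y_1:t]."""
    particles = [random.gauss(0.0, 1.0) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    estimates = []
    for y in ys:
        # Propagate each particle through the state transition (prior proposal).
        particles = [A * x + random.gauss(0.0, SIG_V) for x in particles]
        # Multiply importance weights by the observation likelihood, then normalize.
        weights = [w * normal_pdf(y, x, SIG_W) for w, x in zip(weights, particles)]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
    return estimates, weights

# Simulate a short observation sequence from the same model and run the filter.
x, ys = 0.0, []
for _ in range(20):
    x = A * x + random.gauss(0.0, SIG_V)
    ys.append(x + random.gauss(0.0, SIG_W))
estimates, weights = sis_filter(ys)
```

Because no resampling is performed, the normalized weights degenerate as t grows, which is precisely the weakness that motivates the resampling and suboptimal-proposal schemes discussed in the paper.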
The Most-Similar Nearest Neighbour Method: Overview and Explanation
1. Introduction
1.1 Overview
The most-similar nearest neighbour method, commonly known as the k-nearest-neighbour (kNN) algorithm, is a widely used machine learning algorithm. It is an instance-based learning method: to classify or predict a query sample, it computes the similarity between that sample and the samples in the training set. The core idea is to compare the feature information of the input sample with that of the stored training samples, find the k training samples most similar to the input, and use their labels to produce a classification or regression prediction. Because it is driven by similarity, the method captures relationships between samples well and suits data sets with all kinds of irregular distributions. It is broadly applicable in practice, including image recognition, recommender systems and medical diagnosis. Although the algorithm has drawbacks, namely nontrivial computational cost and large storage requirements, its simple, intuitive principle and good generalization ability make it an indispensable part of machine learning.
1.2 Structure of this article
This article consists of three parts: introduction, main body and conclusion. The introduction gives an overview of the most-similar nearest neighbour method and describes the structure and purpose of the article. The main body explains what the method is and details its applications in different domains, and then surveys its advantages and disadvantages to give the reader a complete picture. Finally, the conclusion summarizes the main content, looks ahead to the method's future development, and offers concluding observations and suggestions. Throughout, a clear logical structure is used to lead the reader to a deeper understanding of the method's importance and applications.
1.3 Purpose
The most-similar nearest neighbour method is a widely used machine learning algorithm whose main purpose is to compare the similarity between data points and find the neighbours most similar to a target point. With this method we can build data classification, recommender systems, image recognition and many other applications. This article examines the principle of the method, its application domains and its advantages and disadvantages, so that readers can understand the algorithm more fully and apply it more effectively in practice. We also look ahead to the method's future development as a reference for future research directions.
2. Main body
2.1 What is the most-similar nearest neighbour method? The most-similar nearest neighbour method is a widely used machine learning algorithm that classifies or performs regression prediction by computing the similarity between data samples.
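The idea described above can be sketched in a few lines of Python. The Euclidean distance and majority vote used here are illustrative choices (any similarity measure and aggregation rule could be substituted), and the toy data are invented:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k most similar training samples."""
    dists = np.linalg.norm(X_train - x_query, axis=1)   # similarity via Euclidean distance
    nearest = np.argsort(dists)[:k]                     # indices of the k nearest samples
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two well-separated clusters with labels 0 and 1.
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [2.0, 2.0], [2.1, 1.9], [1.8, 2.2]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.1, 0.1])))   # → 0
print(knn_predict(X, y, np.array([2.0, 2.1])))   # → 1
```

The same function performs regression if the vote is replaced by an average of the k neighbours' values.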
An Analysis of Euler Ancestral Sampling
Euler ancestral sampling is a sampling method based on the Euler genetic algorithm that can be used to generate samples from a given target distribution. It builds on Markov chain Monte Carlo (MCMC) methods and has found wide application in probabilistic graphical models. This article works through the principle and applications of Euler ancestral sampling step by step and discusses its strengths and weaknesses.
1. The Euler genetic algorithm. The Euler genetic algorithm is an optimization method based on genetic algorithms, inspired by genetic evolution in biology. It searches for optimal solutions mainly by simulating the inheritance and mutation processes of biological individuals. Euler ancestral sampling is a technique that combines this algorithm with probabilistic graphical models.
2. Probabilistic graphical models. A probabilistic graphical model is a graph-based model for describing the dependencies among random variables. It represents a probability distribution as a graph in which nodes stand for random variables and edges for the dependencies between them. Common examples include Bayesian networks and Markov random fields. Probabilistic graphical models make it possible to model complex probability distributions and to carry out various kinds of inference and prediction.
3. The principle of Euler ancestral sampling. Euler ancestral sampling is a sampling method based on probabilistic graphical models: given such a model, it generates samples that follow the target distribution. It does so by constructing a Markov chain Monte Carlo sampler that performs state transitions within the model until samples are produced. Concretely, the procedure is as follows:
- Initialization: randomly choose an initial state as the current sample.
- Genetic mutation: using the conditional probabilities in the graphical model, mutate the current sample to obtain a new one.
- Selection: evaluate the distance between the mutated sample and the target distribution, and decide whether to keep or discard the sample.
- Iteration: repeat the mutation and selection operations until the required number of samples has been drawn.
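The initialise/mutate/select loop described above can be sketched as follows. Every concrete choice here (a Gaussian random-walk mutation, an accept/reject rule based on the ratio of target densities, and a standard normal target) is an illustrative assumption; this is a generic Metropolis-style realisation of the listed steps, not a reference implementation of Euler ancestral sampling:

```python
import math
import random

random.seed(1)

def target_density(x):
    """Unnormalised target density: a standard normal, chosen purely for illustration."""
    return math.exp(-0.5 * x * x)

def sample(n_samples, step=0.5):
    samples = []
    x = random.gauss(0.0, 1.0)                # initialisation: random starting state
    while len(samples) < n_samples:
        x_new = x + random.gauss(0.0, step)   # mutation: perturb the current sample
        # selection: keep the proposal with probability min(1, p(x_new)/p(x))
        if random.random() < min(1.0, target_density(x_new) / target_density(x)):
            x = x_new
        samples.append(x)                     # iteration: repeat until enough samples
    return samples

s = sample(20000)
print(sum(s) / len(s))   # sample mean, should be near 0 for this target
```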
4. Applications of Euler ancestral sampling. Euler ancestral sampling has been widely applied in many areas, for example sampling from restricted Boltzmann machines (RBMs): an RBM is an undirected probabilistic graphical model commonly used for dimensionality reduction, feature selection and generative modelling.
An Introduction to Compressive Sampling (Full Translation)
Figure 1. This principle is, of course, the theoretical foundation of most modern lossy coders, including JPEG-2000 and other compression formats. A simple data compression method is to compute x from f and then (adaptively) encode the locations and values of the S significant coefficients. Since the locations of the significant pieces of information may not be known in advance (they depend on the signal), this process requires all n coefficients x to be known; in our example, the significant information tends to cluster around the edges of the image. More generally, sparsity is a fundamental modelling element that enables efficient basic signal processing, for example accurate statistical estimation and classification and effective data compression. The subject of this paper is bold and far-reaching: sparsity also plays a crucial supporting role in signal acquisition, determining how signals can be acquired efficiently and nonadaptively.
√(2 log n). Going further, random waveforms φk(t) with independent and identically distributed entries, for example Gaussian or ±1 binary entries, also exhibit very low coherence with any fixed representation. Note the very curious implication here: if incoherent sensing is good, then an efficient mechanism should acquire correlations with random waveforms, e.g. white noise!
Undersampling and sparse signal recovery
In principle, we would like to measure all n coefficients of f, but in practice we only observe a subset of them, and the acquired data are:
is equal to or close to 1, then on the order of S·log n samples suffice, rather than n. 3) The signal f can be reconstructed from the compressed data set by minimizing a convex functional, without knowing in advance the number or locations of the nonzero coordinates of x, and with their amplitudes completely unknown beforehand. This theorem indeed supplies a very concrete acquisition scheme: nonadaptive sampling in an incoherent domain followed by linear programming after acquisition. Following this approach, the signal is obtained in compressed form; all that is needed is a decoder to decompress the data, and this is where ℓ1-minimization comes in. In fact, this incoherent sampling is a generalization of earlier results on the sampling of spectrally sparse signals, which showed that randomness can serve as a reliable certificate and a very effective sensing mechanism; perhaps this is what triggered the current surge of development in CS. Suppose we are interested in sampling ultra-wideband, but spectrally sparse, signals of the form:
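The role of ℓ1-minimization as the decoder can be demonstrated with a small basis-pursuit sketch. The problem sizes, the Gaussian sensing matrix, and the reformulation of min ‖x‖1 subject to Ax = y as a linear program over x = x⁺ − x⁻ (solved here with scipy's linprog) are all choices of this sketch:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(42)

n, m, S = 60, 30, 4                 # signal length, number of measurements, sparsity
x_true = np.zeros(n)
support = rng.choice(n, size=S, replace=False)
x_true[support] = rng.normal(size=S)

A = rng.normal(size=(m, n)) / np.sqrt(m)   # random (incoherent) sensing matrix
y = A @ x_true                             # m < n undersampled measurements

# Basis pursuit: min ||x||_1 s.t. Ax = y, as an LP over x = xp - xm with xp, xm >= 0.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
x_hat = res.x[:n] - res.x[n:]

print(np.max(np.abs(x_hat - x_true)))   # max recovery error
```

With these dimensions (m well above the S·log n threshold discussed in the text), exact recovery holds with overwhelming probability, so the reported error is essentially zero.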
A PhD Student Publishes a Paper in Information Fusion
Information Fusion: Enhancing Decision-Making through the Integration of Data and Knowledge
Introduction: Information fusion, also known as data fusion or knowledge fusion, is a rapidly evolving field in the realm of decision-making. It involves the integration and analysis of data and knowledge from various sources to generate meaningful and accurate information. In this article, we will delve into the concept of information fusion, explore its key components, discuss its application in different domains, and highlight its significance in enhancing decision-making processes.
1. What is Information Fusion? Information fusion is the process of combining data and knowledge from multiple sources to provide a comprehensive and accurate representation of reality. The goal is to overcome the limitations inherent in individual sources and derive improved insights and predictions. By assimilating diverse information, information fusion enhances situational awareness, reduces uncertainty, and enables intelligent decision-making.
2. Key Components of Information Fusion:
a. Data Sources: Information fusion relies on various data sources, which can include sensors, databases, social media feeds, and expert opinions. These sources provide different types of data, such as text, images, audio, and numerical measurements.
b. Data Processing: Once data is collected, it needs to be processed to extract relevant features and patterns. This step involves data cleaning, transformation, normalization, and aggregation to ensure compatibility and consistency.
c. Information Extraction: Extracting relevant information is a crucial step in information fusion. This includes identifying and capturing the crucial aspects of the data, filtering out noise, and transforming data into knowledge.
d.
Knowledge Representation: The extracted information needs to be represented in a meaningful way for integration and analysis. Common methods include ontologies, semantic networks, and knowledge graphs.
e. Fusion Algorithms: To integrate the information from various sources, fusion algorithms are employed. These algorithms can be rule-based, model-based, or data-driven, and they combine multiple pieces of information to generate a unified and coherent representation.
f. Decision-Making Processes: The ultimate goal of information fusion is to enhance decision-making. This requires the fusion of information with domain knowledge and decision models to generate insights, predictions, and recommendations.
3. Applications of Information Fusion:
a. Defense and Security: Information fusion plays a critical role in defense and security applications, where it improves intelligence analysis, surveillance, threat detection, and situational awareness. By integrating information from multiple sources, such as radars, satellites, drones, and human intelligence, it enables effective decision-making in complex and dynamic situations.
b. Health Monitoring: In healthcare, information fusion is used to monitor patient health, combine data from different medical devices, and provide real-time decision support to medical professionals. By fusing data from wearables, electronic medical records, and physiological sensors, it enables early detection of health anomalies and improves patient care.
c. Smart Cities: Information fusion offers enormous potential for the development of smart cities. By integrating data from multiple urban systems, such as transportation, energy, and public safety, it enables efficient resource allocation, traffic management, and emergency response. This improves the overall quality of life for citizens.
d. Financial Markets: In the financial sector, information fusion helps in the analysis of large-scale and diverse datasets.
By integrating data from various sources, such as stock exchanges, news feeds, and social media mentions, it enables better prediction of market trends, risk assessment, and investment decision-making.
4. Significance of Information Fusion:
a. Enhanced Decision-Making: Information fusion enables decision-makers to obtain comprehensive and accurate information, reducing uncertainty and improving the quality of decisions.
b. Improved Situational Awareness: By integrating data from multiple sources, information fusion enhances situational awareness, enabling timely and informed responses to dynamic and complex situations.
c. Risk Reduction: By combining information from diverse sources, information fusion improves risk assessment capabilities, enabling proactive and preventive measures.
d. Resource Optimization: Information fusion facilitates the efficient utilization of resources by providing a holistic view of the environment and enabling optimization of resource allocation.
Conclusion: In conclusion, information fusion is a powerful approach to enhance decision-making by integrating data and knowledge from multiple sources. Its key components, including data sources, processing, extraction, knowledge representation, fusion algorithms, and decision-making processes, together create a comprehensive framework for generating meaningful insights. By applying information fusion in various domains, such as defense, healthcare, smart cities, and financial markets, we can maximize the potential of diverse information sources to achieve improved outcomes.
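As one concrete illustration of a simple model-based fusion rule, the classical inverse-variance weighting of independent estimates can be sketched as follows; the sensor readings and variances are invented for the example:

```python
def fuse(estimates):
    """Fuse independent noisy estimates (value, variance) by inverse-variance weighting.

    The fused variance is always smaller than the smallest input variance,
    which is the formal sense in which fusion 'reduces uncertainty'.
    """
    inv_vars = [1.0 / var for _, var in estimates]
    total = sum(inv_vars)
    value = sum(v / var for v, var in estimates) / total
    return value, 1.0 / total

# Three hypothetical sensors measuring the same temperature (value, variance).
readings = [(21.0, 4.0), (22.0, 1.0), (20.5, 2.25)]
value, var = fuse(readings)
print(round(value, 3), round(var, 3))
```

Note how the fused value is pulled towards the most precise sensor, while the fused variance drops below the best individual variance.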
Bayesian Colocalization Analysis
Bayesian colocalization analysis is a statistical method used to determine whether genetic variants associated with different traits are likely to share a common causal variant. This powerful tool has gained popularity in the field of genetics as it provides a way to understand the genetic basis of complex traits and diseases. Colocalization analysis is particularly useful when researchers are interested in studying multiple traits that may be influenced by shared genetic factors. Traditionally, colocalization analysis has been carried out using frequentist methods. However, Bayesian methods provide a more flexible framework that allows researchers to incorporate prior knowledge and uncertainty into their analysis. In this article, we will explore the steps involved in performing a Bayesian colocalization analysis.
Step 1: Gather Data. The first step in any colocalization analysis is to gather the necessary data. This typically includes summary statistics for each of the traits under investigation. These summary statistics provide information about the association between individual genetic variants and the traits of interest. Additionally, genotypic data for individuals in the study population may also be required for more detailed analysis.
Step 2: Model Specification. In Bayesian colocalization analysis, the next step is to specify a statistical model that describes the relationships between the genetic variants and the traits. This typically involves defining prior distributions for the parameters of interest and specifying the likelihood function, which describes the conditional distribution of the data given the parameters. There are various model specifications that can be used in colocalization analysis, including the use of hierarchical models and multi-trait models.
The choice of model depends on the specific research question and the available data.
Step 3: Model Fitting. Once the model has been specified, the next step is to fit the model to the data. This involves estimating the posterior distribution of the parameters using Bayesian inference methods, such as Markov chain Monte Carlo (MCMC) sampling. MCMC sampling generates a sequence of samples from the posterior distribution, allowing for the estimation of summary statistics, such as the mean, variance, and quantiles. Model fitting can be computationally intensive, especially when dealing with large datasets or complex models. Advanced techniques, such as parallel computing or approximate Bayesian computation, may be employed to expedite the process.
Step 4: Model Assessment. Once the model has been fitted, it is important to assess its performance. Model assessment helps determine whether the model adequately captures the relationships between the genetic variants and the traits. This can be done through various approaches, such as posterior predictive checks, hypothesis testing, or model comparison using information criteria. If the model does not provide a good fit to the data, modifications or alternative models may need to be considered.
Step 5: Interpretation of Results. The final step in a Bayesian colocalization analysis is to interpret the results. This involves examining the posterior distributions of the parameters and summarizing the findings in a meaningful way. For example, researchers may be interested in identifying genetic variants that have a high probability of colocalizing with multiple traits, or in estimating the proportion of trait variance explained by shared genetic factors. Interpreting the results of a colocalization analysis requires careful consideration of the underlying biology and previous knowledge in the field.
It is important to consider the limitations of the analysis, such as potential confounding factors or assumptions made in the model. In conclusion, Bayesian colocalization analysis is a powerful approach for investigating the shared genetic basis of multiple traits. By integrating prior knowledge and uncertainty into the analysis, researchers can gain a deeper understanding of the complex mechanisms underlying human traits and diseases. However, it is important to carefully design the analysis, assess model performance, and interpret the results in the context of existing knowledge.
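Step 3 can be illustrated with a deliberately tiny example. The model (a single binomial proportion with a uniform prior) and the random-walk Metropolis sampler below are stand-ins for the much richer models and samplers used in real colocalization analyses:

```python
import math
import random

random.seed(0)

# Toy data: k successes in n trials, with unknown proportion theta.
k, n = 27, 100

def log_posterior(theta):
    """Log posterior of theta under a uniform prior (up to an additive constant)."""
    if not 0.0 < theta < 1.0:
        return -math.inf
    return k * math.log(theta) + (n - k) * math.log(1.0 - theta)

# Random-walk Metropolis: generate a sequence of samples from the posterior.
theta, draws = 0.5, []
for _ in range(20000):
    prop = theta + random.gauss(0.0, 0.05)
    if math.log(random.random()) < log_posterior(prop) - log_posterior(theta):
        theta = prop
    draws.append(theta)

burned = draws[5000:]              # discard burn-in
mean = sum(burned) / len(burned)
print(round(mean, 3))              # posterior mean, near (k+1)/(n+2) ≈ 0.275
```

From the same `burned` sequence one can read off the variance and quantiles mentioned in the text, e.g. a 95% credible interval from the 2.5th and 97.5th percentiles.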
MRI Brain Tissue Segmentation Based on MICO+FCM
Computer Simulation, Vol. 37, No. 3, March 2020. Article ID: 1006-9348(2020)03-0238-05. JIANG Hu-cheng, LIN Ke (School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China).
Abstract: To address the intensity inhomogeneity present in MRI and the interference of skull tissue with brain tissue extraction, and to support the study of specific brain tissues, a MICO+FCM brain tissue segmentation algorithm is proposed. The algorithm first uses a MICO-based energy-minimization method to estimate and correct the intensity inhomogeneity of the brain MRI and to produce an initial segmentation of the image; a region-growing algorithm then removes the skull tissue from the image, and finally the FCM algorithm segments and extracts the white matter and grey matter within the brain tissue. Simulations show that, compared with the traditional FCM algorithm and other image segmentation algorithms, the proposed MICO+FCM brain tissue segmentation algorithm improves both segmentation accuracy and segmentation efficiency.
Keywords: brain image segmentation; intensity inhomogeneity; multiplicative intrinsic components. Chinese Library Classification: TP317.4; TP391.9. Document code: B.
MRI Brain Tissue Segmentation Based on MICO+FCM. JIANG Hu-cheng, LIN Ke (School of Computer and Information Security, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China). ABSTRACT: In view of the intensity inhomogeneity problem in MRI and the influence of skull tissue on brain tissue extraction, and in order to support research on specific brain tissues, this paper proposes a MICO+FCM brain tissue segmentation algorithm. Firstly, a MICO-based energy minimization algorithm is used to estimate and correct the intensity non-uniformity of the brain MRI and to produce an initial segmentation of the image; the skull tissue is then removed by a region growing algorithm, and the FCM algorithm is used to complete the segmentation and extraction of white matter and grey matter in the brain tissue. Simulation experiments show that, compared with the traditional FCM algorithm and other image segmentation algorithms, the proposed MICO+FCM brain segmentation algorithm improves segmentation accuracy and segmentation efficiency. KEYWORDS: Brain image segmentation; Intensity non-uniformity; Multiplicative intrinsic components.
1 Introduction. Image segmentation is a fundamental problem in medical image processing.
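The FCM stage can be illustrated with a minimal implementation of the standard fuzzy c-means updates on synthetic one-dimensional "intensities". The data, the two clusters standing in for white and grey matter, and all parameter values are invented for this sketch; a real pipeline would run on bias-corrected MRI voxels as described in the abstract:

```python
import numpy as np

rng = np.random.default_rng(3)

def fcm(X, c=2, m=2.0, n_iter=100):
    """Standard fuzzy c-means: alternate the membership and centroid updates."""
    n = len(X)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                       # memberships sum to 1 per sample
    for _ in range(n_iter):
        W = U ** m
        centers = (W @ X) / W.sum(axis=1)    # fuzzily weighted centroids
        d = np.abs(X[None, :] - centers[:, None]) + 1e-12
        U = d ** (-2.0 / (m - 1))
        U /= U.sum(axis=0)                   # membership update
    return centers, U

# Two synthetic intensity clusters standing in for two tissue classes.
X = np.concatenate([rng.normal(30, 2, 200), rng.normal(80, 2, 200)])
centers, U = fcm(X)
print(np.sort(centers))   # ≈ [30, 80]
```

Each voxel would then be assigned to the tissue class with the largest membership, giving the final hard segmentation.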
Applications of electronic noses and tongues
Review. Applications of electronic noses and tongues in food analysis. Anil K. Deisingh,*† David C. Stone & Michael Thompson. Department of Chemistry, University of Toronto, 80 St George Street, Toronto, Ontario M5S 3H6, Canada. (Received 23 June 2003; Accepted in revised form 25 February 2004)
Summary: This review examines the applications of electronic noses and tongues in food analysis. A brief history of the development of sensors is included, illustrated by descriptions of the different types of sensors utilized in these devices. As pattern recognition techniques are widely used to analyse the data obtained from these multisensor arrays, a discussion of principal components analysis and artificial neural networks is essential. An introduction to the integration of electronic tongues and noses is also incorporated, and the strengths and weaknesses of both are described. Applications described include identification and classification of flavour and aroma and other measurements of quality using the electronic nose. The uses of the electronic tongue in model analyses and other food, beverage and water monitoring applications are discussed.
Keywords: beverage analysis, biosensor, food analysis, pattern recognition, sensor arrays.
Introduction
In the past decade, many papers have appeared in the literature describing the uses of electronic noses and, more recently, electronic tongues.
These devices are typically arrays of sensors used to characterize complex samples. Arrays of gas sensors are termed 'electronic noses' while arrays of liquid sensors are referred to as 'electronic tongues' (Stetter & Penrose, 2002). The former group are used in quality control and process operations in the food industry while the latter are widely used in taste studies. In this review, we will discuss the principles behind the design of electronic noses and tongues, describe the senses of taste and smell and evaluate the various uses of these devices.
Development of sensors
The principles of sensor design and technology will be described in this section. This is necessary both to fully understand the subject and to appreciate their impact upon the development of the electronic noses and tongues. A chemical sensor is a device which responds to a particular analyte in a selective way by means of a reversible chemical interaction and can be used for the quantitative or qualitative determination of the analyte (Cattrall, 1997). All sensors are composed of two main regions: the first is where the selective chemistry occurs and the second is the transducer. The transducer allows the conversion of one form of energy to another. The chemical reaction produces a signal such as a colour change, fluorescence, production of heat or a change in the oscillator frequency of a crystal (Cattrall, 1997).
Other parts of a sensor include the signal processing electronics and a signal display unit. The major regions of a typical sensor are shown in Fig. 1. [*Correspondent: Fax: (868) 662-7177; e-mail: anildeisingh@ †Present address: Caribbean Industrial Research Institute, University of the West Indies Campus, St Augustine, Trinidad and Tobago, West Indies. International Journal of Food Science and Technology 2004, 39, 587–604. doi:10.1111/j.1365-2621.2004.00821.x © 2004 Blackwell Publishing Ltd]
Several categories of transducers are available and these include:
1. Electrochemical, such as ion-selective electrodes (ISE), ion-selective field effect transistors (FET), solid electrolyte gas sensors and semiconductor-based gas sensors.
2. Piezoelectric, e.g. surface acoustic wave (SAW) sensors. Piezoelectric materials are sensitive to changes in mass, density or viscosity and, therefore, frequency can be used as a sensitive transduction parameter (Hall, 1990). Quartz is the most widely used piezoelectric material because it can act as a mass-to-frequency transducer.
3. Optical, such as optical fibres, as well as the more traditional absorbance, reflectance, luminescence and surface plasmon resonance (SPR) techniques.
4. Thermal systems, in which the heat of a chemical reaction involving the analyte is monitored with a transducer such as a thermistor.
A subdivision of the sensor grouping is the biosensors. These incorporate a biological sensing element positioned close to the transducer to give a reagentless sensing system specific for the target analyte (Hall, 1990). This ensures specificity of the biological molecules for target species. What follows is a brief review of the evolution of sensors, as this illustrates how developments in the field arose. The first sensor was the glass pH electrode, which appeared in 1930. Initial developments were slow and it was not until 1956 that a significant invention was reported. This was the Clark oxygen electrode, which spurred research in biomedical areas (Thompson & Stone, 1997). The
piezoelectric mass deposition sensor (quartz crystal microbalance, QCM) was produced in 1959. In 1961, a solid electrolyte sensor was reported, while in 1962 the first biosensor (an enzyme electrode) was described by Clark & Lyons (1962). In this case, glucose oxidase was held between membranes and the concentration of oxygen in an internal solution was measured. The platinum electrode responded to the peroxide produced by the reaction of the enzyme with its substrate:
glucose + oxygen → gluconic acid + hydrogen peroxide
This investigation led to the production of the first glucose analyzer for measuring glucose in blood. Also in 1962, a metal oxide semiconductor gas sensor (the Taguchi sensor) was reported and this was followed, in 1964, by the piezoelectric bulk acoustic wave (BAW) chemical vapour sensor. In 1966, the first glucose sensor and a fluoride ISE were reported in the literature. Many ISEs are available and they are very popular as the basis of electronic tongues. The advantages of ISEs include the linear response that they show and the ability to obtain direct potentiometric measurements. They also measure the activity of the ions rather than the total content. The 1970s and 1980s saw many developments, including the ion-selective FET (1970), a fibre optic gas sensor (1970), a palladium gate FET hydrogen sensor (a metal oxide semiconductor FET or MOSFET, 1975), an enzyme FET biosensor (1977) and the SAW vapour/thin film sensor (1979). In 1980, a liquid-phase BAW operation was described, while in 1982 the SPR sensor made its debut. An evanescent wave fibre optic sensor was developed in 1984 and in 1986 there was the production of a BAW liquid-phase immunosensor. Further details of these systems are given in the succeeding sections. Since then, there has been the refinement of the various device technologies, leading to the production of arrays of sensors (as used in electronic noses and tongues), microarrays and the development of micro-Total Analytical Systems (µ-TAS). The last category is also
known as lab-on-a-chip or as integrated systems, as they incorporate all analytical operations on a single chip.
Sensor arrays
Four major categories have been involved in the development of electronic noses and each will be briefly described.
1. Catalytic or tin oxide sensor: A commercially available Taguchi Gas Sensor (TGS) can be and is widely used as the core-sensing element in array-based odour detectors. This consists of an electrically heated ceramic pellet upon which a thin film of tin(II) oxide doped with precious metals is deposited (Persaud & Travers, 1997). Tin(II) oxide is an n-type semiconductor and, when oxygen adsorbs on the surface, one of the negatively charged oxygen species is generated depending on the temperature. This results in the surface potential becoming increasingly negative, and the electron donors within the material become positively charged. When an oxidizable material comes into contact with the sensor surface, the adsorbed oxygen is consumed in the resulting chemical reaction. This reduces the surface potential and increases the conductivity of the film.
Several recent developments with tin oxide detectors have led to further advantages over the Taguchi sensor, which generally requires high power consumption and high temperatures. These include the fabrication of thin-film tin(II) oxide arrays using planar microelectronic technology, leading to reduced size and lower power use, the production of thin-film sensors by chemical vapour deposition and the use of screen printing to make thick-film sensors (Persaud & Travers, 1997).
2. Conducting polymer sensors: Many other materials are conducting (or semiconducting) and show a variation in conductivity with sorption of different gases and vapours. Conducting polymers are very popular in the development of gas- and liquid-phase sensors, with polypyrrole and polyaniline being the favoured choices. Materials used to make conducting polymers tend to have some common features, including the ability to form them through either chemical or electrochemical polymerization and the ability to change their conductivity through oxidation or reduction. Conducting polymers are widely used as odour-sensing devices, the major reasons for this being (Persaud & Travers, 1997):
(a) the sensors display rapid adsorption and desorption phenomena at room temperature;
(b) power consumption is low;
(c) specificity can be achieved by modifying the structure of the polymer;
(d) they are not easily inactivated by contaminants;
(e) they are very sensitive to humidity.
3. Acoustic wave sensors: AT-cut quartz crystals (+35°15′ orientation of the plate with respect to the crystal plane) are favoured as piezoelectric sensors because of their excellent temperature coefficients. The type of acoustic wave generated in piezoelectric materials is determined by the crystal cut, the thickness of the material used and the geometry and configuration of the metal electrodes employed to produce the electric field (Thompson & Stone, 1997). One of the first sensors to be introduced was the thickness-shear mode (TSM) sensor, which, if the substrate is quartz, may commonly be
termed the QCM or BAW sensor. A typical TSM sensor is shown in Fig. 2. The sensor consists of overlapping metal electrodes at the top and bottom, and the device is normally 1.56 mm thick and 12.5 mm in diameter. This type can be used with up to 10 MHz fundamental resonance frequency, with a standing resonant wave being generated where the wavelengths are related to the thickness. As the thickness increases (for example, due to added mass by deposition on the surface), the wavelength increases and the frequency decreases. Thus, the TSM can act as a mass-sensitive device. A major advance on the TSM sensor was the SAW version, consisting of interdigitated electrodes fabricated on to quartz containing a thin film of material (Fig. 3). When a potential is applied across the two halves of the interdigital transducer (IDT), a surface Rayleigh wave is launched in both directions across the surface from the IDT. Adsorption of odours to the coating results in a change in mass, and the acoustic wave is perturbed, leading to a frequency shift (Persaud & Travers, 1997). SAW sensors can be operated at higher frequencies than QCM sensors, thereby leading to better sensitivities. Acoustic wave sensors can also be operated in the liquid phase.
4. MOSFET technology: In the 1970s, improvements in semiconductor technology led to the development of a FET. This is a very high impedance transistor, and the most sensitive measurements of small potentials requiring very low current flows are made using this technology. In the FET, current flows along a semiconductor path called the channel, at one end of which is a source electrode. At the opposite end is the drain electrode. The effective electrical diameter of the channel can be varied by application of a voltage to a control or gate electrode. The conductivity of the FET depends on the electrical diameter of the channel. A small
change in gate voltage leads to a large variation in current from the source to the drain. This allows the signal to be amplified. For the MOSFET, the thermal oxidation process used to form the silicon dioxide layer on the silicon surface of the device also forms a double layer, which can induce a conducting channel in the silicon substrate. In the MOSFET, the conducting channel is insulated from the gate terminal by a layer of oxide. Thus, there is no conduction even if a reverse voltage is applied to the gate. FET sensors can be operated both with and without a reference electrode. Other chemical principles are being applied to vapour-sensing devices and these include:
5. Ion mobility spectrometry (IMS), which has the ability to separate ionic species at atmospheric pressure. However, there is also research underway to develop low-pressure IMS systems. This latter technique can be used to detect and characterize organic vapours in air. This involves the ionization of molecules and their subsequent drift through an electric field. Analysis is based on analyte separations resulting from ionic mobilities rather than ionic masses. A major advantage of operation at atmospheric pressure is that it is possible to have smaller analytical units, lower power requirements, lighter weight and easier use (Graseby Ionics, 2002).
6. Other mass spectrometric (MS) techniques that are in commercial development. Two recent developments in MS are atmospheric pressure ionization (API) and proton transfer reaction (PTR). Both are rapid, sensitive and specific and allow measurements in real time. Additionally, they do not suffer the drift or calibration problems currently experienced by electronic noses. With API-MS, ionization takes place at atmospheric pressure, which allows nebulization and ionization to be independent of each other. The solute and the solvent elute from a capillary, which is surrounded by the nebulizing gas, usually nitrogen. The capillary and gas are contained in a probe which can be heated up to 700 °C
depending on the analyte being investigated. The combination of nebulizer gas and heat converts the solvent flow into an aerosol, which evaporates rapidly. Inside the heated source is a corona discharge needle, which ionizes the solvent molecules. In the atmospheric region around the corona pin, a combination of collisions and charge transfer reactions produces a chemical ionization reagent known as a gas plasma. Sample molecules which elute and pass through this region can be ionized by the transfer of a proton to produce (M+H)+ or (M−H)− ions (Ashcroft, 1997). The impact of volatile organic compounds (VOCs) on the environment has led to a growing demand for devices to detect these compounds. One of the most promising uses the proton transfer reaction (PTR)-MS, with the first instruments just appearing on the market (Ellis, 2003). As a result of the advantages listed above, these may play an increased role in the future development of electronic noses and tongues. However, a limitation is the use of quadrupole MS, which has a modest mass resolution and can only monitor a single mass channel at any moment (Ellis, 2003). Research is currently underway to replace the quadrupole system by a time-of-flight MS, which will allow more rapid data acquisition.
7. Optical/spectroscopic techniques are currently being employed, the most popular being the use of fibre optics and fluorescence.
For the electronic tongue, classical electrochemical principles such as potentiometry, amperometry and voltammetry have been utilized.
When designing potentiometric devices, ion-selective and redox electrodes are commonly used. Additionally, ISFET technology has been incorporated into commercially available electronic tongues. Many of the available electronic tongues are based on ISEs and a brief discussion of the principles behind this technique will now be given. Otto & Thomas (1985) reported the first application of ISE arrays for multicomponent analysis. Eight sensors were used for the simultaneous determination of sodium, potassium, calcium and magnesium at concentrations typical of biological fluids. However, there was insufficient selectivity for magnesium and sodium. Since then, multisensor arrays have become much improved, with greater selectivities being reported. Potentiometry generally assumes a linear dependence between an ISE output and the logarithm of the activity of the primary ion in a solution. The electrode response should obey the Nernst equation:
E = E0 + (RT / Zi F) ln ai,   (1)
where E is the potential difference of the electrochemical cell comprising an ion-selective and a reference electrode, E0 is the standard potential, R is the gas constant, T is the absolute temperature, F is the Faraday constant, Zi is the electrical charge of the primary ion and ai is the activity of the primary ion. The term RT/Zi F is the sensitivity of the ISE (Legin et al., 2002a). If interfering ions affect the ISE response, the Nikolsky equation (2) is used:
E = E0 + (RT / Zi F) ln [ ai + Σj Kij (aj)^(Zi/Zj) ],   (2)
where Kij is the selectivity coefficient of the ISE to the primary ion i in the presence of an interfering ion j, and Zi and Zj are the charges of the primary and interfering ions, respectively (Legin et al., 2002a).
Sensations of taste and smell
Taste
The taste buds, of which there are around 10 000, are found mainly on the tongue, with a few on the soft palate, inner surface of the cheek, pharynx and epiglottis of the larynx (Marieb, 1998). Taste sensations can be classified into five basic categories: sweet, sour, salty, bitter and umami. Table 1 gives
examples of each of these. A single taste bud contains 50–100 taste cells representing all five taste sensations. Each taste cell has receptors which bind to the molecules and ions which result in the different taste sensations (Kimball, 2002). Also, some materials change in flavour as they move through the mouth, e.g. saccharin is initially sweet but has a bitter aftertaste at the back of the tongue (Marieb, 1998). Each of the basic taste sensations has a different threshold level, with bitter substances having the lowest. This is probably a protective function, as many poisonous substances are bitter (Tortora & Grabowski, 1996). Sour substances have an intermediate threshold limit while sweet and salty

Table 1. Taste sensations (Marieb, 1998)
Sensation | Elicited by these compounds
Sweet | Sugars, amino acids, alcohols
Sour | Acids, e.g. acetic, citric
Salty | Table salt
Bitter | Quinine, caffeine, aspirin, nicotine
Umami | Monosodium glutamate (MSG), disodium inosinate in meat and fish, disodium guanylate in mushrooms

materials have the highest threshold values. Partial adaptation to a taste occurs in about 3–5 s, with complete adaptation in 1–5 min.
Smell
The human sense of smell can recognize and discriminate volatile compounds with high sensitivity and accuracy. Some odours are detected at the parts per trillion level and even stereoisomers are differentiated (Breer, 1997). Between 10 and 100 million receptors for olfaction lie in the nasal epithelium in an area of about 5 cm². Unlike taste, smell cannot be easily classified into different groups. Humans can distinguish around 10 000 chemicals, with the olfactory receptors being stimulated by different combinations of a limited number of primary odours. Different researchers have suggested that this number can vary between 7 and 50. Recently, it has been proposed that there may be 1000 'smell genes', which are actively transcribed only in
the nose (Marieb,1998).Odours(or,scientifically,odorant molecules)are generally light(molecular mass up to300Da),small,polar and often hydrophobic (Craven et al.,1996).Simple odours, e.g.an alcohol,will have only one chemical component while complex odours may have up to several thousands.The need for electronic noses and tongues Gardner&Bartlett(1994)have defined the electronic nose asÔan instrument which comprises an array of electronic chemical sensors with partial specificity and an appropriate pattern recognition (PR)system,capable of recognizing simple or complex odoursÕ.Electronic tongues,on the con-trary,areÔmultisensor systems for liquid analysis based on chemical sensor arrays and PRÕ(Legin et al.,2002a).Why,then,is there any need for these multi-sensor systems to be designed for use with PR? The human olfactory system can detect thousands of different compounds with high specificity and research on artificial olfaction has led to signifi-cant advances in odour quality in the food and beverage industries(Craven et al.,1996).There is a relationship between the electronic and human noses.Each nose consists of three major regions.The human nose has an array of olfactory receptor cells,the olfactory bulb and the brain.The equivalent of these regions in the electronic nose are the odour sensor array,data pre-processor and PR system.Meanwhile,the electronic tongue can have better sensitivity and detection limits than the human tongue.This is because the taste system in humans is not as highly developed as the olfactory system.As Legin et al.(2002a)have pointed out,Ôthe electronic tongue can be thought of as analogous to both olfaction and taste and it can be used for the detection of all types of dissolved compounds,including volatile compounds[which] give odours after evaporationÕ.This device can be used for the recognition,classification and quan-titative determination of multiple component concentrations.Data processingA generalized structure of an 
electronic nose is shown in Fig.4.This could just as easily be applied to the electronic tongue except that liquids are studied.As seen in this diagram and as mentioned in the previous section,PR techniques are used for data processing.The sensor array may be either acoustic wave(e.g.TSM),conducting polymer(e.g.polypyrroles,metal phthalocya-nines)stannic oxide catalytic or semiconducting in nature.The data generated by each sensor are processed by a PR algorithm and the results are then analysed.The potential advantages of such an approach include the reduction in complexity of the sensor coating selection,the ability to characterize complex mixtures without the need to identify and quantify individual components and that it can be exploited for structure–activity relationship studies.In this section,we will briefly reviewthe concepts behind this method.Pattern recognition is a decision vector used to classify a species based on a series of measure-ments(a pattern)on that species.Generally,a matrix is formed from the patterns for a number of species and then a decision vector which divides the pattern into an assigned binary classification is calculated based on standard experiments.This is then used to classify unknown patterns.The success of PR techniques can be enhanced or simplified by suitable prior treatment of the dataFood analysis using arrays of biosensors A.K.Deisingh et al.592International Journal of Food Science and Technology2004,39,587–604Ó2004Blackwell Publishing Ltdsuch that feature selection and feature extraction are important approaches(Adams,1995).The former identifies and selects those features present in the analytical data,which may be important to calibration.Feature extraction changes the dimen-sionality of the data and generally refers to processes combining original variables to provide newand better ones.A w idely used method is principal component analysis(PCA)(Adams, 1995).The PR methods may be divided into supervised and 
non-supervised methods although a combina-tion of both can be used.In this discussion,we will consider both,as each type is widely used for electronic noses and tongues.The major unsuper-vised technique is PCA while artificial neural network(ANN)is the best-known supervised technique.PCA is a linear PR technique which reduces multidimensional,partly correlated data to two-or three-dimensions.It makes the assumption that,if there is a large number of a variable obtained from a number of cases,then a smaller set of derived variables,which retain most of the original information,can be obtained.The main aim of PCA is to reduce dimensionality with a minimum loss of information.This is achieved by projecting the data onto fewer dimensions and these are chosen to exploit the relationships between the variables.Projections are chosen so that the maximum amount of information is retained in the smallest number of dimensions (Fielding,2000).It also measures qualitative associations between variables.To analyse the results,a line of bestfit through a system of points in space is obtained.This technique allows the similarities and differences between objects and samples to be better assessed(Adams,1995).For a full discussion of PCA,the reader is referred to the book by Beebe et al.(1998).The ANNs are programs used to simulate biological nervous systems and they are based on simulated neurones,which are joined together to form networks.They are computer programs based on a simplified model of the brain;they do not attempt to copy the minute details of howthe brain works but reproduces its logical operation using a collection of neurone-like entities to perform processing(Cartwright,1993).ANN programs are multipurpose and with suitable training,a single program could solve several problems(Cartwright,1993).An artificial neurone with simple characteristics is called an elementary perceptron.This is a simple feed forward system from which artificial net-works are 
constructed(Cartwright,1993).An ANN,including the perceptron,starts from a position of complete ignorance and so a training period is required before it can be used for real problems.A training set consists of two parts:a training stimulus,a collection of inputs to the perceptron and a training target,the desired output for each stimulus.An example is training of the ANN to be able to distinguish between two or more odorous molecules with different struc-tures.A structure from the training set isshownFood analysis using arrays of biosensors A.K.Deisingh et al.593Ó2004Blackwell Publishing Ltd International Journal of Food Science and Technology2004,39,587–604and the perceptron is allowed to identify it.This decision is compared with the training target and adjustments are made as necessary.Another structure is shown and the process is repeated until the perceptron becomes proficient at identi-fying the structures (Cartwright,1993).Several different types of ANN are available and the most popular is the back propagation approach (Fig.5).The diagram shows a simple three layer feed forward network incorporating input,hidden and output layers.In this procedure,input patterns presented to the input layer,e.g.signals from an array of chemical sensors,generate a flowof activation to the output layer.Errors in the output are then fed back to the input layer to modify the weights of the interconnects.It should be empha-sized that back-propagation does not describe a network but represents a learning algorithm.In this way,the network can be trained with known parameters,such as sensor array responses to sets of known chemicals.The network can then Ôrecog-nize Õan unknown chemical composition when the sensor array is challenged with such a cocktail.This type excels at prediction and classification jobs.In a back-propagation system,extra hidden layers (in addition to the input and output layers)are added.Connections are allowed from the input layer to the hidden layer and then 
from the hidden layer to the output later (Computer Science,Stirling University,2001).Advantages of ANNs include the ability to handle noisy or missing data,no equations are involved,a network can deal with previously unseen data once training has been completed,large number of variables can be handled and they provide general solutions with good accuracy.Data analysis should include calibration,modelling and PR.Many of these procedures are based on multivariate numerical data processing and before the methods can be successfully applied,it is usual to perform pre-processing of the data (Adams,1995).The main aims of this stage are:1to reduce the amount of data which are irrelevant to the study;2to enhance sufficient information within the data to achieve the desired goal;3to extract the information in,or transform the data to,a form suitable for further analysis.Probably the most common method of pre-processing spectral data is normalization.This may involve either scaling each spectrum in a collection so that the most intense band in each spectrum is a constant value or the spectra may be normalized to constant area under the absorption or emission curve.More complex approaches involve developing a covariance matrix between variables and extracting the eigenvectors and eigenvalues.Eigen analysis will provide a set of variables,which are linear combinations of the original variables.This has the effect ofreducingFood analysis using arrays of biosensors A.K.Deisingh et al.594International Journal of Food Science and Technology 2004,39,587–604Ó2004Blackwell Publishing Ltd。
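To make the feature-extraction step concrete, the following is a minimal sketch (not from the review) of PCA by eigen analysis of the covariance matrix, applied to a simulated six-sensor array. All data, dimensions and variable names are invented for illustration; NumPy is assumed.

```python
import numpy as np

# Hypothetical example: 30 samples from a 6-sensor array whose responses
# are driven by two underlying "odour" factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(30, 2))                 # two hidden factors
loadings = rng.normal(size=(2, 6))                # factor -> sensor weights
X = latent @ loadings + 0.05 * rng.normal(size=(30, 6))

# Eigen analysis of the covariance matrix, as described in the text.
Xc = X - X.mean(axis=0)                           # mean-centre each channel
cov = np.cov(Xc, rowvar=False)                    # 6 x 6 sensor covariance
eigvals, eigvecs = np.linalg.eigh(cov)            # eigh: cov is symmetric
order = np.argsort(eigvals)[::-1]                 # sort by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Project onto the first two principal components (feature extraction).
scores = Xc @ eigvecs[:, :2]

explained = eigvals[:2].sum() / eigvals.sum()
print(f"variance retained by 2 PCs: {explained:.1%}")
```

Because the simulated responses are essentially rank two, two components retain nearly all of the variance; a scatter plot of `scores` is the kind of two-dimensional map typically used to compare electronic-nose samples.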
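The perceptron training loop and back-propagation procedure described in the data-processing section can be sketched as follows. This is an illustrative toy, not code from the review: the four-sensor "odour" patterns, network sizes and learning rate are all invented.

```python
import numpy as np

# Toy three-layer feed-forward network trained by back-propagation to
# separate two odour classes from four-sensor response patterns.
rng = np.random.default_rng(1)

# Training stimuli: noisy copies of two prototype sensor patterns,
# with training targets 0 and 1 (the desired output for each stimulus).
proto = np.array([[0.9, 0.1, 0.8, 0.2],
                  [0.2, 0.8, 0.1, 0.9]])
X = np.vstack([p + 0.05 * rng.normal(size=(50, 4)) for p in proto])
t = np.repeat([0.0, 1.0], 50)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Input -> hidden -> output weights (plus biases), randomly initialized.
W1, b1 = rng.normal(scale=0.5, size=(4, 5)), np.zeros(5)
W2, b2 = rng.normal(scale=0.5, size=(5, 1)), np.zeros(1)

lr = 0.5
for epoch in range(2000):
    # Forward pass: activation flows from input layer to output layer.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2).ravel()

    # Backward pass: output errors are propagated back through the
    # network to adjust the weights of the interconnects.
    d_out = (y - t) * y * (1 - y)                   # delta at output layer
    d_hid = (d_out[:, None] @ W2.T) * h * (1 - h)   # delta at hidden layer
    W2 -= lr * (h.T @ d_out[:, None]) / len(t)
    b2 -= lr * d_out.mean()
    W1 -= lr * (X.T @ d_hid) / len(t)
    b1 -= lr * d_hid.mean(axis=0)

accuracy = np.mean((y > 0.5) == (t > 0.5))
print(f"training accuracy: {accuracy:.0%}")
```

After repeated presentation of the stimuli, the network becomes proficient at identifying the two patterns, mirroring the perceptron training cycle outlined above.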
On MCMC Sampling in Hierarchical Longitudinal Models

Siddhartha Chib¹ and Bradley P. Carlin²

¹John M. Olin School of Business, Washington University, One Brookings Drive, St. Louis, Missouri 63130, U.S.A.
²Division of Biostatistics, School of Public Health, University of Minnesota, Box 303 Mayo Building, Minneapolis, Minnesota 55455, U.S.A.

October 7, 1998 (Original version: August, 1997)

SUMMARY

Markov chain Monte Carlo (MCMC) algorithms have revolutionized Bayesian practice. In their simplest form (i.e., when parameters are updated one at a time) they are, however, often slow to converge when applied to high-dimensional statistical models. A remedy for this problem is to block the parameters into groups, which are then updated simultaneously using either a Gibbs or Metropolis-Hastings step. In this paper we construct several (partially and fully blocked) MCMC algorithms for minimizing the autocorrelation in MCMC samples arising from important classes of longitudinal data models. We exploit an identity used by Chib (1995) in the context of Bayes factor computation to show how the parameters in a general linear mixed model may be updated in a single block, improving convergence and producing essentially independent draws from the posterior of the parameters of interest. We also investigate the value of blocking in non-Gaussian mixed models, as well as in a class of binary response data longitudinal models. We illustrate the approaches in detail with three real-data examples.

Key words: Blocking; Convergence acceleration; Correlated binary data; Gibbs sampler; Linear mixed model; Metropolis-Hastings algorithm; Panel data; Random effects.

Consider the Gaussian linear mixed model (Laird and Ware, 1982),

    y_i = X_i β + W_i b_i + ε_i,      b_i ~ N_q(0, D),                  (1)

where the y_i are vectors of length n_i containing the observations on the ith unit, and the ε_i are error vectors of the same length, independently distributed as N_{n_i}(0, σ² I_{n_i}), i = 1, ..., k. In this mixed model, X_i is an n_i × p design matrix of covariates and β is a corresponding p × 1 vector of fixed effects. By contrast, W_i is an n_i × q design matrix (q typically less than p), and b_i is a q × 1 vector of subject-specific random effects. The b_i model the subject-specific means, as well as enable the model to capture marginal dependence among the observations on the ith unit. The hierarchical specification of this model is completed by adding the prior distributions D⁻¹ ~ W(ν₀⁻¹R₀, ν₀), σ⁻² ~ G(ν₀₀/2, δ₀₀/2), and β ~ N_p(β₀, B₀), where W denotes the Wishart distribution and G denotes the gamma distribution. In our parametrization, the Wishart prior has mean R₀ while the gamma prior has mean ν₀₀/δ₀₀.

This model lends itself to a full Bayesian analysis by Markov chain Monte Carlo (MCMC) methods. One of the first such algorithms was proposed by Gelfand and Smith (1990), which we summarize as follows:

Algorithm 1
1. Sample β from β | y, b, σ², D
2. Sample b = {b_i} from {b_i} | y, β, σ², D
3. Sample D⁻¹ from D⁻¹ | y, β, b, σ²
4. Sample σ² from σ² | y, β, b, D
5. Repeat Steps 1-4 using the most recent values of the conditioning variables.

The Gaussian-linear structure of our model, combined with the conditional conjugacy of our prior specification, means that all of these full conditional distributions are easily available in closed form (as normal, normal, Wishart, and inverse gamma, respectively). This Gibbs sampling scheme has been implemented by several authors in longitudinal modeling applications; see for example Lange et al. (1992), Carlin (1996), and Carlin and Louis (1996, Sec. 8.1).

It is now recognized, however, that Algorithm 1, while relatively easy to implement, can suffer from slow convergence if the parameters are highly correlated a posteriori, or if the information in the likelihood and prior is insufficient to completely determine the model parameters. In fact, the latter situation arises automatically in model (1) if the priors on both variance components (D and σ²) are overly vague, since the data can inform about them only corporately, not independently. Gelfand, Sahu and Carlin (1995, 1996) suggested hierarchical centering of the random effects in such models to reduce serial correlations, later extending the idea to generalized (non-Gaussian) linear mixed models. These authors show the idea to work well when the variance at the second stage (D) dominates that at the first (σ²), the usual case in hierarchical modeling. Vines, Gilks and Wild (1996) and Gilks and Roberts (1996) instead recommend a "sweeping" reparametrization, to break serial correlations by reducing the dimension of the model space (analogous to the usual frequentist practice of adding identifiability constraints to ANOVA models). Gelfand and Sahu (1999) build on the definition of Dawid (1979) to discuss Bayesian identifiability more formally, and go on to endorse "recentering on the fly," i.e., imposing identifiability constraints numerically at the end of each MCMC iteration, as a simpler but equivalent alternative to sweeping. Such recentering algorithms have long been in use in MCMC analyses of models employing pairwise difference priors, which are identified only up to an additive constant and commonly used in spatial analyses; see Besag et al. (1995).
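Algorithm 1 can be sketched in code. The following is a minimal illustration, not the authors' implementation: it simulates data from model (1) and then cycles through the four full conditional draws (normal, normal, Wishart, gamma for σ⁻²), using the prior parametrization given in the text (Wishart prior mean R₀, gamma prior mean ν₀₀/δ₀₀). All numerical settings — k = 20 units, the true parameter values, and the prior hyperparameters — are invented for the example; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Simulate data from model (1); all sizes and values are illustrative. ---
k, n_i, p, q = 20, 8, 2, 2
beta_true = np.array([1.0, -0.5])
D_true = np.array([[0.5, 0.1], [0.1, 0.3]])
sigma2_true = 0.25

X = [np.column_stack([np.ones(n_i), rng.normal(size=n_i)]) for _ in range(k)]
W = [Xi.copy() for Xi in X]                       # random intercept and slope
b_true = rng.multivariate_normal(np.zeros(q), D_true, size=k)
y = [X[i] @ beta_true + W[i] @ b_true[i]
     + np.sqrt(sigma2_true) * rng.normal(size=n_i) for i in range(k)]

# --- Priors: beta ~ N_p(beta0, B0); D^-1 ~ Wishart with mean R0;
# --- sigma^-2 ~ G(nu00/2, delta00/2), so its mean is nu00/delta00. ---
beta0, B0inv = np.zeros(p), np.eye(p) / 100.0     # vague normal prior
nu0, R0 = q + 2, np.eye(q)
nu00, delta00 = 2.0, 2.0

def rwishart(df, scale):
    """Wishart(df, scale) draw as a sum of outer products (integer df)."""
    L = np.linalg.cholesky(scale)
    Z = rng.normal(size=(df, scale.shape[0])) @ L.T
    return Z.T @ Z

beta, b = np.zeros(p), np.zeros((k, q))
Dinv, sigma2 = np.linalg.inv(D_true), 1.0
draws = []
for it in range(2000):
    # Step 1: beta | y, b, sigma2, D   (multivariate normal)
    prec = B0inv + sum(Xi.T @ Xi for Xi in X) / sigma2
    cov = np.linalg.inv(prec)
    mean = cov @ (B0inv @ beta0
                  + sum(X[i].T @ (y[i] - W[i] @ b[i]) for i in range(k)) / sigma2)
    beta = rng.multivariate_normal(mean, cov)

    # Step 2: each b_i | y, beta, sigma2, D   (multivariate normal)
    for i in range(k):
        Vi = np.linalg.inv(Dinv + W[i].T @ W[i] / sigma2)
        mi = Vi @ (W[i].T @ (y[i] - X[i] @ beta)) / sigma2
        b[i] = rng.multivariate_normal(mi, Vi)

    # Step 3: D^-1 | b   (Wishart)
    S = np.linalg.inv(nu0 * np.linalg.inv(R0) + b.T @ b)
    Dinv = rwishart(nu0 + k, S)

    # Step 4: sigma^-2 | y, beta, b   (gamma; numpy uses shape/scale)
    sse = sum(np.sum((y[i] - X[i] @ beta - W[i] @ b[i]) ** 2) for i in range(k))
    sigma2 = 1.0 / rng.gamma((nu00 + k * n_i) / 2.0, 2.0 / (delta00 + sse))

    if it >= 500:                                 # discard burn-in
        draws.append(beta.copy())

post_mean = np.mean(draws, axis=0)
print("posterior mean of beta:", post_mean)
```

Because each update conditions on the most recent values of the other blocks, this is exactly the single-site cycle of Algorithm 1; the slow mixing discussed in the text would show up here as high lag autocorrelation in `draws`, which is what the blocked samplers of the paper are designed to reduce.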