Clustered segmentations
component analysis
Cluster assignment strategies for a clustered trace cache processor
Cluster Assignment Strategies for a Clustered Trace Cache Processor
Ravi Bhargava and Lizy K. John
Technical Report TR-033103-01
Laboratory for Computer Architecture
The University of Texas at Austin
Austin, Texas, 78712
{ravib,ljohn}@
March 31, 2003

Abstract

This report examines dynamic cluster assignment for a clustered trace cache processor (CTCP). Previously proposed clustering techniques run into unique problems as issue width and cluster count increase. Realistic design conditions, such as variable data forwarding latencies between clusters and a heavily partitioned instruction window, also increase the degree of difficulty for effective cluster assignment.

In this report, the trace cache and fill unit are used to perform effective dynamic cluster assignment. The retire-time fill unit analysis is aided by a dynamic profiling mechanism embedded within the trace cache. This mechanism provides information on inter-trace dependencies and critical inputs, elements absent in previous retire-time CTCP cluster assignment work. The strategy proposed in this report leads to more intra-cluster data forwarding and shorter data forwarding distances. In addition, performing this strategy at retire-time reduces issue-time complexity and eliminates early pipeline stages. This increases overall performance for the SPEC CPU2000 integer programs by 8.4% over our base CTCP architecture. This speedup is significantly higher than that of a previously proposed retire-time CTCP assignment strategy (1.9%). Dynamic cluster assignment is also evaluated for several alternate cluster designs as well as media benchmarks.

1 Introduction

A clustered microarchitecture design allows for wide instruction execution while reducing the amount of complexity and long-latency communication [2, 3, 5, 7, 11, 21]. The execution resources and register file are partitioned into smaller and simpler units. Within a cluster, communication is fast, while inter-cluster communication is more costly. Therefore, the key to high performance on a clustered microarchitecture is assigning instructions to clusters in a way that limits inter-cluster data communication.

During cluster assignment, an instruction is designated for execution on a particular cluster. This assignment process can be accomplished statically, dynamically at issue-time, or dynamically at retire-time. Static cluster assignment is traditionally done by a compiler or assembly programmer and may require ISA modification and intimate knowledge of the underlying cluster hardware.
Studies that have compared static and dynamic assignment conclude that dynamic assignment results in higher performance [2, 15].

Dynamic issue-time cluster assignment occurs after instructions are fetched and decoded. In recent literature, the prevailing philosophy is to assign instructions to a cluster based on data dependencies and workload balance [2, 11, 15, 21]. The precise method varies based on the underlying architecture and execution cluster characteristics.

Typical issue-time cluster assignment strategies do not scale well. Dependency analysis is an inherently serial process that must be performed in parallel on all fetched instructions. Therefore, increasing the width of the microarchitecture further delays and frustrates this dependency analysis (also noted by Zyuban et al. [21]). Accomplishing even a simple steering algorithm requires additional pipeline stages early in the instruction pipeline.

In this report, the clustered execution architecture is combined with an instruction trace cache, resulting in a clustered trace cache processor (CTCP). A CTCP achieves a very wide instruction fetch bandwidth by using the trace cache to fetch past multiple branches in a low-latency and high-bandwidth manner [13, 14, 17]. The CTCP environment enables the use of retire-time cluster assignment, which addresses many of the problems associated with issue-time cluster assignment.

In a CTCP, the issue-time dynamic cluster assignment logic and steering network can be removed entirely. Instead, instructions are issued directly to clusters based on their physical instruction order in a trace cache line or instruction cache block. This eliminates critical latency from the front-end of the pipeline. Cluster assignment is instead accomplished at retire-time by physically (but not logically) reordering instructions so that they are issued directly to the desired cluster.

Friendly et al. present a retire-time cluster assignment strategy for a CTCP based on intra-trace data dependencies [6]. The trace cache fill unit is capable of performing advanced analysis since the latency at retire-time is more tolerable and less critical to performance [6, 8]. The shortcoming of this strategy is the loss of dynamic information. Inter-trace dependencies and workload balance information are not available at instruction retirement and are ignored.

In this report, we increase the performance of a wide-issue CTCP using a feedback-directed, retire-time (FDRT) cluster assignment strategy. Extra fields are added to the trace cache to accumulate inter-trace dependency history, as well as the criticality of instruction inputs. The fill unit combines this information with intra-trace dependency analysis to determine cluster assignments. This novel strategy increases the amount of critical intra-cluster data forwarding by 44% while decreasing the average data forwarding distance by 35% over our baseline four-cluster, 16-way CTCP. This leads to an 8.4% improvement in performance over our base architecture, compared to a 1.9% improvement for Friendly's method.

2 Clustered Microarchitecture

A clustered microarchitecture is designed to reduce the performance bottlenecks that result from wide-issue complexity [11]. Structures within a cluster are small, and data forwarding delays are reduced as long as communication takes place within the cluster.

The target microarchitecture in this report is composed of four four-way clusters. Four-wide, out-of-order execution engines have proven manageable in the past and are the building blocks of previously proposed two-cluster microarchitectures.
configured16-wide CTCP’s have been studied[6,21],but not with respect to the performance of dynamic cluster assignment options.An example of the instruction and data routing for the baseline CTCP is shown in Figure1. Notice that the cluster assignment for a particular instruction is dependent on its placement in the instruction buffer.The details of a single cluster are explored later in Figure3Figure1:Overview of a Clustered Trace Cache ProcessorC2and C3are clusters identical to Cluster1and Cluster4.2.1Shared ComponentsThe front-end of the processor(i.e.fetch and decode)is shared by all of the cluster resources. Instructions fetched from the trace cache(or from the instruction cache on a trace cache miss)are decoded and renamed in parallel beforefinally being distributed to their respective clusters.The memory subsystem components,including the store buffer,load queue,and data cache,are also shared.Pipeline The baseline pipeline for our microarchitecture is shown in Figure2.Three pipeline stages are assigned for instruction fetch(illustrated as one box).After the instructions are fetched, there are additional pipeline stages for decode,rename,issue,dispatch,and execute.Registerfile accesses are initiated during the rename stage.Memory instructions incur extra stages to access the TLB and data cache.Floating point instructions and complex instructions(not shown)alsoexecution.endure extra pipeline stages forTrace Cache The trace cache allows multiple basic blocks to be fetched with just one request[13, 14,17].The retired instruction stream is fed to thefill unit which constructs the traces.These traces consist of up to three basic blocks of instructions.When the traces are constructed,the intra-trace and intra-block dependencies are analyzed.This allows thefill unit to add bits to the trace cache line which accelerates register renaming and instruction steering[13].This is the mechanism which is exploited to improve instruction reordering and cluster assignment.2.2Cluster DesignThe execution resources modeled in this report are heavily partitioned.As shown in Figure3,each cluster consists offive reservation stations which feed a total of eight special-purpose functional units.The reservation stations hold eight instructions and permit out-of-order instruction selec-tion.The economical size reduces the complexity of the wake-up and instruction select logic while maintaining a large overall instruction window size[11].Figure3:Details of One ClusterThere are eight special-purpose functional units per cluster:two simple integer units,one integermemory unit,one branch unit,one complex integer unit,one basicfloating point(FP),one complexFP,one FP memory.There arefive8-entry reservation stations:one for the memory operations(integer and FP),one for branches,one for complex arithmetic,two for the simple operations.FPis not shown.Intra-cluster communication(i.e.forwarding results from the execution units to the reservation stations within the same cluster)is done in the same cycle as instruction dispatch.However,to forward data to a neighboring cluster takes two cycles and beyond that another two cycles.This latency includes all of the communication and routing overhead associated with sharing inter-cluster data[12,21].The end clusters do not communicate directly.There are no data bandwidth limitations between clusters in our work.Parcerisa et al.show that a point-to-point interconnect network can be built efficiently and is preferable to bus-based interconnects[12].2.3Cluster AssignmentThe 
The challenge to high performance in clustered microarchitectures is assigning instructions to the proper cluster. This includes identifying which instruction should go to which cluster and then routing the instructions accordingly. With 16 instructions to analyze and four clusters from which to choose, picking the best execution resource is not straightforward.

Accurate dependency analysis is a serial process and is difficult to accomplish in a timely fashion. For example, approximately half of all result-producing instructions have data consumed by an instruction in the same cache line. Some of this information is preprocessed by the fill unit, but issue-time processing is also required. Properly analyzing the relationships is critical but costly in terms of pipe stages. Any extra pipeline stages hurt performance when the pipeline refills after branch mispredictions and instruction cache misses.

Totally flexible routing is also a high-latency process. So instead, our baseline architecture steers instructions to a cluster based on their physical placement in the instruction buffer. Instructions are sent in groups of four to their corresponding cluster, where they are routed on a smaller crossbar to their proper reservation station. This style of partitioning results in less complexity and fewer potential pipeline stages, but is restrictive in terms of issue-time flexibility and steering power. A large crossbar would permit instruction movement from any position in the instruction buffer to any of the clusters. In addition to the latency and complexity drawbacks, this option mandates providing enough reservation station write ports to accommodate up to 16 new instructions per cycle. Therefore, we concentrate on simpler, low-latency instruction steering.

Assignment Options. For comparison purposes, we look at the following dynamic cluster assignment options:

- Issue-Time: Instructions are distributed to the cluster where one or more of their input data is known to be generated. Inter-trace and intra-trace dependencies are visible. A limit of four instructions is assigned to each cluster every cycle. Besides simplifying hardware, this also balances the cluster workloads. This option is examined with zero latency and with four cycles of latency for dependency analysis, instruction steering, and routing.

- Friendly Retire-Time: This is the only previously proposed fill unit cluster assignment policy. Friendly et al. propose a fill unit reordering and assignment scheme based on intra-trace dependency analysis [6]. Their scheme assumes a front-end scheduler restricted to simple slot-based issue, as in our base model. For each issue slot, each instruction is checked for an intra-trace input dependency for the respective cluster. Based on these data dependencies, instructions are physically reordered within the trace.
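The issue-time option above can be pictured as a greedy pass over a fetched group: send each instruction toward the cluster that produces one of its inputs, subject to the four-instructions-per-cluster limit. The sketch below is only a minimal illustration of that idea, not the simulated hardware; the instruction representation and the fallback rule (least-loaded cluster) are assumptions.

```python
# Illustrative issue-time steering pass (not the simulated hardware).
# Each instruction is (dest_reg, [src_regs]); producer_cluster maps a
# register to the cluster where its producer was placed, if known.

NUM_CLUSTERS = 4
SLOTS_PER_CLUSTER = 4

def steer_group(instructions, producer_cluster):
    load = [0] * NUM_CLUSTERS
    assignment = []
    for dest, sources in instructions:
        # Prefer a cluster that generates one of this instruction's inputs.
        candidates = [producer_cluster[s] for s in sources if s in producer_cluster]
        cluster = next((c for c in candidates if load[c] < SLOTS_PER_CLUSTER), None)
        if cluster is None:
            # No producer cluster available: balance the workload instead.
            cluster = min(range(NUM_CLUSTERS), key=lambda c: load[c])
        load[cluster] += 1
        assignment.append(cluster)
        if dest is not None:
            producer_cluster[dest] = cluster  # later consumers may follow this result
    return assignment

# Example: a small group where r3 depends on r1 and r2.
group = [("r1", []), ("r2", []), ("r3", ["r1", "r2"]), (None, ["r3"])]
print(steer_group(group, {}))
```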
3 CTCP Characteristics

The following characterization serves to highlight the cluster assignment optimization opportunities.

3.1 Trace-level Analysis

Table 1 presents some run-time trace line characteristics for our benchmarks. The first metric (% TC Instr) is the percentage of all retired instructions fetched from the trace cache. Benchmarks with a large percentage of trace cache instructions benefit more from fill unit optimizations, since instructions from the instruction cache are unoptimized for the CTCP. Trace Size is the average number of instructions per trace line. When the fill unit does the intra-trace dependency analysis for a trace, this is the available optimization scope.

Table 1: Trace Characteristics (% TC Instr and Trace Size per benchmark).

Figure 4: Source of Critical Input Dependency (percentage of dynamic instructions with inputs, per benchmark). From RS2: critical input provided by the producer for input RS2. From RS1: critical input provided by the producer for input RS1. From RF: critical input provided by the register file.

Table 2: Dynamic Consumers Per Instruction (intra-trace and inter-trace consumers per benchmark).

Table 3: Critical Data Forwarding Dependencies (percentage of critical dependencies that are inter-trace, per benchmark).

(Footnote 1: For bzip2, the branch predictor accuracy is sensitive to the rate at which instructions retire, and the "better" case with no data forwarding latency actually leads to an increase in branch mispredictions and worse performance.)

The register file read latency is presented and has almost no effect on overall performance. In fact, register file latencies between zero and 10 cycles have no impact on performance. This is due to the abundance and critical nature of in-flight instruction data forwarding.

3.3 Resolving Inter-Trace Dependencies

The fill unit accurately determines intra-trace dependencies. Since a trace is an atomic trace cache unit, the same intra-trace instruction data dependencies will exist when the trace is later fetched. However, incorporating inter-trace dependencies at retire-time is essentially a prediction of issue-time dependencies, some of which may occur thousands or millions of cycles in the future.

This problem presents an opportunity for an execution-history-based mechanism to predict the source clusters or producers for instructions with inter-trace dependencies. Table 4 examines how often an instruction's forwarded data comes from the same producer instruction. For each static instruction, the program counter of the last producer is tracked for each source register (RS1 and RS2). The table shows that an instruction's data forwarding producer is the same for RS1 96.3% of the time and the same for RS2 94.3% of the time.

Table 4: Frequency of Repeated Forwarding Producers (inputs RS1 and RS2, for all critical and for inter-trace forwarding, per benchmark).

Inter-trace dependencies do not necessarily arrive from the previous trace. They could arrive from any trace in the past. In addition, static instructions are sometimes incorporated into several different dynamic traces. Table 5 analyzes the distance between an instruction and its critical inter-trace producer. The values are the percentage of such instructions that encounter the same distance in consecutive executions. This percentage correlates very well with the percentages in the last two columns of Table 4. On average, 85.9% of critical inter-trace forwarding is the same distance from a producer as the previous dynamic instance of the instruction.

Table 5: Frequency of Repeated Critical Inter-Trace Forwarding Distances (per benchmark).
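The repetition shown in Tables 4 and 5 is what makes a history-based predictor plausible. The sketch below is a software stand-in for that kind of per-instruction producer profiling; the report embeds this history in trace cache fields rather than a software table, and the names and structures here are assumptions.

```python
# Illustrative model of per-instruction producer history (software stand-in
# for the trace cache profiling fields; names are assumptions).
from collections import defaultdict

last_producer = defaultdict(dict)   # consumer PC -> {src_reg: producer PC}
stable_count = defaultdict(int)     # how often the producer repeated
total_count = defaultdict(int)

def record_forwarding(consumer_pc, src_reg, producer_pc):
    key = (consumer_pc, src_reg)
    total_count[key] += 1
    if last_producer[consumer_pc].get(src_reg) == producer_pc:
        stable_count[key] += 1          # same producer as the previous instance
    last_producer[consumer_pc][src_reg] = producer_pc

# Example: the same producer (0x400) feeds RS1 of 0x500 twice in a row.
for producer in (0x400, 0x400, 0x408):
    record_forwarding(0x500, "RS1", producer)
key = (0x500, "RS1")
print(stable_count[key] / total_count[key])   # 1 repeat out of 3 observations
```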
The important aspect is that physical reordering reduces inter-cluster communication while maintaining low-latency, complexity-effective issue logic.

4.1 Pinning Instructions

Physically reordering instructions at retire-time based on execution history can cause more inter-cluster data forwarding than it eliminates. When speculated inter-trace dependencies guide the reordering strategy, the same trace of instructions can be reordered in a different manner each time the trace is constructed. Producers shift from one cluster to another, never allowing the consumers to accurately gauge the cluster from which their input data will be produced. A retire-time instruction reordering heuristic must be chosen carefully to avoid unstable cluster assignments and self-perpetuating performance problems.

To combat this problem, we pin key producers and their subsequent inter-trace consumers to the same cluster to force intra-cluster data forwarding between inter-trace dependencies. This creates a pinned chain of instructions with inter-trace dependencies. The first instruction of the pinned chain is called a leader. The subsequent links of the chain are referred to as followers. The criteria for selecting pin leaders and followers are presented in Table 6.

Table 6: Leader and Follower Criteria. The conditions listed include: (1) not already a leader or follower, (2) the producer provides the last input data, (3) the producer is a leader or follower, and (4) the producer is from a different trace.

There are two key aspects to these guidelines. The first is that only inter-trace dependencies are considered. Placing instructions with intra-trace dependencies on the same cluster is easy and reliable; these instructions do not require the pinned chain method to establish dependencies. Second, once an instruction is assigned to a pinned chain as a leader or a follower, its status should not change. The idea is to pin an instruction to one cluster and force the other instructions in the inter-trace dependency chain to follow it to that same cluster. If the pinned cluster is allowed to change, then it could lead to the performance-limiting race conditions discussed earlier.

Table 7: FDRT Cluster Assignment Strategy. Each column (options A through E) gives, for one combination of intra-trace producer, pin status, and intra-trace consumer, the prioritized assignment choices (producer's cluster, pinned cluster, a middle cluster, a neighboring cluster, or skip).

(Footnote 2: Using Cacti 2.0 [16], an additional byte per instruction in a trace cache line is determined not to change the fetch latency of the trace cache.)

4.3 Cluster Assignment Strategy

The fill unit must weigh intra-trace information along with the inter-trace feedback from the trace cache execution histories. Table 7 summarizes our proposed cluster assignment policies. The inputs to the strategy are: 1) the presence of an intra-trace dependency for the instruction's most critical source register (i.e., the input that was satisfied last during execution), 2) the pin chain status, and 3) the presence of intra-trace consumers. The fill unit starts with the oldest instruction and progresses in logical order to the youngest instruction.

For option A in Table 7, the fill unit attempts to place an instruction that has only an intra-trace dependency on the same cluster as its producer. If there are no instruction slots available for the producer's cluster, an attempt is made to assign the instruction to a neighboring cluster. For an instruction with just an inter-trace pin dependency (option B), the fill unit attempts to place the instruction on the pinned cluster (which is found in the Pinned Cluster trace profile field) or a neighboring cluster.

An instruction can have both an intra-trace dependency and a pinned inter-trace dependency (option C). Recall that pinning an instruction is irreversible. Therefore, if the critical input changes or an instruction is built into a new trace, an intra-trace dependency could exist along with a pinned inter-trace dependency.
When an instruction has both a pinned cluster and an intra-trace producer, the pinned cluster takes precedence (although our simulations show that it does not matter which gets precedence). If four instructions have already been assigned to this cluster, the intra-trace producer's cluster is the next target. Finally, there is an attempt to assign the instruction to a pinned cluster neighbor.

If an instruction has no dynamically forwarded input data but does have an intra-trace output dependency (option D), it is assigned to a middle cluster (to reduce potential forwarding distances). Instructions are skipped if they have no input or output dependencies (option E), or if they cannot be assigned to a cluster near their producer (the lowest-priority assignment for options A-D). These instructions are later assigned to the remaining slots using Friendly's method.
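Read as an algorithm, the policy in Table 7 is a prioritized search over candidate clusters for each instruction, taken in logical order. The sketch below is one possible rendering of that priority order under the four-slots-per-cluster constraint; the data structures, helper names, and exact neighbor ordering are assumptions, and the final Friendly-style pass that fills remaining slots for skipped instructions is omitted.

```python
# Illustrative rendering of the FDRT priority order (Table 7); not the
# actual fill unit logic. Helper names and structures are assumptions.
NUM_CLUSTERS = 4
SLOTS_PER_CLUSTER = 4
MIDDLE_CLUSTERS = (1, 2)          # clusters are numbered 0-3 here

def neighbors(c):
    return [n for n in (c - 1, c + 1) if 0 <= n < NUM_CLUSTERS]

def assign_trace(trace, slots_free=None):
    slots_free = slots_free or [SLOTS_PER_CLUSTER] * NUM_CLUSTERS
    assignment = {}
    for instr in trace:                                        # oldest to youngest
        prefs = []
        if instr.get("pinned_cluster") is not None:            # options B and C
            prefs.append(instr["pinned_cluster"])
        if instr.get("intra_trace_producer") in assignment:    # options A and C
            prefs.append(assignment[instr["intra_trace_producer"]])
        if not prefs and instr.get("has_intra_trace_consumer"):  # option D
            prefs.extend(MIDDLE_CLUSTERS)
        prefs += [n for c in prefs for n in neighbors(c)]      # then neighbors
        for c in prefs:
            if slots_free[c] > 0:
                slots_free[c] -= 1
                assignment[instr["id"]] = c
                break
        # otherwise the instruction is skipped (option E / no free neighbor)
    return assignment

trace = [
    {"id": 0, "has_intra_trace_consumer": True},               # option D
    {"id": 1, "intra_trace_producer": 0},                      # option A
    {"id": 2, "pinned_cluster": 3},                            # option B
]
print(assign_trace(trace))
```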
4.4 Example

A simple illustration of the FDRT cluster assignment strategy is shown in Figure 6. There are two traces, T1 and T2. Trace T1 is the older of the two traces. Four instructions (out of at least 10) of each trace are shown. Instruction I7 is older than instruction I10. The arrows indicate dependencies. The arrows originate at the producer and go to the consumer. A solid black arrowhead represents an intra-trace dependence, and a white arrowhead represents an inter-trace dependence. A solid arrow line represents a critical input, and a dashed line represents a non-critical input. Arrows with numbers adjacent to them are chain dependencies, where the number represents the chain cluster to which the instructions should be pinned.

Figure 6: The upper portion of this example examines four instructions from the middle of two different traces. The instruction numberings are in logical order. The arrows indicate dependencies, traveling from the producer to the consumer. The instructions are assigned to clusters 2 and 3 based on their dependencies. The letterings in parentheses match those in Table 7.

Instruction T1I8 has one intra-trace producer (Option A) and is assigned to the same cluster as its producer. Instructions with chain inter-trace dependencies follow Option B; the chain cluster value is 3, so these instructions are assigned to that cluster. Note that the intra-trace input to T1I7 follows Option B and is assigned to Cluster 3 based on its chain inter-trace dependency.

Table 8: SPEC CINT2000 benchmarks and their inputs (MinneSPEC and SPEC test input sets).

Baseline configuration. L1 data cache: 4-way, 32 KB, 2-cycle access. L2 unified cache: 4-way, 1 MB, +8 cycles. Non-blocking: 16 MSHRs and 4 ports. D-TLB: 128-entry, 4-way, 1-cycle hit, 30-cycle miss. Store buffer: 32-entry with load forwarding. Load queue: 32-entry, no speculative disambiguation. Main memory: infinite, +65 cycles.

Functional units per cluster (count, latency, issue latency): simple integer (2, 1 cycle, 1 cycle); simple FP (2, 3, 1); memory (1, 1, 1); integer mul/div (1, 3/20, 1/19); FP mul/div/sqrt (1, 3/12/24, 1/12/24); integer branch (1, 1, 1); FP branch (1, 1, 1). Inter-cluster forwarding latency: 2 cycles per forward. Register file latency: 2 cycles. Five reservation stations, 8 entries per reservation station, 2 write ports per reservation station. 192-entry ROB. Fetch, decode, issue, execute, and retire width: 16.

5.2 Performance Analysis

Figure 7 presents the speedups over our base architecture for different dynamic cluster assignment strategies. The proposed feedback-directed, retire-time (FDRT) cluster assignment strategy provides a 7.3% improvement. Friendly's method improves performance by 1.9%. This improvement in performance is due to enhancements in both the intra-trace and inter-trace aspects of cluster assignment. Additional simulations (not shown) show that isolating the intra-trace heuristics from the FDRT strategy results in a 3.4% improvement by itself. The remaining performance improvement generated by FDRT assignment comes from the inter-trace analysis.

Figure 7: Speedup Due to Different Cluster Assignment Strategies (per benchmark and harmonic mean).

The performance boost is due to an increase in intra-cluster forwarding and a reduction in average data forwarding distance. Table 10 presents the changes in intra-cluster forwarding. On average, both CTCP retire-time cluster assignment schemes increase the amount of same-cluster forwarding to above 50%, with FDRT assignment doing better.

Table 10: Percentage of Intra-Cluster Forwarding for Critical Inputs. (Friendly column: bzip2 60.84%, crafty 54.29%, eon 52.83%, gap 58.77%, gcc 58.14%, gzip 53.91%, mcf 64.69%, parser 57.67%, perlbmk 58.36%, twolf 56.91%, vortex 54.00%, vpr 58.70%; average 57.43%.)

The inter-cluster distance is the primary cluster-assignment performance-related factor (Table 11). For every benchmark, the retire-time instruction reordering schemes are able to improve upon the average forwarding distance. In addition, the FDRT scheme always provides shorter overall data forwarding distances than the Friendly method. This is a result of funneling producers with no input dependencies to the middle clusters and placing consumers as close as possible to their producers.

Table 11: Average Data Forwarding Distance (Base and FDRT). Distance is the number of clusters traversed by forwarded data.

For the program eon, the Friendly strategy provides a higher intra-cluster forwarding percentage than FDRT without resulting in higher performance. The reasons for this are twofold. Most importantly, the average data forwarding distance is reduced compared to the Friendly method despite the extra inter-cluster forwarding. There are also secondary effects that result from improving overall forwarding latency, such as a change in the update rate for the branch predictor and BTB.
In this case, our simulations show that the FDRT scheme led to improved branch prediction as well.

The two retire-time instruction reordering strategies are also compared to issue-time instruction steering in Figure 7. In one case, instruction steering and routing is modeled with no latency (labeled as No-lat Issue-time), and in the other case four cycles are modeled (Issue-time). The results show that latency-free issue-time steering is the best, with a 9.9% improvement over the base. However, when applying an aggressive four-cycle latency, issue-time steering is only preferable for three of the 12 benchmarks, and the average performance improvement (3.8%) is almost half that of FDRT cluster assignment.

5.3 FDRT Assignment Analysis

Figure 8 is a breakdown of instructions based on their FDRT assignment strategy option. On average, 32% have only an intra-trace dependency, while 16% of the instructions have just an inter-trace pinned dependency. Only 7% of the instructions have both a pin inter-trace dependency and a critical intra-trace dependency. Thus, 55% of the instructions are considered to be consumers and are therefore placed near their producers.

Figure 8: FDRT Critical Input Distribution (per benchmark and average). The letters A-E correspond to the lettered options in Table 7.

Around 10% of the instructions had no input dependencies but did have an intra-trace consumer. These producer instructions are assigned to a middle cluster, where their consumers will be placed on the same cluster later. Only a very small percentage (less than 1%) of instructions with identified input dependencies are initially skipped because there is no suitable neighbor cluster for assignment. Finally, a large percentage of instructions (around 34%) are determined to not have a critical intra-trace dependency or pinned inter-trace dependency. Most of these instructions do have data dependencies, but they did not require data forwarding or did not meet the pin criteria.

Table 12 presents pinned chain characteristics, including the average number of leaders per trace and the average number of followers per trace. Because pin dependencies are limited to inter-trace dependencies, the combined number of leaders and followers is only 2.90 per trace. This is about 1/4 of the instructions in a trace. For some of the options in Table 7, the fill unit cluster assignment mechanism attempts to place
Core Marketing Concepts (English Edition)
Figure: The value-delivery network of Levi Strauss — Du Pont (fibers), Milliken (fabric), Levi's (apparel), Sears (retail).
Competition is between networks, not companies. The winner is the company with the better network.
Figure: Flows of goods, services, money, taxes, and resources to and from the resource markets.
The Profit Triangle (Fig. 2.08): profit is built on value creation, competitive advantage, and internal operations.
Strategic Planning, Implementation, and Control Process
Relationship levels with few customers/distributors: accountable, proactive, and partnership.
Marketing (Bilingual Course), Chapter 5: Segmentation, Targeting and Positioning
5.1.4 Market-Segmentation Procedure
1. Survey Stage: The researcher conducts exploratory interviews and focus groups to gain insight into customer motivations, attitudes, and behavior. Then the researcher prepares a questionnaire and collects data on attributes and their importance ratings, brand awareness and brand ratings, product-usage patterns, attitudes toward the product category, and respondents' demographics, geographics, psychographics, and mediagraphics.
2. Analysis Stage: The researcher applies factor analysis to the data to remove highly correlated variables, and then uses cluster analysis to create a specified number of maximally different segments.
3. Profiling Stage: Each cluster is profiled in terms of its distinguishing attitudes, behavior, demographics, psychographics, and media patterns, and then each segment is given a name based on its dominant characteristic.
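The analysis stage maps directly onto standard multivariate tooling. A minimal sketch using scikit-learn is shown below; the number of factors and segments and the synthetic survey matrix are placeholders, not recommendations.

```python
# Minimal sketch of the analysis stage: factor reduction followed by
# cluster analysis on survey responses (synthetic data as a placeholder).
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
responses = rng.integers(1, 8, size=(200, 15)).astype(float)  # 200 respondents, 15 attribute ratings

factors = FactorAnalysis(n_components=4, random_state=0).fit_transform(responses)
segments = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(factors)

# Profiling stage: describe each segment by its mean attribute ratings.
for s in range(5):
    profile = responses[segments == s].mean(axis=0)
    print(f"segment {s}: n={np.sum(segments == s)}, top attribute={int(profile.argmax())}")
```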
Instance Segmentation: An Introductory Overview
I have been really busy lately, so it has been a while since my last update.
I recently started on a new topic, so here is an update.
Since I previously worked on detection and my current research has drifted toward instance segmentation, this post briefly sorts out what is missing when going from detection and segmentation results to instance segmentation results, and what the existing solutions look like.
First encounter with instance segmentation. Classification, detection, and segmentation are naturally related: in terms of purpose, all three tasks aim to correctly classify an image (or part of one); beyond that, detection and segmentation are also jointly responsible for localization.
The differences between these tasks arise because people describe the same kind of problem in different ways; the distinctions are man-made.
Consequently, a common description (or task) can be found, namely instance segmentation.
So, since instance segmentation integrates the three tasks above, can instance segmentation results be obtained simply by combining the results of those three tasks? Clearly not.
Why not? In other words, what needs to be filled in to obtain instance segmentation results? Consider each task separately. Detection already encodes spatial correlation but lacks fine localization (i.e., a segmentation mask); segmentation already provides fine localization but lacks spatial correlation. A word on spatial correlation, which matters greatly for both detection and instance segmentation:
spatial correlation means that the same pixel, depending on its relative position within an object, may carry different semantics for different objects. For example, in the figure, although the person's bounding box covers part of a sheep, the box is not classified as sheep but as person, precisely because, spatially, the sheep does not dominate the box.
Given this, the solution seems obvious: just add whatever is missing.
Common instance segmentation approaches follow exactly this line of thought. Fine localization is relatively easy to add back, so how should spatial correlation be encoded? As in detection, it can be handled with RoI pooling or a position-sensitive map.
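As a toy illustration of why the naive combination falls short, the sketch below forms an "instance mask" by intersecting a class-level segmentation mask with a detected box; same-class pixels from an overlapping neighbor leak into the result, which is exactly the missing spatial-correlation problem described above. The arrays and box are made up for the example.

```python
# Toy illustration: intersecting a semantic mask with a detection box.
# Overlapping same-class pixels from another object leak into the result,
# which is the missing instance-level reasoning discussed above.
import numpy as np

semantic_mask = np.zeros((8, 8), dtype=bool)
semantic_mask[1:7, 1:4] = True   # object 1 ("person" pixels)
semantic_mask[2:6, 4:7] = True   # object 2, same class, adjacent

box = (1, 1, 6, 5)               # detector box around object 1: (y0, x0, y1, x1)
y0, x0, y1, x1 = box
instance_mask = np.zeros_like(semantic_mask)
instance_mask[y0:y1 + 1, x0:x1 + 1] = semantic_mask[y0:y1 + 1, x0:x1 + 1]

print(int(instance_mask.sum()))  # includes pixels that actually belong to object 2
```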
Cluster Analysis: Translation of English Literature
School of Electrical and Information Engineering, foreign literature translation. English title: Data mining — clustering. Translated title: 数据挖掘—聚类分析. Major: Automation. Name: ****. Class and student ID: ****. Supervisor: ******. Source: Data Mining, by Ian H. Witten and Eibe Frank. April 26, 2010.

Clustering

5.1 INTRODUCTION

Clustering is similar to classification in that data are grouped. However, unlike classification, the groups are not predefined. Instead, the grouping is accomplished by finding similarities between data according to characteristics found in the actual data. The groups are called clusters. Some authors view clustering as a special type of classification. In this text, however, we follow a more conventional view in that the two are different. Many definitions for clusters have been proposed:
● Set of like elements. Elements from different clusters are not alike.
● The distance between points in a cluster is less than the distance between a point in the cluster and any point outside it.

A term similar to clustering is database segmentation, where like tuples (records) in a database are grouped together. This is done to partition or segment the database into components that then give the user a more general view of the data. In this text, we do not differentiate between segmentation and clustering. A simple example of clustering is found in Example 5.1. This example illustrates the fact that determining how to do the clustering is not straightforward.

As illustrated in Figure 5.1, a given set of data may be clustered on different attributes. Here a group of homes in a geographic area is shown. The first type of clustering is based on the location of the home. Homes that are geographically close to each other are clustered together. In the second clustering, homes are grouped based on the size of the house.

Clustering has been used in many application domains, including biology, medicine, anthropology, marketing, and economics. Clustering applications include plant and animal classification, disease classification, image processing, pattern recognition, and document retrieval. One of the first domains in which clustering was used was biological taxonomy. Recent uses include examining Web log data to detect usage patterns.

When clustering is applied to a real-world database, many interesting problems occur:
● Outlier handling is difficult. Here the elements do not naturally fall into any cluster. They can be viewed as solitary clusters. However, if a clustering algorithm attempts to find larger clusters, these outliers will be forced to be placed in some cluster. This process may result in the creation of poor clusters by combining two existing clusters and leaving the outlier in its own cluster.
● Dynamic data in the database implies that cluster membership may change over time.
● Interpreting the semantic meaning of each cluster may be difficult. With classification, the labeling of the classes is known ahead of time. However, with clustering, this may not be the case. Thus, when the clustering process finishes creating a set of clusters, the exact meaning of each cluster may not be obvious. Here is where a domain expert is needed to assign a label or interpretation for each cluster.
● There is no one correct answer to a clustering problem. In fact, many answers may be found. The exact number of clusters required is not easy to determine. Again, a domain expert may be required. For example, suppose we have a set of data about plants that have been collected during a field trip.
Without any prior knowledge of plant classification, if we attempt to divide this set of data into similar groupings, it would not be clear how many groups should be created.
● Another related issue is what data should be used for clustering. Unlike learning during a classification process, where there is some a priori knowledge concerning what the attributes of each classification should be, in clustering we have no supervised learning to aid the process. Indeed, clustering can be viewed as similar to unsupervised learning.

We can then summarize some basic features of clustering (as opposed to classification):
● The (best) number of clusters is not known.
● There may not be any a priori knowledge concerning the clusters.
● Cluster results are dynamic.

The clustering problem is stated as shown in Definition 5.1. Here we assume that the number of clusters to be created is an input value, $k$. The actual content (and interpretation) of each cluster, $K_j$, $1 \le j \le k$, is determined as a result of the function definition. Without loss of generality, we will view that the result of solving a clustering problem is that a set of clusters is created: $K = \{K_1, K_2, \ldots, K_k\}$.

DEFINITION 5.1. Given a database $D = \{t_1, t_2, \ldots, t_n\}$ of tuples and an integer value $k$, the clustering problem is to define a mapping $f : D \to \{1, \ldots, k\}$ where each $t_i$ is assigned to one cluster $K_j$, $1 \le j \le k$. A cluster $K_j$ contains precisely those tuples mapped to it; that is, $K_j = \{t_i \mid f(t_i) = K_j, 1 \le i \le n, \text{ and } t_i \in D\}$.

A classification of the different types of clustering algorithms is shown in Figure 5.2. Clustering algorithms themselves may be viewed as hierarchical or partitional. With hierarchical clustering, a nested set of clusters is created. Each level in the hierarchy has a separate set of clusters. At the lowest level, each item is in its own unique cluster. At the highest level, all items belong to the same cluster. With hierarchical clustering, the desired number of clusters is not input. With partitional clustering, the algorithm creates only one set of clusters. These approaches use the desired number of clusters to drive how the final set is created. Traditional clustering algorithms tend to be targeted to small numeric databases that fit into memory. There are, however, more recent clustering algorithms that look at categorical data and are targeted to larger, perhaps dynamic, databases. Algorithms targeted to larger databases may adapt to memory constraints by either sampling the database or using data structures, which can be compressed or pruned to fit into memory regardless of the size of the database. Clustering algorithms may also differ based on whether they produce overlapping or nonoverlapping clusters. Even though we consider only nonoverlapping clusters, it is possible to place an item in multiple clusters. In turn, nonoverlapping clusters can be viewed as extrinsic or intrinsic. Extrinsic techniques use labeling of the items to assist in the classification process. These algorithms are the traditional classification supervised learning algorithms in which a special input training set is used. Intrinsic algorithms do not use any a priori category labels, but depend only on the adjacency matrix containing the distance between objects. All algorithms we examine in this chapter fall into the intrinsic class.

The types of clustering algorithms can be further classified based on the implementation technique used. Hierarchical algorithms can be categorized as agglomerative or divisive.
"Agglomerative" implies that the clusters are created in a bottom-up fashion, while divisive algorithms work in a top-down fashion. Although both hierarchical and partitional algorithms could be described using the agglomerative vs. divisive label, it typically is more associated with hierarchical algorithms. Another descriptive tag indicates whether each individual element is handled one by one, serial (sometimes called incremental), or whether all items are examined together, simultaneous. If a specific tuple is viewed as having attribute values for all attributes in the schema, then clustering algorithms could differ as to how the attribute values are examined. As is usually done with decision tree classification techniques, some algorithms examine attribute values one at a time, monothetic. Polythetic algorithms consider all attribute values at one time. Finally, clustering algorithms can be labeled based on the mathematical formulation given to the algorithm: graph theoretic or matrix algebra. In this chapter we generally use the graph approach and describe the input to the clustering algorithm as an adjacency matrix labeled with distance measures.

We discuss many clustering algorithms in the following sections. This is only a representative subset of the many algorithms that have been proposed in the literature. Before looking at these algorithms, we first examine possible similarity measures and examine the impact of outliers.

5.2 SIMILARITY AND DISTANCE MEASURES

There are many desirable properties for the clusters created by a solution to a specific clustering problem. The most important one is that a tuple within one cluster is more like tuples within that cluster than it is similar to tuples outside it. As with classification, then, we assume the definition of a similarity measure, $\mathrm{sim}(t_i, t_l)$, defined between any two tuples, $t_i, t_l \in D$. This provides a more strict and alternative clustering definition, as found in Definition 5.2. Unless otherwise stated, we use the first definition rather than the second. Keep in mind that the similarity relationship stated within the second definition is a desirable, although not always obtainable, property.

A distance measure, $\mathrm{dis}(t_i, t_j)$, as opposed to similarity, is often used in clustering. The clustering problem then has the desirable property that, given a cluster $K_j$, $\forall t_{jl}, t_{jm} \in K_j$ and $t_i \notin K_j$, $\mathrm{dis}(t_{jl}, t_{jm}) \le \mathrm{dis}(t_{jl}, t_i)$.

Some clustering algorithms look only at numeric data, usually assuming metric data points. Metric attributes satisfy the triangular inequality. The cluster can then be described by using several characteristic values. Given a cluster $K_m$ of $N$ points $\{t_{m1}, t_{m2}, \ldots, t_{mN}\}$, we make the following definitions [ZRL96]: Here the centroid is the "middle" of the cluster; it need not be an actual point in the cluster. Some clustering algorithms alternatively assume that the cluster is represented by one centrally located object in the cluster called a medoid. The radius is the square root of the average mean squared distance from any point in the cluster to the centroid. We use the notation $M_m$ to indicate the medoid for cluster $K_m$.

Many clustering algorithms require that the distance between clusters (rather than elements) be determined. This is not an easy task given that there are many interpretations for distance between clusters. Given clusters $K_i$ and $K_j$, there are several standard alternatives to calculate the distance between clusters.
A representative list is:
● Single link: Smallest distance between an element in one cluster and an element in the other. We thus have $\mathrm{dis}(K_i, K_j) = \min(\mathrm{dis}(t_{il}, t_{jm}))$, $\forall t_{il} \in K_i \notin K_j$ and $\forall t_{jm} \in K_j \notin K_i$.
● Complete link: Largest distance between an element in one cluster and an element in the other. We thus have $\mathrm{dis}(K_i, K_j) = \max(\mathrm{dis}(t_{il}, t_{jm}))$, $\forall t_{il} \in K_i \notin K_j$ and $\forall t_{jm} \in K_j \notin K_i$.
● Average: Average distance between an element in one cluster and an element in the other. We thus have $\mathrm{dis}(K_i, K_j) = \mathrm{mean}(\mathrm{dis}(t_{il}, t_{jm}))$, $\forall t_{il} \in K_i \notin K_j$ and $\forall t_{jm} \in K_j \notin K_i$.
● Centroid: If clusters have a representative centroid, then the centroid distance is defined as the distance between the centroids. We thus have $\mathrm{dis}(K_i, K_j) = \mathrm{dis}(C_i, C_j)$, where $C_i$ is the centroid for $K_i$ and similarly for $C_j$.
● Medoid: Using a medoid to represent each cluster, the distance between the clusters can be defined by the distance between the medoids: $\mathrm{dis}(K_i, K_j) = \mathrm{dis}(M_i, M_j)$.

5.3 OUTLIERS

As mentioned earlier, outliers are sample points with values much different from those of the remaining set of data. Outliers may represent errors in the data (perhaps a malfunctioning sensor recorded an incorrect data value) or could be correct data values that are simply much different from the remaining data. A person who is 2.5 meters tall is much taller than most people. In analyzing the height of individuals, this value probably would be viewed as an outlier.

Some clustering techniques do not perform well with the presence of outliers. This problem is illustrated in Figure 5.3. Here if three clusters are found (solid line), the outlier will occur in a cluster by itself. However, if two clusters are found (dashed line), the two (obviously) different sets of data will be placed in one cluster because they are closer together than the outlier. This problem is complicated by the fact that many clustering algorithms actually have as input the number of desired clusters to be found.

Clustering algorithms may actually find and remove outliers to ensure that they perform better. However, care must be taken in actually removing outliers. For example, suppose that the data mining problem is to predict flooding. Extremely high water level values occur very infrequently, and when compared with the normal water level values may seem to be outliers. However, removing these values may not allow the data mining algorithms to work effectively because there would be no data that showed that floods ever actually occurred.

Outlier detection, or outlier mining, is the process of identifying outliers in a set of data. Clustering, or other data mining, algorithms may then choose to remove or treat these values differently. Some outlier detection techniques are based on statistical techniques. These usually assume that the set of data follows a known distribution and that outliers can be detected by well-known tests such as discordancy tests. However, these tests are not very realistic for real-world data because real-world data values may not follow well-defined data distributions. Also, most of these tests assume a single attribute value, and many attributes are involved in real-world datasets. Alternative detection techniques may be based on distance measures.
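As a concrete illustration of the distance-based alternative just mentioned, the sketch below flags points that have too few neighbors within a fixed radius; the thresholds and data are arbitrary choices for the example.

```python
# Simple distance-based outlier flagging: a point is flagged when fewer
# than min_neighbors other points lie within distance d of it.
# The threshold values and data are illustrative only.
import numpy as np

def distance_outliers(points, d=2.0, min_neighbors=3):
    points = np.asarray(points, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    neighbor_counts = (dists < d).sum(axis=1) - 1   # exclude the point itself
    return np.flatnonzero(neighbor_counts < min_neighbors)

data = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5], [9, 9]])
print(distance_outliers(data))   # flags the isolated point [9, 9]
```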
Segmentation - University of M
Robustness
– Outliers: Improve the model either by giving the noise “heavier tails” or allowing an explicit outlier model
– M-estimators
Assuming that somewhere in the collection of processes close to our model is the real process, and it just happens to be the one that makes the estimator produce the worst possible estimates
– Proximity, similarity, common fate, common region, parallelism, closure, symmetry, continuity, familiar configuration
Segmentation by clustering
Partitioning vs. grouping Applications
The robust fitting objective is $\sum_i \rho\big(r_i(x_i, \theta); \sigma\big)$, where the M-estimator is $\rho(u; \sigma) = \dfrac{u^2}{\sigma^2 + u^2}$, $r_i$ is the residual of data point $x_i$ under model parameters $\theta$, and $\sigma$ is a scale parameter.
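One common way to minimize such a robust objective is iteratively reweighted least squares: each residual receives the weight implied by $\rho$ (proportional to $\sigma^2/(\sigma^2 + u^2)^2$) and an ordinary weighted fit is re-solved. A minimal sketch for a line fit, with the scale $\sigma$ and iteration count chosen arbitrarily:

```python
# Robust line fit y = m*x + c with the M-estimator rho(u) = u^2 / (sigma^2 + u^2),
# minimized by iteratively reweighted least squares (IRLS). Sigma and the
# iteration count are arbitrary choices for this illustration.
import numpy as np

def robust_line_fit(x, y, sigma=1.0, iters=20):
    A = np.column_stack([x, np.ones_like(x)])
    w = np.ones_like(x)
    for _ in range(iters):
        W = np.sqrt(w)[:, None]
        m, c = np.linalg.lstsq(A * W, y * np.sqrt(w), rcond=None)[0]
        r = y - (m * x + c)                       # residuals u_i
        w = sigma**2 / (sigma**2 + r**2) ** 2     # weight implied by rho
    return m, c

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[7] += 30.0                                      # one gross outlier
print(robust_line_fit(x, y))                      # close to (2.0, 1.0)
```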
Segmentation by fitting a model(3)
RANSAC (RANdom SAmple Consensus)
– Searching for a random sample that leads to a fit on which many of the data points agree
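A minimal version of that search, fitting a 2D line with illustrative thresholds and trial counts, might look like this:

```python
# Minimal RANSAC for a 2D line (illustrative threshold and trial count).
import numpy as np

def ransac_line(points, trials=200, threshold=0.5, rng=np.random.default_rng(0)):
    best_inliers, best_model = 0, None
    for _ in range(trials):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue                      # skip degenerate (vertical) samples
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        dist = np.abs(points[:, 1] - (m * points[:, 0] + c))
        inliers = int((dist < threshold).sum())
        if inliers > best_inliers:        # keep the sample most points agree with
            best_inliers, best_model = inliers, (m, c)
    return best_model, best_inliers

pts = np.array([[x, 2 * x + 1] for x in range(20)] + [[5, 30], [12, -4]], dtype=float)
print(ransac_line(pts))
```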
K-means: allocate each data point to the cluster whose center is nearest
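That allocation step is one half of the k-means loop; the other half moves each center to the mean of the points allocated to it. A compact sketch with arbitrary data and k:

```python
# Compact k-means loop: assign points to the nearest center, then move
# each center to the mean of its points (k and the data are arbitrary here).
import numpy as np

def kmeans(points, k=3, iters=50, rng=np.random.default_rng(0)):
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Allocate each data point to the cluster whose center is nearest.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of the points allocated to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return centers, labels

data = np.vstack([np.random.default_rng(s).normal(loc=s * 5.0, size=(30, 2)) for s in range(3)])
print(kmeans(data)[0])
```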
Application of Three-Dimensional Reconstruction in the Diagnosis and Treatment of Breast Cancer
Review

Zou Juan, Zheng Jiehua, Li Zhiyang, Lin Weixun, Chen Yexi
Department of Thyroid and Breast Hernia Surgery, Second Affiliated Hospital of Shantou University Medical College, Shantou 515041, China
Zou Juan and Zheng Jiehua contributed equally to this article.
Corresponding authors: Chen Yexi, Email: chenyexi@; Li Zhiyang, Email: s_zyli4@stu.edu.cn

Abstract: In recent years, as three-dimensional (3D) reconstruction has continued to develop, 3D reconstruction based on pathology and medical imaging has shown value in the diagnosis and treatment of breast cancer. Because it can provide the spatial location and morphological structure of a lesion and its 3D structural relationship to the surrounding tissues and organs, 3D reconstruction has played a key role in the early diagnosis of breast cancer, in surgical treatment, and in the accurate postoperative evaluation of treatment effect. Although the application of 3D reconstruction based on pathology and medical imaging still has shortcomings, with the continuous development of science and technology it will play an increasingly important role in the diagnosis, personalized treatment, and prognosis assessment of breast cancer.

Key words: Breast neoplasms; Image processing, computer-assisted; Diagnosis; Therapy; Application
Fund programs: Bethune-Ethicon Excellence Surgery Fund (HZB-20181119-25); Undergraduate Innovation and Entrepreneurship Training Program of Shantou University (201810560036)
DOI: 10.3760/cma.j.cn371439-20200810-00011

Breast cancer is one of the most common malignant tumors among women worldwide and a leading cause of death in women [1].
General Data Analysis Methods for Market Segmentation

5. Actionability: plans can be...

Section 2: Target Market Selection

A target market is the set of buyers with common needs or characteristics that the company decides to enter and serve.

After segmenting the market and evaluating the segments, the company must decide, in light of its own situation, which and how many segments to select, that is, the markets where it holds the greatest advantage, as its target markets. This is target market selection.

The theoretical basis of market segmentation is the heterogeneity of consumer preferences. Every product is a bundle of attributes, but different consumers attach different degrees of importance to the different attributes of the same kind of product. According to the importance attached to these attributes, consumer preferences can be divided into three types:
1. Homogeneous preferences: all customers show roughly the same preferences for the product's various attributes.

Course Review
1. What are closed-ended questions and open-ended questions? Give examples.
2. What is the 4W3H of drawing up a research plan?
2. Developing the Research Plan
The research plan covers the 4W3H:
What: the specific research objectives and content
How: the research methods to use (interview, questionnaire, mail, telephone, observation, experiment) and the information to collect
Who: selection of the research subjects
Where: selection of the research locations
When: research procedures and schedule
I. Analyzing and Evaluating Market Segments
1. Segment size and growth potential
2. Segment structural attractiveness
3. Company objectives and resources
Case: 和路雪 (Wall's) vs. …利
Choosing a career is choosing a lifestyle.
Machine Learning: Clustering (English Slides)
Building Visual Dictionaries
1. Sample patches from a database
– E.g., 128 dimensional SIFT vectors
2. Cluster the patches
– Cluster centers are the dictionary
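A minimal sketch of those two steps, assuming scikit-learn and illustrative sizes (the descriptor matrix below is random data standing in for real SIFT vectors):

```python
# Building a visual dictionary: sample local descriptors, cluster them,
# and keep the cluster centers as the dictionary (sizes are illustrative).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
patch_descriptors = rng.random((5000, 128))          # stand-in for SIFT vectors

kmeans = KMeans(n_clusters=256, n_init=4, random_state=0).fit(patch_descriptors)
dictionary = kmeans.cluster_centers_                 # 256 visual words

# A new descriptor is encoded by the index of its nearest visual word.
word = kmeans.predict(rng.random((1, 128)))
print(dictionary.shape, int(word[0]))
```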
Good
• Simple to implement, widespread application • Clusters have adaptive shapes • Provides a hierarchy of clusters
Bad
• May have imbalanced clusters • Still have to choose number of clusters or threshold • Need to use an “ultrametric” to get a meaningful hierarchy
Mean shift segmentation
D. Comaniciu and P. Meer, Mean Shift: A Robust Approach toward Feature Space Analysis, PAMI 2002.
• Versatile technique for clustering-based segmentation
Figure (slides by Y. Ukrainitz & B. Sarel): the mean shift vector moves the region of interest toward the center of mass of the points it covers.
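A bare-bones version of the procedure those slides animate (repeatedly shift each point toward the mean of its neighbors, then merge points that converge together) is sketched below; the kernel radius and data are illustrative.

```python
# Bare-bones mean shift: move each point to the mean of its neighbors
# until convergence; points that converge together form one cluster.
import numpy as np

def mean_shift(points, radius=1.5, iters=30):
    modes = points.copy()
    for _ in range(iters):
        for i, p in enumerate(modes):
            neighbors = points[np.linalg.norm(points - p, axis=1) < radius]
            modes[i] = neighbors.mean(axis=0)     # shift toward the local mean
    # Merge modes that ended up (almost) in the same place.
    labels = np.array([np.flatnonzero(np.linalg.norm(modes - m, axis=1) < radius / 2)[0] for m in modes])
    return modes, labels

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.5, (40, 2)), rng.normal(5, 0.5, (40, 2))])
modes, labels = mean_shift(data)
print(np.unique(labels))
```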
Survey of clustering data mining techniques
A Survey of Clustering Data Mining Techniques
Pavel Berkhin
Yahoo!, Inc.
pberkhin@

Summary. Clustering is the division of data into groups of similar objects. It disregards some details in exchange for data simplification. Informally, clustering can be viewed as data modeling concisely summarizing the data, and, therefore, it relates to many disciplines from statistics to numerical analysis. Clustering plays an important role in a broad range of applications, from information retrieval to CRM. Such applications usually deal with large datasets and many attributes. Exploration of such data is a subject of data mining. This survey concentrates on clustering algorithms from a data mining perspective.

1 Introduction

The goal of this survey is to provide a comprehensive review of different clustering techniques in data mining. Clustering is a division of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. Representing data with fewer clusters necessarily loses certain fine details (akin to lossy data compression), but achieves simplification. It represents many data objects by few clusters, and hence, it models data by its clusters. Data modeling puts clustering in a historical perspective rooted in mathematics, statistics, and numerical analysis. From a machine learning perspective clusters correspond to hidden patterns, the search for clusters is unsupervised learning, and the resulting system represents a data concept. Therefore, clustering is unsupervised learning of a hidden data concept. Data mining applications add to a general picture three complications: (a) large databases, (b) many attributes, (c) attributes of different types. This imposes on a data analysis severe computational requirements. Data mining applications include scientific data exploration, information retrieval, text mining, spatial databases, Web analysis, CRM, marketing, medical diagnostics, computational biology, and many others. They present real challenges to classic clustering algorithms.
These challenges led to the emergence of powerful broadly applicable data mining clustering methods developed on the foundation of classic techniques. They are the subject of this survey.

1.1 Notations

To fix the context and clarify terminology, consider a dataset $X$ consisting of data points (i.e., objects, instances, cases, patterns, tuples, transactions) $x_i = (x_{i1}, \ldots, x_{id})$, $i = 1{:}N$, in attribute space $A$, where each component $x_{il} \in A_l$, $l = 1{:}d$, is a numerical or nominal categorical attribute (i.e., feature, variable, dimension, component, field). For a discussion of attribute data types see [106]. Such a point-by-attribute data format conceptually corresponds to an $N \times d$ matrix and is used by a majority of the algorithms reviewed below. However, data of other formats, such as variable length sequences and heterogeneous data, are not uncommon.

The simplest subset in an attribute space is a direct Cartesian product of sub-ranges $C = \prod C_l \subset A$, $C_l \subset A_l$, called a segment (i.e., cube, cell, region). A unit is an elementary segment whose sub-ranges consist of a single category value, or of a small numerical bin. Describing the numbers of data points per every unit represents an extreme case of clustering, a histogram. This is a very expensive representation, and not a very revealing one. User-driven segmentation is another commonly used practice in data exploration that utilizes expert knowledge regarding the importance of certain sub-domains. Unlike segmentation, clustering is assumed to be automatic, and so it is a machine learning technique.

The ultimate goal of clustering is to assign points to a finite system of $k$ subsets (clusters). Usually (but not always) subsets do not intersect, and their union is equal to the full dataset with the possible exception of outliers:

$X = C_1 \cup \cdots \cup C_k \cup C_{\text{outliers}}, \qquad C_i \cap C_j = \emptyset, \ i \neq j.$

1.2 Clustering Bibliography at Glance

General references regarding clustering include [110], [205], [116], [131], [63], [72], [165], [119], [75], [141], [107], [91]. A very good introduction to contemporary data mining clustering techniques can be found in the textbook [106].

There is a close relationship between clustering and many other fields. Clustering has always been used in statistics [10] and science [158]. The classic introduction into the pattern recognition framework is given in [64]. Typical applications include speech and character recognition. Machine learning clustering algorithms were applied to image segmentation and computer vision [117]. For statistical approaches to pattern recognition see [56] and [85]. Clustering can be viewed as a density estimation problem. This is the subject of traditional multivariate statistical estimation [197]. Clustering is also widely used for data compression in image processing, which is also known as vector quantization [89]. Data fitting in numerical analysis provides still another venue in data modeling [53]. This survey's emphasis is on clustering in data mining. Such clustering is characterized by large datasets with many attributes of different types.
Though we do not even try to review particular applications,many important ideas are related to the specificfields.Clustering in data mining was brought to life by intense developments in information retrieval and text mining[52], [206],[58],spatial database applications,for example,GIS or astronomical data,[223],[189],[68],sequence and heterogeneous data analysis[43],Web applications[48],[111],[81],DNA analysis in computational biology[23],and many others.They resulted in a large amount of application-specific devel-opments,but also in some general techniques.These techniques and classic clustering algorithms that relate to them are surveyed below.1.3Plan of Further PresentationClassification of clustering algorithms is neither straightforward,nor canoni-cal.In reality,different classes of algorithms overlap.Traditionally clustering techniques are broadly divided in hierarchical and partitioning.Hierarchical clustering is further subdivided into agglomerative and divisive.The basics of hierarchical clustering include Lance-Williams formula,idea of conceptual clustering,now classic algorithms SLINK,COBWEB,as well as newer algo-rithms CURE and CHAMELEON.We survey these algorithms in the section Hierarchical Clustering.While hierarchical algorithms gradually(dis)assemble points into clusters (as crystals grow),partitioning algorithms learn clusters directly.In doing so they try to discover clusters either by iteratively relocating points between subsets,or by identifying areas heavily populated with data.Algorithms of thefirst kind are called Partitioning Relocation Clustering. They are further classified into probabilistic clustering(EM framework,al-gorithms SNOB,AUTOCLASS,MCLUST),k-medoids methods(algorithms PAM,CLARA,CLARANS,and its extension),and k-means methods(differ-ent schemes,initialization,optimization,harmonic means,extensions).Such methods concentrate on how well pointsfit into their clusters and tend to build clusters of proper convex shapes.Partitioning algorithms of the second type are surveyed in the section Density-Based Partitioning.They attempt to discover dense connected com-ponents of data,which areflexible in terms of their shape.Density-based connectivity is used in the algorithms DBSCAN,OPTICS,DBCLASD,while the algorithm DENCLUE exploits space density functions.These algorithms are less sensitive to outliers and can discover clusters of irregular shape.They usually work with low-dimensional numerical data,known as spatial data. 
Spatial objects could include not only points,but also geometrically extended objects(algorithm GDBSCAN).4Pavel BerkhinSome algorithms work with data indirectly by constructing summaries of data over the attribute space subsets.They perform space segmentation and then aggregate appropriate segments.We discuss them in the section Grid-Based Methods.They frequently use hierarchical agglomeration as one phase of processing.Algorithms BANG,STING,WaveCluster,and FC are discussed in this section.Grid-based methods are fast and handle outliers well.Grid-based methodology is also used as an intermediate step in many other algorithms (for example,CLIQUE,MAFIA).Categorical data is intimately connected with transactional databases.The concept of a similarity alone is not sufficient for clustering such data.The idea of categorical data co-occurrence comes to the rescue.The algorithms ROCK,SNN,and CACTUS are surveyed in the section Co-Occurrence of Categorical Data.The situation gets even more aggravated with the growth of the number of items involved.To help with this problem the effort is shifted from data clustering to pre-clustering of items or categorical attribute values. Development based on hyper-graph partitioning and the algorithm STIRR exemplify this approach.Many other clustering techniques are developed,primarily in machine learning,that either have theoretical significance,are used traditionally out-side the data mining community,or do notfit in previously outlined categories. The boundary is blurred.In the section Other Developments we discuss the emerging direction of constraint-based clustering,the important researchfield of graph partitioning,and the relationship of clustering to supervised learning, gradient descent,artificial neural networks,and evolutionary methods.Data Mining primarily works with large databases.Clustering large datasets presents scalability problems reviewed in the section Scalability and VLDB Extensions.Here we talk about algorithms like DIGNET,about BIRCH and other data squashing techniques,and about Hoffding or Chernoffbounds.Another trait of real-life data is high dimensionality.Corresponding de-velopments are surveyed in the section Clustering High Dimensional Data. 
The trouble comes from a decrease in metric separation when the dimension grows. One approach to dimensionality reduction uses attribute transformations (DFT, PCA, wavelets). Another way to address the problem is through subspace clustering (algorithms CLIQUE, MAFIA, ENCLUS, OPTIGRID, PROCLUS, ORCLUS). Still another approach clusters attributes in groups and uses their derived proxies to cluster objects. This double clustering is known as co-clustering.

Issues common to different clustering methods are overviewed in the section General Algorithmic Issues. We talk about assessment of results, determination of the appropriate number of clusters to build, data preprocessing, proximity measures, and handling of outliers.

For the reader's convenience we provide a classification of clustering algorithms closely followed by this survey:

• Hierarchical Methods
  Agglomerative Algorithms
  Divisive Algorithms
• Partitioning Relocation Methods
  Probabilistic Clustering
  K-medoids Methods
  K-means Methods
• Density-Based Partitioning Methods
  Density-Based Connectivity Clustering
  Density Functions Clustering
• Grid-Based Methods
• Methods Based on Co-Occurrence of Categorical Data
• Other Clustering Techniques
  Constraint-Based Clustering
  Graph Partitioning
  Clustering Algorithms and Supervised Learning
  Clustering Algorithms in Machine Learning
• Scalable Clustering Algorithms
• Algorithms For High Dimensional Data
  Subspace Clustering
  Co-Clustering Techniques

1.4 Important Issues

The properties of clustering algorithms we are primarily concerned with in data mining include:

• Type of attributes an algorithm can handle
• Scalability to large datasets
• Ability to work with high dimensional data
• Ability to find clusters of irregular shape
• Handling outliers
• Time complexity (we frequently simply use the term complexity)
• Data order dependency
• Labeling or assignment (hard or strict vs. soft or fuzzy)
• Reliance on a priori knowledge and user defined parameters
• Interpretability of results

Realistically, with every algorithm we discuss only some of these properties.
The list is in no way exhaustive. For example, as appropriate, we also discuss an algorithm's ability to work in a pre-defined memory buffer, to restart, and to provide an intermediate solution.

2 Hierarchical Clustering

Hierarchical clustering builds a cluster hierarchy or a tree of clusters, also known as a dendrogram. Every cluster node contains child clusters; sibling clusters partition the points covered by their common parent. Such an approach allows exploring data on different levels of granularity. Hierarchical clustering methods are categorized into agglomerative (bottom-up) and divisive (top-down) [116], [131]. An agglomerative clustering starts with one-point (singleton) clusters and recursively merges two or more of the most similar clusters. A divisive clustering starts with a single cluster containing all data points and recursively splits the most appropriate cluster. The process continues until a stopping criterion (frequently, the requested number k of clusters) is achieved.

Advantages of hierarchical clustering include:
• Flexibility regarding the level of granularity
• Ease of handling any form of similarity or distance
• Applicability to any attribute types

Disadvantages of hierarchical clustering are related to:
• Vagueness of termination criteria
• The fact that most hierarchical algorithms do not revisit (intermediate) clusters once constructed

The classic approaches to hierarchical clustering are presented in the subsection Linkage Metrics. Hierarchical clustering based on linkage metrics results in clusters of proper (convex) shapes. Active contemporary efforts to build cluster systems that incorporate our intuitive concept of clusters as connected components of arbitrary shape, including the algorithms CURE and CHAMELEON, are surveyed in the subsection Hierarchical Clusters of Arbitrary Shapes. Divisive techniques based on binary taxonomies are presented in the subsection Binary Divisive Partitioning. The subsection Other Developments contains information related to incremental learning, model-based clustering, and cluster refinement.

In hierarchical clustering our regular point-by-attribute data representation frequently is of secondary importance. Instead, hierarchical clustering frequently deals with the N×N matrix of distances (dissimilarities) or similarities between training points, sometimes called a connectivity matrix. So-called linkage metrics are constructed from elements of this matrix. The requirement of keeping a connectivity matrix in memory is unrealistic. To relax this limitation, different techniques are used to sparsify (introduce zeros into) the connectivity matrix. This can be done by omitting entries smaller than a certain threshold, by using only a certain subset of data representatives, or by keeping with each point only a certain number of its nearest neighbors (for nearest neighbor chains see [177]). Notice that the way we process the original (dis)similarity matrix and construct a linkage metric reflects our a priori ideas about the data model.

With the (sparsified) connectivity matrix we can associate the weighted connectivity graph G(X, E) whose vertices X are data points, and whose edges E and their weights are defined by the connectivity matrix. This establishes a connection between hierarchical clustering and graph partitioning. One of the most striking developments in hierarchical clustering is the algorithm BIRCH. It is discussed in the section Scalable VLDB Extensions.

Hierarchical clustering initializes a cluster system as a set of singleton clusters (agglomerative case) or a single cluster of all points (divisive case) and proceeds iteratively merging or splitting the most appropriate cluster(s) until the stopping criterion is achieved.
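This generic agglomerative loop can be made concrete with a short sketch. It is our own illustration, not code from the survey; the function name agglomerate and the choice of Euclidean distance are assumptions, and the linkage argument anticipates the linkage metrics defined next.

```python
import numpy as np
from itertools import combinations

def agglomerate(points, k, linkage=min):
    """Plain agglomerative clustering: start from singletons and repeatedly merge
    the two closest clusters until only k clusters remain.

    points  : (N, d) array
    linkage : min (single link), max (complete link) or np.mean (average link)
    Returns a list of k clusters, each a list of point indices.
    """
    clusters = [[i] for i in range(len(points))]            # singleton initialization
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

    def cluster_distance(a, b):
        return linkage([D[i, j] for i in a for j in b])

    while len(clusters) > k:
        # pick the pair of clusters that is most appropriate to merge
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# toy usage: three obvious groups on a line
pts = np.array([[0.0], [0.1], [5.0], [5.1], [9.0], [9.2]])
print(agglomerate(pts, k=3))
```

The divisive case proceeds in the opposite direction, starting from one all-point cluster and splitting; the choice of linkage is what the next subsection formalizes.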
The appropriateness of a cluster (or clusters) for merging or splitting depends on the (dis)similarity of the cluster elements. This reflects a general presumption that clusters consist of similar points. An important example of dissimilarity between two points is the distance between them.

To merge or split subsets of points rather than individual points, the distance between individual points has to be generalized to the distance between subsets. Such a derived proximity measure is called a linkage metric. The type of linkage metric significantly affects hierarchical algorithms, because it reflects a particular concept of closeness and connectivity. Major inter-cluster linkage metrics [171], [177] include single link, average link, and complete link. The underlying dissimilarity measure (usually, distance) is computed for every pair of nodes with one node in the first set and another node in the second set. A specific operation such as minimum (single link), average (average link), or maximum (complete link) is applied to the pair-wise dissimilarity measures:

d(C_1, C_2) = Op { d(x, y) : x ∈ C_1, y ∈ C_2 }.

Early examples include the algorithm SLINK [199], which implements single link (Op = min), Voorhees' method [215], which implements average link (Op = Avr), and the algorithm CLINK [55], which implements complete link (Op = max). Single link is related to the problem of finding the Euclidean minimal spanning tree [224] and has O(N²) complexity. The methods using inter-cluster distances defined in terms of pairs of nodes (one in each respective cluster) are called graph methods. They do not use any cluster representation other than a set of points. This name naturally relates to the connectivity graph G(X, E) introduced above, because every data partition corresponds to a graph partition. Such methods can be augmented by so-called geometric methods, in which a cluster is represented by its central point. Under the assumption of numerical attributes, the center point is defined as a centroid or as an average of two cluster centroids subject to agglomeration. This results in the centroid, median, and minimum variance linkage metrics. All of the above linkage metrics can be derived from the Lance-Williams updating formula [145],

d(C_i ∪ C_j, C_k) = a(i) d(C_i, C_k) + a(j) d(C_j, C_k) + b·d(C_i, C_j) + c·|d(C_i, C_k) − d(C_j, C_k)|.

Here a, b, c are coefficients corresponding to a particular linkage. This formula expresses a linkage metric between the union of two clusters and a third cluster in terms of the underlying nodes. The Lance-Williams formula is crucial to making the (dis)similarity computations feasible. Surveys of linkage metrics can be found in [170], [54]. When distance is used as a base measure, linkage metrics capture inter-cluster proximity. However, a similarity-based view that results in intra-cluster connectivity considerations is also used, for example, in the original average link agglomeration (Group-Average Method) [116].

Under reasonable assumptions, such as the reducibility condition (graph methods satisfy this condition), linkage metric methods suffer from O(N²) time complexity [177]. Despite the unfavorable time complexity, these algorithms are widely used. As an example, the algorithm AGNES (AGglomerative NESting) [131] is used in S-Plus. When the connectivity N×N matrix is sparsified, graph methods directly dealing with the connectivity graph G can be used. In particular, the hierarchical divisive MST (Minimum Spanning Tree) algorithm is based on graph partitioning [116].
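To make the Op-based definition and the Lance-Williams update above concrete, here is a minimal sketch. It is our own illustration rather than code from the cited algorithms, and the coefficient values used are the standard single-link ones, stated here as an assumption.

```python
import numpy as np

def linkage_distance(C1, C2, op=min):
    """Inter-cluster distance d(C1, C2) = Op{ d(x, y) : x in C1, y in C2 }.

    C1, C2 : arrays of shape (n1, d) and (n2, d)
    op     : min (single link), max (complete link), or np.mean (average link)
    """
    pairwise = [np.linalg.norm(x - y) for x in C1 for y in C2]
    return op(pairwise)

def lance_williams_single_link(d_ik, d_jk, d_ij):
    """Distance from the merged cluster Ci ∪ Cj to Ck for single link,
    using the Lance-Williams coefficients a(i) = a(j) = 1/2, b = 0, c = -1/2."""
    a_i = a_j = 0.5
    b, c = 0.0, -0.5
    return a_i * d_ik + a_j * d_jk + b * d_ij + c * abs(d_ik - d_jk)

# toy usage
C1 = np.array([[0.0, 0.0], [0.0, 1.0]])
C2 = np.array([[3.0, 0.0]])
C3 = np.array([[10.0, 0.0]])
d13 = linkage_distance(C1, C3)          # single link by default
d23 = linkage_distance(C2, C3)
d12 = linkage_distance(C1, C2)
# distance from C1 ∪ C2 to C3 without touching the raw points again:
print(lance_williams_single_link(d13, d23, d12))
print(linkage_distance(np.vstack([C1, C2]), C3))  # should agree
```

The point of the update formula is visible in the last two lines: once the pairwise cluster distances are known, a merge never requires revisiting the individual points.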
2.1 Hierarchical Clusters of Arbitrary Shapes

For spatial data, linkage metrics based on Euclidean distance naturally generate clusters of convex shapes. Meanwhile, visual inspection of spatial images frequently discovers clusters with a curvy appearance.

Guha et al. [99] introduced the hierarchical agglomerative clustering algorithm CURE (Clustering Using REpresentatives). This algorithm has a number of novel features of general importance. It takes special steps to handle outliers and to provide labeling in the assignment stage. It also uses two techniques to achieve scalability: data sampling (section 8) and data partitioning. CURE creates p partitions, so that fine granularity clusters are constructed in the partitions first. A major feature of CURE is that it represents a cluster by a fixed number, c, of points scattered around it. The distance between two clusters used in the agglomerative process is the minimum of the distances between two scattered representatives. Therefore, CURE takes a middle approach between the graph (all-points) methods and the geometric (one centroid) methods. Single and average link closeness are replaced by the representatives' aggregate closeness. Selecting representatives scattered around a cluster makes it possible to cover non-spherical shapes. As before, agglomeration continues until the requested number k of clusters is achieved. CURE employs one additional trick: the originally selected scattered points are shrunk towards the geometric centroid of the cluster by a user-specified factor α. Shrinkage suppresses the effect of outliers; outliers happen to be located further from the cluster centroid than the other scattered representatives. CURE is capable of finding clusters of different shapes and sizes, and it is insensitive to outliers. Because CURE uses sampling, estimation of its complexity is not straightforward. For low-dimensional data the authors provide a complexity estimate of O(N²_sample) defined in terms of the sample size. More exact bounds depend on the input parameters: the shrink factor α, the number of representative points c, the number of partitions p, and the sample size. Figure 1(a) illustrates agglomeration in CURE. Three clusters, each with three representatives, are shown before and after the merge and shrinkage. The two closest representatives are connected.

While the algorithm CURE works with numerical attributes (particularly low-dimensional spatial data), the algorithm ROCK developed by the same researchers [100] targets hierarchical agglomerative clustering for categorical attributes. It is reviewed in the section Co-Occurrence of Categorical Data.
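CURE's representative-based cluster distance can be sketched as follows. This is our own simplified illustration, not the authors' implementation, with c and α playing the roles of the number of representatives and the shrink factor described above, and a simple farthest-point rule used for picking the scattered points.

```python
import numpy as np

def cure_representatives(cluster, c=3, alpha=0.3):
    """Pick c scattered points of a cluster and shrink them towards its centroid.

    cluster : (n, d) array of points
    alpha   : user-specified shrink factor (0 = no shrinking, 1 = collapse to centroid)
    """
    centroid = cluster.mean(axis=0)
    # farthest-point selection of scattered representatives
    reps = [cluster[np.argmax(np.linalg.norm(cluster - centroid, axis=1))]]
    while len(reps) < min(c, len(cluster)):
        dists = np.min([np.linalg.norm(cluster - r, axis=1) for r in reps], axis=0)
        reps.append(cluster[np.argmax(dists)])
    reps = np.array(reps)
    return reps + alpha * (centroid - reps)       # shrinkage towards the centroid

def cure_cluster_distance(c1, c2, c=3, alpha=0.3):
    """Distance used in CURE's agglomeration: the minimum distance between the
    shrunk scattered representatives of the two clusters."""
    r1 = cure_representatives(c1, c, alpha)
    r2 = cure_representatives(c2, c, alpha)
    return min(np.linalg.norm(x - y) for x in r1 for y in r2)

# toy usage: two elongated clusters
rng = np.random.default_rng(0)
a = np.c_[rng.uniform(0, 4, 50), rng.uniform(0, 0.5, 50)]
b = np.c_[rng.uniform(6, 10, 50), rng.uniform(0, 0.5, 50)]
print(cure_cluster_distance(a, b))
```

Because the representatives trace the shape of each cluster but are pulled slightly inward, the distance is less dominated by single outlying points than single link, yet still follows non-spherical shapes better than a centroid distance.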
The hierarchical agglomerative algorithm CHAMELEON [127] uses the connectivity graph G corresponding to the K-nearest-neighbor model sparsification of the connectivity matrix: the edges to the K most similar points of any given point are preserved, the rest are pruned. CHAMELEON has two stages. In the first stage, small tight clusters are built to ignite the second stage. This involves a graph partitioning [129]. In the second stage, an agglomerative process is performed. It utilizes measures of relative inter-connectivity RI(C_i, C_j) and relative closeness RC(C_i, C_j); both are locally normalized by the internal interconnectivity and closeness of the clusters C_i and C_j. In this sense the modeling is dynamic: it depends on the data locally. Normalization involves certain non-obvious graph operations [129]. CHAMELEON relies heavily on graph partitioning implemented in the library HMETIS (see Section 6). The agglomerative process depends on user-provided thresholds. A decision to merge is made based on the combination

RI(C_i, C_j) · RC(C_i, C_j)^α

of local measures. The algorithm does not depend on assumptions about the data model. It has been proven to find clusters of different shapes, densities, and sizes in 2D (two-dimensional) space. It has a complexity of O(Nm + N log(N) + m² log(m)), where m is the number of sub-clusters built during the first initialization phase. Figure 1(b) (analogous to the one in [127]) clarifies the difference with CURE. It presents a choice of four clusters (a)-(d) for a merge. While CURE would merge clusters (a) and (b), CHAMELEON makes the intuitively better choice of merging (c) and (d).

Fig. 1. Agglomeration in Clusters of Arbitrary Shapes: (a) Algorithm CURE, (b) Algorithm CHAMELEON.

2.2 Binary Divisive Partitioning

In linguistics, information retrieval, and document clustering applications, binary taxonomies are very useful. Linear algebra methods based on singular value decomposition (SVD) are used for this purpose in collaborative filtering and information retrieval [26]. Application of SVD to hierarchical divisive clustering of document collections resulted in the PDDP (Principal Direction Divisive Partitioning) algorithm [31]. In our notation, an object x is a document, the l-th attribute corresponds to a word (index term), and a matrix X entry x_il is a measure (e.g., TF-IDF) of the frequency of term l in document x. PDDP constructs the SVD decomposition of the matrix

(X − e·x̄),  x̄ = (1/N) Σ_{i=1:N} x_i,  e = (1, ..., 1)^T.

This algorithm bisects the data in Euclidean space by a hyperplane that passes through the data centroid, orthogonal to the eigenvector with the largest singular value. A k-way split is also possible if the k largest singular values are considered. Bisecting is a good way to categorize documents and it yields a binary tree. When k-means (2-means) is used for bisecting, the dividing hyperplane is orthogonal to the line connecting the two centroids. The comparative study of SVD vs. k-means approaches [191] can be used for further references. Hierarchical divisive bisecting k-means was proven [206] to be preferable to PDDP for document clustering. While PDDP or 2-means are concerned with how to split a cluster, the problem of which cluster to split is also important. Simple strategies are: (1) split each node at a given level, (2) split the cluster with the highest cardinality, and (3) split the cluster with the largest intra-cluster variance. All three strategies have problems. For a more detailed analysis of this subject and better strategies, see [192].
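The PDDP bisection step just described can be sketched in a few lines of numpy. This is our own illustration, not the reference implementation; the sign-based split of the projections is the assumption that stands in for the hyperplane through the centroid.

```python
import numpy as np

def pddp_bisect(X):
    """One PDDP split: bisect the rows of X (documents) by the hyperplane that
    passes through the centroid and is orthogonal to the principal direction.

    X : (N, d) matrix, row x_i is a document, column l a term (e.g. TF-IDF weights)
    Returns the index arrays of the two resulting clusters.
    """
    x_bar = X.mean(axis=0)                 # centroid, x̄ = (1/N) Σ x_i
    C = X - x_bar                          # X − e·x̄ with e = (1, ..., 1)^T
    # leading right singular vector = principal direction of the centered data
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    direction = Vt[0]
    proj = C @ direction                   # signed offset along the principal direction
    left = np.where(proj <= 0)[0]
    right = np.where(proj > 0)[0]
    return left, right

# toy usage: two well-separated groups of 'documents'
X = np.array([[5.0, 0.1, 0.0],
              [4.0, 0.0, 0.2],
              [0.1, 3.0, 4.0],
              [0.0, 4.0, 3.5]])
print(pddp_bisect(X))
```

Applying the same split recursively to whichever cluster is chosen next (by level, cardinality, or variance, as discussed above) yields the binary taxonomy.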
2.3 Other Developments

One of the early agglomerative clustering algorithms, Ward's method [222], is based not on a linkage metric, but on the objective function used in k-means. The merger decision is viewed in terms of its effect on the objective function.

The popular hierarchical clustering algorithm for categorical data COBWEB [77] has two very important qualities. First, it utilizes incremental learning. Instead of following divisive or agglomerative approaches, it dynamically builds a dendrogram by processing one data point at a time. Second, COBWEB is an example of conceptual or model-based learning. This means that each cluster is considered as a model that can be described intrinsically, rather than as a collection of points assigned to it. COBWEB's dendrogram is called a classification tree. Each tree node (cluster) C is associated with the conditional probabilities for categorical attribute-value pairs,

Pr(x_l = ν_lp | C),  l = 1:d,  p = 1:|A_l|.

This can easily be recognized as a C-specific Naïve Bayes classifier. During the classification tree construction, every new point is descended along the tree and the tree is potentially updated (by an insert/split/merge/create operation). Decisions are based on the category utility [49]

CU{C_1, ..., C_k} = (1/k) Σ_{j=1:k} CU(C_j),
CU(C_j) = Σ_{l,p} ( Pr(x_l = ν_lp | C_j)² − Pr(x_l = ν_lp)² ).

Category utility is similar to the GINI index. It rewards clusters C_j for increases in the predictability of the categorical attribute values ν_lp. Being incremental, COBWEB is fast with a complexity of O(tN), though it depends non-linearly on tree characteristics packed into a constant t. There is a similar incremental hierarchical algorithm for all-numerical attributes called CLASSIT [88]. CLASSIT associates normal distributions with cluster nodes. Both algorithms can result in highly unbalanced trees.

Chiu et al. [47] proposed another conceptual or model-based approach to hierarchical clustering. This development contains several different useful features, such as the extension of scalability preprocessing to categorical attributes, outlier handling, and a two-step strategy for monitoring the number of clusters including BIC (defined below). A model associated with a cluster covers both numerical and categorical attributes and constitutes a blend of Gaussian and multinomial models. Denote the corresponding multivariate parameters by θ. With every cluster C we associate the logarithm of its (classification) likelihood

l_C = Σ_{x_i ∈ C} log(p(x_i | θ)).

The algorithm uses maximum likelihood estimates for the parameter θ. The distance between two clusters is defined (instead of a linkage metric) as the decrease in log-likelihood

d(C_1, C_2) = l_{C_1} + l_{C_2} − l_{C_1 ∪ C_2}

caused by merging the two clusters under consideration. The agglomerative process continues until the stopping criterion is satisfied. As such, determination of the best k is automatic. This algorithm has a commercial implementation (in SPSS Clementine). The complexity of the algorithm is linear in N for the summarization phase.

Traditional hierarchical clustering does not change point membership in once-assigned clusters, due to its greedy approach: after a merge or a split is selected it is not refined. Though COBWEB does reconsider its decisions, its
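Returning to the category-utility criterion used by COBWEB above, the following small sketch (ours, not from the survey) evaluates CU for a hard partition of categorical data; it follows the unweighted 1/k form of the formula given above and assumes attribute values are encoded as integers.

```python
import numpy as np
from collections import Counter

def category_utility(clusters):
    """CU{C_1,...,C_k} = (1/k) * sum_j CU(C_j), with
    CU(C_j) = sum_{l,p} ( Pr(x_l = v_lp | C_j)^2 - Pr(x_l = v_lp)^2 ).

    clusters : list of 2-D integer arrays; rows are points, columns categorical attributes
    """
    all_points = np.vstack(clusters)
    n_total, d = all_points.shape
    cu_total = 0.0
    for C in clusters:
        cu_j = 0.0
        for l in range(d):
            within = Counter(C[:, l])
            overall = Counter(all_points[:, l])
            for value, total_count in overall.items():
                p_within = within.get(value, 0) / len(C)
                p_overall = total_count / n_total
                # gain in predictability of value v_lp inside cluster C_j
                cu_j += p_within ** 2 - p_overall ** 2
        cu_total += cu_j
    return cu_total / len(clusters)

# toy usage: two clusters over two categorical attributes (values encoded as integers)
C1 = np.array([[0, 1], [0, 1], [0, 0]])
C2 = np.array([[1, 0], [1, 0]])
print(category_utility([C1, C2]))
```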
Clustered segmentations

Aristides Gionis, Heikki Mannila, Evimaria Terzi
HIIT, Basic Research Unit
Department of Computer Science
University of Helsinki, Finland
lastname@cs.helsinki.fi

ABSTRACT

The problem of sequence and time-series segmentation has been discussed widely and it has been applied successfully in a variety of areas, including computational genomics, data analysis for scientific applications, and telecommunications. In many of these areas the sequences involved are multi-dimensional, and the goal of the segmentation is to discover sequence segments with small variability. One of the characteristics of existing techniques is that they force all dimensions to share the same segment boundaries; yet, it is often reasonable to assume that different dimensions are more correlated than others, and that concrete and meaningful states are associated only with a subset of dimensions.

In this paper we study the problem of segmenting a multi-dimensional sequence when the dimensions of the sequence are allowed to form clusters and be segmented separately within each cluster. We demonstrate the relevance of this problem to many data-mining applications. We discuss the connection of our setting with existing work, we show the hardness of the suggested problem, and we propose a number of algorithms for its solution. Finally, we give empirical evidence showing that our algorithms work well in practice and produce useful results.

1. INTRODUCTION

Methods for segmenting sequence and time-series data have been discussed widely in the area of data mining. The goal of the segmentation problem is to discover sequence segments with small variability, and segmentation algorithms have been a key to compact representation and knowledge discovery in sequential data. A variety of segmentation algorithms have been proposed and they have been used successfully in many application areas, including computational genomics, data analysis for scientific applications, and telecommunications [8, 10, 14, 17]. In the next paragraphs we discuss three concrete examples in which different kinds of segmentation methods have been applied effectively; these examples also motivate the work presented in this paper.

Example 1. Himberg et al. [10] demonstrated the applicability of sequence-segmentation algorithms for the problem of "context awareness" in the area of mobile communications. The notion of context awareness can be a very powerful cue for improving the friendliness of mobile devices. As a few examples, consider the situations where a mobile device automatically adjusts the ring tone, the audio volume, the screen font size, and other controls depending on where the users are located, what they are doing at that time, who else is around, what is the current noise or temperature level, and numerous other such context variables. The approach followed by Himberg et al. [10] is to infer context information from sensors attached to mobile devices: sensors for acceleration, noise level, temperature, luminosity, etc. Measurements from these sensors naturally form multi-dimensional time series. "Context" can then be inferred by segmenting the time series and annotating the various segments with concrete state descriptions (for example, if Acceleration ≈ u_0, Noise ≈ v_0, and Illumination ≈ w_0, then State = WalkingInTheStreet).

Example 2. The problem of discovering recurrent sources in sequences is examined in [8]. The idea is that many genomic (or other multivariate) sequences are often assembled from a small number of possible "sources", each of which might contribute several segments in the sequence. For instance, Azad et al. [2] try to identify a coarse-grained description of a given DNA string in terms of a smaller set of distinct domain labels. The work in [8] extends existing sequence-segmentation algorithms in a way that the resulting segments are associated with a description label, and the same label might appear in several segments of the sequence.

Example 3. One of the most important discoveries for the search of structure in genomic sequences is the "block structure" discovery of haplotypes. To explain this notion, consider a collection of DNA sequences over n marker sites (e.g., SNPs) for a population of p individuals. The "haplotype block structure" hypothesis states that the sequence of markers can be segmented in blocks, so that, in each block, most of the haplotypes in the population fall into a small number of classes. The description of these haplotypes can be used for further knowledge discovery, e.g., for associating specific blocks with specific genetic-influenced diseases [9].
sequence. For instance, Azad et al. [2] try to identify a coarse-grained description of a given DNA string in terms of a smaller set of distinct domain labels. The work in [8] extends existing sequence-segmentation algorithms in a way that the resulting segments are associated with a description label, and the same label might appear in several segments of the sequence.

Example 3. One of the most important discoveries in the search for structure in genomic sequences is the "block structure" of haplotypes. To explain this notion, consider a collection of DNA sequences over n marker sites (e.g., SNPs) for a population of p individuals. The "haplotype block structure" hypothesis states that the sequence of markers can be segmented into blocks, so that, in each block, most of the haplotypes in the population fall into a small number of classes. The description of these haplotypes can be used for further knowledge discovery, e.g., for associating specific blocks with specific genetically influenced diseases [9].

Figure 1: An illustrating example.

From the computational point of view, the problem of discovering haplotype blocks in genetic sequences can be viewed as partitioning a long multi-dimensional sequence into segments, such that each segment demonstrates low diversity among the different dimensions. Naturally, segmentation algorithms have been applied to good effect for this problem [17, 6, 20, 22].

In all of the above examples, segmentation algorithms have been employed for discovering the underlying structure of multi-dimensional sequences. The goal is to find segments with small variability. In previous approaches, however, dimensions are typically forced to share the same segment boundaries. On the other hand, it is often reasonable to assume that some dimensions are more correlated than others, and that concrete and meaningful states are associated with only small subsets of the dimensions.

An illustrating example of this assumption is shown in Figure 1. The input sequence, a four-dimensional time series, is shown in the top box. The middle box shows a globally optimal segmentation of the input time series when all dimensions share common segment boundaries. In this example, one can see that the global segmentation provides a good description of the sequence: in all dimensions most of the segments can be described fairly well using a constant value. However, some of the segments are not quite uniform, while on the other hand some segment boundaries are introduced in relatively constant pieces of the sequence. These representation problems can be alleviated if one allows different segment boundaries among subsets of dimensions, as shown in the lower box of Figure 1. We see that the segmentation obtained after clustering the dimensions into the pairs {1,2} and {3,4} gives a "tighter" fit and a more intuitive description of the sequence, even though the number of segments used for each cluster is smaller than the number of segments used in the global segmentation.

In this paper we study the problem of segmenting a multi-dimensional sequence when subsets of dimensions are allowed to be clustered and segmented separately from other subsets. We call this problem clustered segmentation.
Clustered segmentation can be used to extend and improve the quality of results in all of the applications mentioned in our motivating examples. In the application of context awareness, certain context states might be independent of some sensor readings; therefore, a segmentation based on all sensors simultaneously would be a bad predictor. For the problem of discovering recurrent segments, one can imagine situations where sources are associated only with a subset of dimensions; for instance, specific regions in DNA sequences (CpG islands) are associated with a low-dimensional signal (the concentration of C and G bases) out of the whole genomic "vocabulary". Finally, in the problem of haplotype-block discovery, the sequence dimensions correspond to different individuals of the population, and thus clustering of the dimensions would allow the discovery of subpopulation groups with distinct haplotype block structure. The existence of such subpopulations gives rise to a mosaic structure of haplotypes, which is a viable biological hypothesis [23, 25].

For solving the problem of clustered segmentation, measures and algorithms that give more refined segmentations and describe the data accurately need to be found. In this paper we make the following contributions:

• We define the problem of clustered segmentation, and we demonstrate its relevance to existing data-mining applications.

• We demonstrate the hardness of the clustered segmentation problem and we develop practical algorithms for its solution.

• We perform extensive experimental evaluation on synthetically generated and real data sets. Our experiments study the behavior of the suggested algorithms. Furthermore, for many cases of real data sets, we show that clustered segmentation can provide a better model for representing and understanding the data.

The rest of this paper is organized as follows. In the next section we define the problem of clustered segmentation in detail and we discuss its connection with other known problems. In Section 3 we describe a number of algorithms for solving the problem of clustered segmentation, and in Section 4 we describe our experiments. Finally, Section 5 is a short conclusion.

2. DESCRIPTION OF THE PROBLEM

As we discussed in the introduction, the goal of this paper is to develop improved measures and algorithms for the problem of multi-dimensional sequence segmentation. We now describe the problem in detail by first introducing the necessary notation. Let S = {s_1, ..., s_d} be a d-dimensional sequence, where s_i is the i-th dimension (signal, attribute, individual, etc.). We assume that each dimension is a sequence of n values, and we denote by s_i[u] the value at the u-th position of s_i. We write S' ⊑ S to denote that S' is a subsequence of S, and s_i[u,v] is the subsequence of s_i between the positions u and v. Similarly, we denote by S[u,v] the subsequence of S between the positions u and v when all dimensions are considered. The positions in each dimension are naturally ordered; for example, in time-series data the order is induced by the time attribute, while in genomic data the order comes from the position of each base in the DNA sequence. In addition, we assume that the dimensions of the sequence are aligned, that is, the values at the u-th position of all dimensions are semantically associated (e.g., they correspond to the same time).

We denote by σ a k-segmentation of the sequence ⟨1, ..., n⟩, that is, a partitioning of ⟨1, ..., n⟩ into k contiguous and non-overlapping segments. As a result, for specifying a k-segmentation it suffices to provide the k−1 boundary
points. We write a k-segmentation as σ = ⟨σ[1], ..., σ[k]⟩, so that we can refer to the individual segments. Because σ[t] refers only to a pair of boundary points, we use the notation S|σ[t] to "extract" from S the subsequence between the boundary points of σ[t]. In other words, the boundary points of a segmentation can be viewed as operators that, when "applied" to a sequence, yield the actual subsequence segments. The set of all k-segmentations, i.e., all possible subsets of k−1 boundary points out of the n positions, is denoted by S_k.

For assessing the quality of a segmentation of a sequence S we need a measure of the variability of each segment of S. Let f(S') be such a measure, defined for all S' ⊑ S. We typically assume that f is an easily computable cost function. For example, for real-valued sequences, f is often taken to be the variance of the values in S'. Now, the quality of a k-segmentation σ = ⟨σ[1], ..., σ[k]⟩ on a sequence S is defined to be

$$f(S|\sigma) = \sum_{t=1}^{k} f(S|\sigma[t]),$$

that is, the sum of the cost function f over all segments of S defined by σ. The sequence-segmentation problem is to compute the optimal such k-segmentation

$$\sigma^* = \arg\min_{\sigma \in S_k} f(S|\sigma).$$

For a wide variety of cost functions, a dynamic-programming algorithm (e.g., [3]) can be used for computing the optimal k-segmentation.

In the above formulation, the number of segments k is given in advance. If k is variable, the trivial n-segmentation can typically achieve zero cost. Thus, a popular way of allowing k to be variable is to add a penalization factor for choosing large values of k; for example, the optimal segmentation is then defined to be

$$\sigma^* = \arg\min_{k,\,\sigma \in S_k} \left( f(S|\sigma) + k\gamma \right).$$

A Bayesian approach for selecting the penalization term is to make it proportional to the description length [21] of the segmentation model. For instance, by assuming that for each segment we need to specify one boundary point and d values (one per dimension), one can choose γ = (d+1) log(dn). An important observation, however, is that the same dynamic-programming algorithm can be used, with no additional time overhead, to compute the optimal segmentation for the variable-k version of the problem. In the following, and mainly for clarity of exposition, we concentrate only on the fixed-k version; however, all of our definitions and algorithms can be applied to the variable-k version as well.

We now proceed to define the clustered version of segmentation problems, which is the focus of our paper. As we explained in the introduction, the main idea is to allow subsets of dimensions to be segmented separately. For the next definition we use the notation s_i|σ in the same way we used S|σ: the segmentation σ is applied to the one-dimensional sequence s_i as it was applied to the multi-dimensional sequence S, and the cost function f(s_i|σ) is computed accordingly.

Problem 1. Given a d-dimensional sequence S with n values in each dimension, a cost function f defined on all subsequences and all dimensions of S, and integers k and c, compute c different k-segmentations σ_1, ..., σ_c so as to minimize the sum

$$\sum_{i=1}^{d} \min_{1 \le j \le c} f(s_i|\sigma_j).$$

In other words, we seek to partition the sequence dimensions into c clusters and compute the optimal segmentation within each cluster, in a way that the total error is minimized.
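To make the objective of Problem 1 concrete, the following is a minimal sketch of how a candidate solution (a set of c segmentations) could be scored, assuming real-valued data and the variance cost; the function names, the array layout (dimensions as rows), and the convention that a segmentation is given by its k−1 interior segment-start positions are assumptions made for the example, not part of the paper.

```python
import numpy as np

def segment_cost(x, a, b):
    """Variance-style cost of x[a:b]: sum of squared deviations from the segment mean."""
    seg = x[a:b]
    return float(np.sum((seg - seg.mean()) ** 2)) if len(seg) else 0.0

def segmentation_cost(x, boundaries, n):
    """Cost f(x|sigma) of one dimension x under the segmentation sigma given by its
    k-1 interior boundary points (positions where a new segment starts)."""
    cuts = [0] + sorted(boundaries) + [n]
    return sum(segment_cost(x, cuts[t], cuts[t + 1]) for t in range(len(cuts) - 1))

def clustered_segmentation_cost(S, segmentations):
    """Problem 1 objective: every dimension is charged the cost of the segmentation
    (cluster) that describes it best; returns the total error and the induced clustering."""
    d, n = S.shape
    total, assignment = 0.0, []
    for i in range(d):
        costs = [segmentation_cost(S[i], sigma, n) for sigma in segmentations]
        j = int(np.argmin(costs))
        assignment.append(j)
        total += costs[j]
    return total, assignment
```

The inner minimization over j implements the min term of Problem 1; finding the c segmentations that minimize this score is the hard part addressed by the algorithms of Section 3.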
As before, one can use the Bayesian approach to define the variable-c version of the problem, where the optimal value for c is sought, but again we assume that the value of c is given.

2.1 Connections to related work

The clustered segmentation problem, as stated above, is clearly related to the time-series clustering problem [24, 13]. Several definitions of time-series similarity have been discussed in the literature; for example, see [4, 7, 1]. A key difference, however, is that in our formulation signals are assigned to the same cluster if they can be segmented "well" together, while most time-series clustering algorithms base their grouping criteria on a more geometric notion of similarity. To emphasize the difference, we note that our methods can be used to segment non-numerical sequences like the SNP data we described in the introduction.

The formulation in Problem 1 suggests that one can consider clustered segmentation as a k-median type of problem (e.g., see [18]). However, a main difficulty in trying to apply k-median algorithms in our setting is that the space of solutions is extremely large. Furthermore, for k-median algorithms it is often the case that a "discretization" of the solution space can be applied (seek solutions only among the input points). Assuming the triangle inequality, this discretization degrades the quality of the solution by a factor of at most 2. In our setting, however, the solution space (segmentations) is different from the input space (sequences), and many natural distance functions between sequences and segmentations do not form a metric.

Finally, our problem is also related to the notion of segmentation problems as introduced by Kleinberg et al. [16]. In [16], starting from an optimization problem, the "segmented" version of that problem is defined by allowing the input to be partitioned into clusters and considering the best solution for each cluster separately. To be precise, in the terminology of [16] our problem should be called "segmented segmentation", since in our case the optimization problem is the traditional segmentation problem. Even when starting from very simple optimization problems, the corresponding segmented versions turn out to be hard.

2.2 Problem complexity

In this subsection we demonstrate the hardness of the clustered segmentation problem. In our case, the optimization problem we start with is the traditional non-clustered segmentation problem. This problem is solvable in polynomial time when the k-segmentation variance cost is considered. Not surprisingly, the corresponding clustered segmentation problem is NP-hard.

Theorem 1. Clustered segmentation, as defined in Problem 1, with real-valued sequences and the variance function as the cost function f, is NP-hard.

The proof of the theorem can be found in the Appendix.
3. ALGORITHMS

In this section we describe two classes of algorithms for solving the clustered segmentation problem. In the first class we define distance measures between sequence segmentations and we employ a standard clustering algorithm (e.g., k-means) on the pairwise distance matrix. The second class consists of two randomized algorithms that cluster sequences using segmentations as "centroids". In particular, we use the notion of a distance between a segmentation and a sequence, which is the error induced on the sequence when the segmentation is applied to it. The algorithms of the second class treat the clustered-segmentation problem as a model-selection problem and try to find the best model that describes the data.

Before proceeding with the description of the algorithms, we briefly review the dynamic-programming algorithm that segments a sequence S into k segments. This algorithm is used by almost all of our methods. The idea of the dynamic-programming algorithm is to compute the segmentation incrementally, for all subsequences S[1,i] with i = 1, ..., n, and for all l-segmentations with l = 1, ..., k. In particular, the computation of an l-segmentation σ for the subsequence S[1,i] is based on the equation

$$f(S[1,i]|\sigma) = \min_{\tau \in S_{l-1},\; 1 \le j < i} \left( f(S[1,j]|\tau) + f(S[j+1,i]) \right).$$

The running time of the dynamic-programming algorithm is O(n² d k F(n)), where F(t) is the time required to compute the function f on a subsequence of length t. In the case where f is the variance function, the computation can be done in constant time (by precomputing the sum of values and the sum of squares of values of all prefixes of the sequence), so the running time of the dynamic-programming algorithm is O(n² d k).
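As a concrete illustration of this recurrence, here is a minimal dynamic-programming sketch for the variance cost, using prefix sums to answer segment-cost queries in constant time per dimension, as described above; the function name, the d × n array layout, and the returned boundary convention (segment start positions) are assumptions made for the example.

```python
import numpy as np

def optimal_k_segmentation(S, k):
    """O(n^2 d k) dynamic-programming k-segmentation of a d x n sequence S under the
    variance cost. Returns (cost, boundaries), where boundaries are the k-1 interior
    positions at which new segments start."""
    S = np.atleast_2d(np.asarray(S, dtype=float))
    d, n = S.shape
    # Prefix sums of values and of squared values, one row per dimension.
    ps = np.zeros((d, n + 1))
    ps2 = np.zeros((d, n + 1))
    ps[:, 1:] = np.cumsum(S, axis=1)
    ps2[:, 1:] = np.cumsum(S ** 2, axis=1)

    def cost(a, b):
        """Variance cost of S[:, a:b] summed over all d dimensions."""
        length = b - a
        sums = ps[:, b] - ps[:, a]
        sqs = ps2[:, b] - ps2[:, a]
        return float(np.sum(sqs - sums ** 2 / length))

    INF = float("inf")
    # E[l, i]: cost of the best l-segmentation of the prefix S[:, :i].
    E = np.full((k + 1, n + 1), INF)
    back = np.zeros((k + 1, n + 1), dtype=int)
    E[0, 0] = 0.0
    for l in range(1, k + 1):
        for i in range(l, n + 1):
            for j in range(l - 1, i):
                c = E[l - 1, j] + cost(j, i)
                if c < E[l, i]:
                    E[l, i], back[l, i] = c, j

    # Backtrack to recover the k-1 interior boundaries.
    boundaries, i = [], n
    for l in range(k, 1, -1):
        i = back[l, i]
        boundaries.append(i)
    return E[k, n], sorted(boundaries)
```

For one-dimensional inputs, this is the routine that the distance functions of Section 3.1 and the iterative algorithms of Section 3.2 would invoke repeatedly.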
3.1 Distance-based clustering of segmentations

In this section we define distance functions between the segmentations of two sequences. Such distance functions can be used to construct a pairwise distance matrix between the dimensions of the sequence. The distance matrix is then used for clustering the dimensions via a standard distance-based clustering algorithm; in our case, we use the standard k-means algorithm. Despite its limitations as a hill-climbing algorithm that is not guaranteed to converge to a global optimum, k-means is used mainly because it is efficient and works very well in practice. Variations of the k-means algorithm have been proposed for time-series clustering, for example in [24].

The main idea of the k-means algorithm is the following: given N points that need to be clustered and a distance function d between them, the algorithm starts by selecting k random points as cluster centers and assigning each of the remaining N−k points to the closest cluster center, according to d. In that way k clusters are formed. Within each cluster, the mean of the points defining the cluster is evaluated, and the process continues iteratively with those means as the new cluster centers, until convergence.

The two distance functions defined here are rather intuitive and simple. The first one, D_E, is based on the mutual exchange of the optimal segmentations of the two sequences and the evaluation of the additional error such an exchange introduces. Two sequences are therefore similar if the best segmentation of one describes the other "well", and vice versa. The second distance function, D_P, is probabilistic, and it defines the distance between two sequences by comparing the probabilities of each position in the sequence being a segment boundary.

3.1.1 Distance as a measure of fit of optimal segmentations

The goal of our clustering is to group dimensions in such a way that similarly segmented dimensions are put in the same cluster, while the overall cost of the clustered segmentation is minimized. Intuitively, this means that a distance function should perform well if it quantifies how well the optimal segmentation of one sequence describes the other, and vice versa. Based on exactly this notion of "exchange" of the optimal segmentations of sequences, we define the distance function D_E in the following way. Given two dimensions s_i, s_j and their corresponding optimal k-segmentations σ*_i, σ*_j ∈ S_k, we define the distance of s_i from σ*_j, denoted by D_E(s_i, σ*_i | σ*_j), as follows:

$$D_E(s_i, \sigma^*_i \mid \sigma^*_j) = f(s_i|\sigma^*_j) - f(s_i|\sigma^*_i).$$

However, in order for the distance between two sequences and their corresponding segmentations to be symmetric, we use instead the following symmetric definition of D_E:

$$D_E(s_i, \sigma^*_i, s_j, \sigma^*_j) = D_E(s_i, \sigma^*_i \mid \sigma^*_j) + D_E(s_j, \sigma^*_j \mid \sigma^*_i).$$
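A small sketch of how the symmetric exchange distance could be computed, assuming two helper routines in the spirit of the dynamic-programming sketch above (segment_fn returning an optimal k-segmentation and its cost, cost_fn applying a given segmentation to a sequence); the helper interfaces are assumptions, not part of the paper.

```python
def exchange_distance(s_i, s_j, k, segment_fn, cost_fn):
    """Symmetric exchange distance D_E between two dimensions: how much worse each
    sequence is described by the other's optimal k-segmentation than by its own.
    segment_fn(s, k) -> (cost, boundaries); cost_fn(s, boundaries) -> error."""
    cost_ii, sigma_i = segment_fn(s_i, k)    # f(s_i | sigma_i*)
    cost_jj, sigma_j = segment_fn(s_j, k)    # f(s_j | sigma_j*)
    d_ij = cost_fn(s_i, sigma_j) - cost_ii   # D_E(s_i, sigma_i* | sigma_j*)
    d_ji = cost_fn(s_j, sigma_i) - cost_jj   # D_E(s_j, sigma_j* | sigma_i*)
    return d_ij + d_ji
```

Evaluating this quantity for every pair of dimensions yields the pairwise distance matrix that is then handed to k-means.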
3.1.2 Probabilistic distance

The distance function D_P is based on comparing two dimensions through the probability distributions of their points being segment boundaries (for a similar development see [17]). With each dimension s_i, 1 ≤ i ≤ d, we associate a probability distribution p_i. The value p_i[t] at point t, 1 ≤ t ≤ n, corresponds to the probability of the t-th point of the series being a segment boundary. Associating a probability distribution with each sequence allows us to define the distance between two sequences as the variational distance between the corresponding distributions. Therefore, we define the distance function D_P(s_i, s_j) between the dimensions s_i and s_j as follows:

$$D_P(s_i, s_j) = \mathrm{Var}(p_i, p_j) = \sum_{1 \le t \le n} |p_i[t] - p_j[t]|. \qquad (1)$$

Computing the probabilities of segment boundaries for a given sequence, and for the required number of segments k, can be done in O(n²k) time using dynamic programming. Consider a dimension s of our d-dimensional sequence, and let p[t] be the probability that there is a segment boundary at point t of s when s is segmented using k segments. Denote by S_k^{(t)} the set of all k-segmentations having a boundary at point t. Then we are interested in the probability of any segmentation from the set S_k^{(t)} given the sequence s:

$$p(S_k^{(t)} \mid s) = \sum_{\sigma \in S_k^{(t)}} p(\sigma \mid s).$$

Since S_k refers to the set of all k-segmentations of the full sequence s, the above equation can be rewritten as

$$p(S_k^{(t)} \mid s) = \sum_{\sigma \in S_k^{(t)}} \frac{p(\sigma, s)}{Z} \propto \sum_{\sigma \in S_k^{(t)}} 2^{-\sum_{[a,b] \in \sigma} f(s[a,b])},$$

where Z is a normalizing constant that cancels out. For building the dynamic-programming equations we need to define the following quantity for a pair of boundary points [a,b]:

$$q(a,b) = 2^{-f(s[a,b])}.$$

Additionally, for any interval [t,t'] and for segmentations consisting of i segments (where 1 ≤ i ≤ k), we define

$$Q_i(t,t') = \sum_{\sigma \in S_i[t,t']} \prod_{[a,b] \in \sigma} q(a,b).$$

Since S_k^{(t)} decomposes into the Cartesian products S_i[1,t] × S_{k−i}[t+1,n] for 1 ≤ i ≤ k, we have that

$$p(S_k^{(t)} \mid s) = \sum_{1 \le i \le k} Q_i(1,t)\, Q_{k-i}(t+1,n).$$

Using the above equations we can compute the probability of each point being a segment boundary in each one of the d dimensions of S. Pairwise distances between two dimensions are then evaluated using Equation (1).
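The dynamic program outlined above can be written down directly. The following is a naive illustration for one real-valued dimension under the variance cost; it follows the Q_i prefix/suffix decomposition but, as an assumption of the example, works with raw 2^(−f) weights and therefore has no protection against numerical underflow on long sequences. The names and conventions (0-based positions, a boundary "right after position t") are ours.

```python
import numpy as np

def boundary_probabilities(x, k):
    """Probability that a segment ends right after position t (t = 0..n-2), under
    the distribution p(sigma | x) proportional to 2^(-f(x|sigma)), with f the
    variance cost. Runs in O(n^2 k) time using the prefix/suffix quantities Q_i."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    ps = np.concatenate(([0.0], np.cumsum(x)))
    ps2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def cost(a, b):  # variance cost of x[a..b] (inclusive), O(1) via prefix sums
        s, s2, length = ps[b + 1] - ps[a], ps2[b + 1] - ps2[a], b - a + 1
        return s2 - s * s / length

    # q[a, b] = 2^(-f(x[a..b]))
    q = np.zeros((n, n))
    for a in range(n):
        for b in range(a, n):
            q[a, b] = 2.0 ** (-cost(a, b))

    # Qpre[i, t]: sum over i-segmentations of x[0..t] of the product of q over segments.
    # Qsuf[i, t]: the same quantity for the suffix x[t..n-1].
    Qpre = np.zeros((k + 1, n))
    Qsuf = np.zeros((k + 1, n))
    Qpre[1, :] = q[0, :]
    Qsuf[1, :] = q[:, n - 1]
    for i in range(2, k + 1):
        for t in range(n):
            Qpre[i, t] = sum(Qpre[i - 1, j] * q[j + 1, t] for j in range(t))
            Qsuf[i, t] = sum(q[t, j] * Qsuf[i - 1, j + 1] for j in range(t, n - 1))

    Z = Qpre[k, n - 1]  # total weight of all k-segmentations of the whole sequence
    p = np.array([sum(Qpre[i, t] * Qsuf[k - i, t + 1] for i in range(1, k)) / Z
                  for t in range(n - 1)])
    return p
```

Given two such vectors p_i and p_j, Equation (1) is then simply np.abs(p_i - p_j).sum().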
3.2 Non-distance-based clustering of segmentations

In this section we describe two algorithms that treat clustered segmentation as a model-selection problem. The first algorithm, SamplSegm, is a sampling algorithm, and it is motivated by the theoretical work of Kleinberg et al. [16], Indyk [11, 12], and Charikar et al. [5]. The second, IterClustSegm, is an adaptation of the popular k-means algorithm. Both algorithms are simple and intuitive, and they perform well in practice.

3.2.1 The SamplSegm algorithm

The basic idea behind the SamplSegm approach is the intuition that if the data exhibit clustered structure, then a small sample of the data will exhibit the same structure. The reason is that for large clusters in the data set one would expect to sample enough data that similar clusters appear in the sampled data. On the other hand, one can afford to miss data from small clusters in the sampling process, because small clusters do not contribute much to the overall error function. Our algorithm is motivated by the work of Kleinberg et al. [16], in which a sampling algorithm for the segmented version of the catalog problem is proposed. Similar ideas have been used successfully by Indyk [11, 12] for the problem of clustering in metric spaces.

For the clustered segmentation problem we adopt a natural sampling-based technique. We first sample uniformly at random a small set A of r log d dimensions, where r is a small constant. Then we search exhaustively all possible partitions of A into c clusters A_1, ..., A_c. For each cluster A_j we find the optimal segmentation σ_j ∈ S_k for the sequence S restricted to the dimensions associated with A_j. The rest of the dimensions s_i, those not included in the sample, are assigned to the cluster j that minimizes the error f(s_i|σ_j). The partition of the sample set A that causes the least error is considered to be the solution found for the set A. The whole sampling process is repeated with different sample sets A a small number of times (in our experiments, 3 times), and the best result is reported as the output of the sampling algorithm. When the size of the sample set is logarithmic in the number of dimensions, the overall running time of the algorithm is polynomial. In our experiments, we found that the method is accurate for data sets of moderate size, but it does not scale well for larger data sets.

3.2.2 The IterClustSegm algorithm

The IterClustSegm algorithm is an adaptation of the widely used k-means algorithm, where the cluster means are replaced by the common segmentation of the dimensions in the cluster, and the distance of a sequence to a cluster is the error induced when the cluster's segmentation is applied to the sequence. Therefore, in our case the c centers correspond to c different segmentations. The algorithm is iterative, and at the t-th iteration step it keeps an estimate of the solution segmentations σ^t_1, ..., σ^t_c, which is refined in the subsequent steps. The algorithm starts with a random clustering of the dimensions and computes the optimal k-segmentation for each cluster. At the (t+1)-th iteration step, each dimension s_i is assigned to the segmentation σ^t_j for which the error f(s_i|σ^t_j) is minimized. Based on the newly obtained clusters of dimensions, new segmentations σ^{t+1}_1, ..., σ^{t+1}_c are computed, and the process continues until there is no more improvement in the error. The complexity of the algorithm is O(I(cd + cP(n,d))), where I is the number of iterations until convergence, and P(n,d) is the complexity of segmenting a sequence of length n with d dimensions.
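A compact sketch of the IterClustSegm loop, reusing the same assumed helper interfaces as the earlier sketches (segment_fn segments a set of dimensions jointly, cost_fn applies a segmentation to a single dimension); the initialization, empty-cluster handling, and stopping test are illustrative choices rather than details prescribed by the paper.

```python
import numpy as np

def iter_clust_segm(S, k, c, segment_fn, cost_fn, max_iter=50, seed=0):
    """IterClustSegm sketch: cluster 'centers' are k-segmentations, and a dimension's
    distance to a cluster is the error of applying that cluster's segmentation to it."""
    rng = np.random.default_rng(seed)
    d = S.shape[0]
    labels = rng.integers(0, c, size=d)          # random initial clustering
    prev_error = float("inf")
    segmentations = []
    for _ in range(max_iter):
        # Re-fit one common k-segmentation per cluster of dimensions.
        segmentations = []
        for j in range(c):
            members = np.flatnonzero(labels == j)
            if members.size == 0:                # re-seed an empty cluster
                members = rng.integers(0, d, size=1)
            segmentations.append(segment_fn(S[members], k)[1])
        # Re-assign every dimension to the segmentation that fits it best.
        errors = np.array([[cost_fn(S[i], sigma) for sigma in segmentations]
                           for i in range(d)])
        labels = errors.argmin(axis=1)
        total_error = float(errors.min(axis=1).sum())
        if total_error >= prev_error:            # stop when the error no longer improves
            break
        prev_error = total_error
    return labels, segmentations, prev_error
```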
4. EXPERIMENTS

In this section we describe the experiments we performed in order to evaluate the validity of the clustered segmentation model and the behavior of the suggested algorithms. For our experiments we used both synthetically generated data and real data consisting of time series and genomic sequences. For the synthetic data we report that in all cases the true underlying model used to generate the data is found. For the real data we found that in all cases the clustered segmentations output by the proposed algorithms produce better models than the models produced by non-clustered segmentations.

4.1 Ensuring fairness in model comparisons

In the experimental results shown in this section we report accuracy in terms of errors. Our intention is to use the error as a measure for comparing models: a smaller error indicates a better model. However, this holds only when the compared models have the same number of parameters. It would be unfair to compare the errors induced by two models with different numbers of parameters, because in that case the trivial model, in which each point is described by itself, would induce the least error and would always be the best. Therefore, to make the comparison between two different models fair, we take care to ensure that the same number of parameters is used in both models.

We briefly describe the methodology we followed in order to guarantee fairness when comparing the different models. Consider a k-segmentation of a d-dimensional sequence S of length n. If no clustering of dimensions is considered, the number of parameters necessary to describe this k-segmentation model is k(d+1). This number comes from the fact that we can describe the model by specifying, for each of the k segments, its starting point and d mean values (one for each dimension). Consider now a clustered segmentation of the sequence with c clusters and k' segments per cluster. The number of parameters for this model is

$$d + \sum_{i=1}^{c} k'(d_i + 1) = d + k'(d + c),$$

where d_i is the number of dimensions in the i-th cluster, since, in addition to specifying the starting points and the values for each cluster, we also need d parameters to indicate the cluster to which each dimension belongs. In our experiments, in order to compare the errors induced by the two models, we select parameters so that k(d+1) = d + k'(d+c).
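As a concrete instance of this parameter matching (using the synthetic-data setting described next, d = 200 and k = 10, with c = 2 clusters taken as an assumed example):

$$k(d+1) = 10 \cdot 201 = 2010, \qquad k' = \frac{k(d+1) - d}{d + c} = \frac{2010 - 200}{202} \approx 8.96,$$

so the clustered model would be granted roughly k' = 9 segments per cluster to keep the two parameter counts comparable.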
4.2 Experiments on synthetic data

We first describe our experiments on synthetic data. For the purpose of this experiment, we generated sequence data from a known model, and the task is to test whether the suggested algorithms are able to discover that model. The data were generated as follows: the d dimensions of the generated sequence were divided in advance into c clusters. For each cluster we select k segment boundaries, which are common for all the dimensions in that cluster, and for the j-th segment of the i-th dimension we select a mean value µ_ij, uniformly distributed in [0,1]. Points are then generated by sampling values from the normal distribution N(µ_ij, σ²). An example of a small data set generated by this method is shown in Figure 1. For our experiments we fixed the values n = 1000 points, k = 10 segments, and d = 200 dimensions. We created different data sets using c = 2, ..., 6 clusters and with standard deviations varying from 0.005 to 0.16.

The results for the synthetically generated data are shown in Figure 2. One can see that the errors of the reported clustered segmentation models are typically very low for all of our algorithms. In most cases all proposed methods approach the true error value. Here we report the results for small sample sizes (usually c+4 samples, with c being the number of clusters). Since our algorithms are randomized, we repeat each of them 5 times and report the best solution found. Apart from the errors induced by the proposed algorithms, the figures also include two additional errors: the error induced by the non-clustered segmentation model with the same number of parameters, and the error induced by the true model that was used to generate the data ("ground truth"). The first one is always much larger than the error induced by the models reported by our algorithms. In all comparisons between the different segmentation models we take into consideration the fairness criterion discussed in the previous subsection. As indicated in Figure 2(a), the difference in errors becomes smaller as the standard deviation increases. This is natural, since as the standard deviation increases all dimensions tend to become uniform and the segment structure disappears. The error of the clustered segmentation model for different numbers of clusters is shown in Figure 2(b). The better performance of the clustered model is apparent. Notice that the error caused by the non-clustered segmentation is an order of magnitude larger than the corresponding clustered segmentation results and is thus omitted from the plot.

4.3 Experiments on time-series data

Next, we tested the behavior of the clustered segmentation model on real time-series data sets obtained from the UCR time-series data mining archive [15]. We used the phone and the spot