Enumerating global states of a distributed computation in lexicographic and breadth-first ma
Introduction to Geographic Information Systems: End-of-Chapter Exercises (English Edition), 8
Chapter 8 Review Questions

1. Explain the difference between location errors and topological errors.
Location errors such as missing polygons or distorted lines relate to the geometric inaccuracies of spatial features, whereas topological errors such as dangling lines and unclosed polygons relate to the logical inconsistencies between spatial features.

2. What are the primary data sources for digitizing?
Global positioning systems (GPS) and remote sensing imagery provide the primary data sources for digitizing. These data sources can bypass printed maps and the practice of various methods of map generalization.

3. A digitized map from a secondary data source such as a USGS quadrangle map is subject to more location errors than a primary data source. Why?
A secondary data source such as a USGS quadrangle map is subject to more location errors because the map has undergone simplification, generalization, and other practices during the mapmaking process.

4. Although the U.S. National Map Accuracy Standard adopted in 1947 is still printed on USGS quadrangle maps, the standard is not really applicable to GIS data. Why?
A GIS uses digital spatial data, which can be easily manipulated and output to any scale. The U.S. National Map Accuracy Standard, on the other hand, is scale dependent.

5. According to the new National Standard for Spatial Data Accuracy, a geospatial data producer is encouraged to report an RMS statistic associated with a data set. In general terms, how does one interpret and use the RMS statistic?
One can first multiply the RMS by 1.7308 to get the NSSDA (National Standard for Spatial Data Accuracy) statistic. This statistic represents the standard error of the mean at the 95 percent confidence level. In other words, one can be sure that, 95 percent of the time, the accuracy of a point or a line is within the NSSDA statistic.

6. Suppose a point location is recorded as (575729.0, 5228382) in data set 1 and (575729.64, 5228382.11) in data set 2. Which data set has a higher data precision? In practical terms, what does the difference in data precision in this case mean?
Data set 2 has a higher data precision than data set 1. If the measurement unit is meters, the recording of a point location in data set 2 is down to one hundredth of a meter and the recording in data set 1 is rounded off at meters.

7. The ArcGIS Desktop Help has a poster illustrating topology rules in the geodatabase data model (ArcGIS Desktop Help > Editing in ArcMap > Editing Topology > Topology rules). View the poster. Can you think of an example (other than those on the poster) that can use the polygon rule of “Must be covered by feature class of”?
[The poster illustrates the polygon rule with the example of “States are covered by counties.” By extension, counties must be covered by census tracts, census tracts by block groups, and block groups by blocks.]

8. Give an example (other than those on the poster) that can use the polygon rule of “Must not overlap with.”
[The poster illustrates the polygon rule with the example of “Lakes and land parcels from two different feature classes must not overlap.” The Census Administrative Boundaries Data Model poster that can be downloaded from the ESRI website has two other examples: the feature class of American Indian Reservation must not overlap with the feature class of Place:city, and the feature class of American Indian Reservation must not overlap with the feature class of Place:town.]

9. Give an example (other than those on the poster) that can use the line rule of “Must not intersect or touch interior.”
[The poster illustrates the line rule with the example of “Lot lines cannot intersect or overlap and must connect to one another only at the endpoint of each line feature.” Like lot lines, road center lines cannot intersect or overlap.]

10. Use a diagram to illustrate how a large nodesnap for editing can alter the shape of line features.
Node a is supposed to be snapped to node b. But a large nodesnap can snap node a to node c instead.

11. Use a diagram to illustrate how a large cluster tolerance for editing can alter the shape of line features.
A large cluster tolerance can incorrectly snap the two lines in the center of the diagram indicating a small stream channel.

12. Explain the difference between a dangling node and a pseudo node.
A dangling node is at the end of a dangling arc, whereas a pseudo node appears along a continuous line and divides the line unnecessarily into separate lines.

13. What is a map topology?
A map topology is a temporary set of topological relationships between the parts of features that are supposed to be coincident.

14. Describe the three basic steps in using a topology rule.
Step 1: create a new topology by defining the participating feature classes, the ranks for each feature class, the topology rule(s), and a cluster tolerance.
Step 2: evaluate the topology rule and create errors indicating those features that have violated the topology rule.
Step 3: fix errors or accept errors as exceptions.

15. Some nontopological editing operations can create features from existing features. Give two examples of such operations.
[Examples include merge features, buffer features, union features, and intersect features.]

16. Edgematching requires a source layer and a target layer. Explain the difference between these two types of layers.
Features, typically vertices, on the source layer are moved to match those on the target layer during the edgematching process.

17. The Douglas-Peucker algorithm typically produces simplified lines with sharp angles. Why?
The Douglas-Peucker algorithm connects trend lines to create simplified lines. Because the trend lines are straight lines, they form sharp angles when connected.
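The answer to question 17 can be illustrated with a minimal Python sketch of the Douglas-Peucker procedure. It is an illustration only, not the implementation used by any particular GIS package; the function names, the sample line, and the tolerance value are assumptions made for this example. Each recursive call draws a straight trend line between the first and last point and keeps only the point farthest from it when that distance exceeds the tolerance, so the output consists of straight segments that meet at sharp angles.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the straight trend line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        # Degenerate trend line: fall back to point-to-point distance.
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def douglas_peucker(points, tolerance):
    """Simplify a polyline (list of (x, y) tuples); hypothetical helper."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the intermediate point farthest from the trend line.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], start, end)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        # Every intermediate point is close enough: keep only the trend line.
        return [start, end]
    # Otherwise split at the farthest point and simplify both halves.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # drop the duplicated split point


line = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0)]
simplified = douglas_peucker(line, 0.5)
print(simplified)  # [(0, 0), (2, 0), (3, 3), (4, 0)]
```

In this sample the small wiggle at (1, 0.1) is removed, while the peak at (3, 3) survives as a sharp corner between two straight trend lines.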
Reservoir Computing Approaches to Recurrent Neural Network Training
1. Introduction

Artificial recurrent neural networks (RNNs) represent a large and varied class of computational models that are designed by more or less detailed analogy with biological brain modules. In an RNN, numerous abstract neurons (also called units or processing elements) are interconnected by likewise abstracted synaptic connections (or links), which enable activations to propagate through the network. The characteristic feature of RNNs that distinguishes them from the more widely used feedforward neural networks is that the connection topology possesses cycles. The existence of cycles has a profound impact:

• An RNN may develop a self-sustained temporal activation dynamics along its recurrent connection pathways, even in the absence of input. Mathematically, this makes an RNN a dynamical system, whereas feedforward networks are functions.
• If driven by an input signal, an RNN preserves in its internal state a nonlinear transformation of the input history; in other words, it has a dynamical memory and is able to process temporal context information.

This review article concerns a particular subset of RNN-based research in two aspects:

• RNNs are used for a variety of scientific purposes, and at least two major classes of RNN models exist: they can be used for purposes of modeling biological brains, or as engineering tools for technical applications. The first usage belongs to the field of computational neuroscience, while the second
Machine Learning Terminology: Chinese-English Glossary
机器学习专业词汇中英⽂对照activation 激活值activation function 激活函数additive noise 加性噪声autoencoder ⾃编码器Autoencoders ⾃编码算法average firing rate 平均激活率average sum-of-squares error 均⽅差backpropagation 后向传播basis 基basis feature vectors 特征基向量batch gradient ascent 批量梯度上升法Bayesian regularization method 贝叶斯规则化⽅法Bernoulli random variable 伯努利随机变量bias term 偏置项binary classfication ⼆元分类class labels 类型标记concatenation 级联conjugate gradient 共轭梯度contiguous groups 联通区域convex optimization software 凸优化软件convolution 卷积cost function 代价函数covariance matrix 协⽅差矩阵DC component 直流分量decorrelation 去相关degeneracy 退化demensionality reduction 降维derivative 导函数diagonal 对⾓线diffusion of gradients 梯度的弥散eigenvalue 特征值eigenvector 特征向量error term 残差feature matrix 特征矩阵feature standardization 特征标准化feedforward architectures 前馈结构算法feedforward neural network 前馈神经⽹络feedforward pass 前馈传导fine-tuned 微调first-order feature ⼀阶特征forward pass 前向传导forward propagation 前向传播Gaussian prior ⾼斯先验概率generative model ⽣成模型gradient descent 梯度下降Greedy layer-wise training 逐层贪婪训练⽅法grouping matrix 分组矩阵Hadamard product 阿达马乘积Hessian matrix Hessian 矩阵hidden layer 隐含层hidden units 隐藏神经元Hierarchical grouping 层次型分组higher-order features 更⾼阶特征highly non-convex optimization problem ⾼度⾮凸的优化问题histogram 直⽅图hyperbolic tangent 双曲正切函数hypothesis 估值,假设identity activation function 恒等激励函数IID 独⽴同分布illumination 照明inactive 抑制independent component analysis 独⽴成份分析input domains 输⼊域input layer 输⼊层intensity 亮度/灰度intercept term 截距KL divergence 相对熵KL divergence KL分散度k-Means K-均值learning rate 学习速率least squares 最⼩⼆乘法linear correspondence 线性响应linear superposition 线性叠加line-search algorithm 线搜索算法local mean subtraction 局部均值消减local optima 局部最优解logistic regression 逻辑回归loss function 损失函数low-pass filtering 低通滤波magnitude 幅值MAP 极⼤后验估计maximum likelihood estimation 极⼤似然估计mean 平均值MFCC Mel 倒频系数multi-class classification 多元分类neural networks 神经⽹络neuron 神经元Newton’s method ⽜顿法non-convex function ⾮凸函数non-linear feature ⾮线性特征norm 范式norm bounded 有界范数norm constrained 范数约束normalization 归⼀化numerical roundoff errors 
数值舍⼊误差numerically checking 数值检验numerically reliable 数值计算上稳定object detection 物体检测objective function ⽬标函数off-by-one error 缺位错误orthogonalization 正交化output layer 输出层overall cost function 总体代价函数over-complete basis 超完备基over-fitting 过拟合parts of objects ⽬标的部件part-whole decompostion 部分-整体分解PCA 主元分析penalty term 惩罚因⼦per-example mean subtraction 逐样本均值消减pooling 池化pretrain 预训练principal components analysis 主成份分析quadratic constraints ⼆次约束RBMs 受限Boltzman机reconstruction based models 基于重构的模型reconstruction cost 重建代价reconstruction term 重构项redundant 冗余reflection matrix 反射矩阵regularization 正则化regularization term 正则化项rescaling 缩放robust 鲁棒性run ⾏程second-order feature ⼆阶特征sigmoid activation function S型激励函数significant digits 有效数字singular value 奇异值singular vector 奇异向量smoothed L1 penalty 平滑的L1范数惩罚Smoothed topographic L1 sparsity penalty 平滑地形L1稀疏惩罚函数smoothing 平滑Softmax Regresson Softmax回归sorted in decreasing order 降序排列source features 源特征sparse autoencoder 消减归⼀化Sparsity 稀疏性sparsity parameter 稀疏性参数sparsity penalty 稀疏惩罚square function 平⽅函数squared-error ⽅差stationary 平稳性(不变性)stationary stochastic process 平稳随机过程step-size 步长值supervised learning 监督学习symmetric positive semi-definite matrix 对称半正定矩阵symmetry breaking 对称失效tanh function 双曲正切函数the average activation 平均活跃度the derivative checking method 梯度验证⽅法the empirical distribution 经验分布函数the energy function 能量函数the Lagrange dual 拉格朗⽇对偶函数the log likelihood 对数似然函数the pixel intensity value 像素灰度值the rate of convergence 收敛速度topographic cost term 拓扑代价项topographic ordered 拓扑秩序transformation 变换translation invariant 平移不变性trivial answer 平凡解under-complete basis 不完备基unrolling 组合扩展unsupervised learning ⽆监督学习variance ⽅差vecotrized implementation 向量化实现vectorization ⽮量化visual cortex 视觉⽪层weight decay 权重衰减weighted average 加权平均值whitening ⽩化zero-mean 均值为零Letter AAccumulated error backpropagation 累积误差逆传播Activation Function 激活函数Adaptive Resonance Theory/ART ⾃适应谐振理论Addictive model 加性学习Adversarial Networks 对抗⽹络Affine Layer 仿射层Affinity matrix 亲和矩阵Agent 代理 / 智能体Algorithm 算法Alpha-beta 
pruning α-β剪枝Anomaly detection 异常检测Approximation 近似Area Under ROC Curve/AUC Roc 曲线下⾯积Artificial General Intelligence/AGI 通⽤⼈⼯智能Artificial Intelligence/AI ⼈⼯智能Association analysis 关联分析Attention mechanism 注意⼒机制Attribute conditional independence assumption 属性条件独⽴性假设Attribute space 属性空间Attribute value 属性值Autoencoder ⾃编码器Automatic speech recognition ⾃动语⾳识别Automatic summarization ⾃动摘要Average gradient 平均梯度Average-Pooling 平均池化Letter BBackpropagation Through Time 通过时间的反向传播Backpropagation/BP 反向传播Base learner 基学习器Base learning algorithm 基学习算法Batch Normalization/BN 批量归⼀化Bayes decision rule 贝叶斯判定准则Bayes Model Averaging/BMA 贝叶斯模型平均Bayes optimal classifier 贝叶斯最优分类器Bayesian decision theory 贝叶斯决策论Bayesian network 贝叶斯⽹络Between-class scatter matrix 类间散度矩阵Bias 偏置 / 偏差Bias-variance decomposition 偏差-⽅差分解Bias-Variance Dilemma 偏差 – ⽅差困境Bi-directional Long-Short Term Memory/Bi-LSTM 双向长短期记忆Binary classification ⼆分类Binomial test ⼆项检验Bi-partition ⼆分法Boltzmann machine 玻尔兹曼机Bootstrap sampling ⾃助采样法/可重复采样/有放回采样Bootstrapping ⾃助法Break-Event Point/BEP 平衡点Letter CCalibration 校准Cascade-Correlation 级联相关Categorical attribute 离散属性Class-conditional probability 类条件概率Classification and regression tree/CART 分类与回归树Classifier 分类器Class-imbalance 类别不平衡Closed -form 闭式Cluster 簇/类/集群Cluster analysis 聚类分析Clustering 聚类Clustering ensemble 聚类集成Co-adapting 共适应Coding matrix 编码矩阵COLT 国际学习理论会议Committee-based learning 基于委员会的学习Competitive learning 竞争型学习Component learner 组件学习器Comprehensibility 可解释性Computation Cost 计算成本Computational Linguistics 计算语⾔学Computer vision 计算机视觉Concept drift 概念漂移Concept Learning System /CLS 概念学习系统Conditional entropy 条件熵Conditional mutual information 条件互信息Conditional Probability Table/CPT 条件概率表Conditional random field/CRF 条件随机场Conditional risk 条件风险Confidence 置信度Confusion matrix 混淆矩阵Connection weight 连接权Connectionism 连结主义Consistency ⼀致性/相合性Contingency table 列联表Continuous attribute 连续属性Convergence 收敛Conversational agent 会话智能体Convex quadratic programming 凸⼆次规划Convexity 凸性Convolutional neural network/CNN 
卷积神经⽹络Co-occurrence 同现Correlation coefficient 相关系数Cosine similarity 余弦相似度Cost curve 成本曲线Cost Function 成本函数Cost matrix 成本矩阵Cost-sensitive 成本敏感Cross entropy 交叉熵Cross validation 交叉验证Crowdsourcing 众包Curse of dimensionality 维数灾难Cut point 截断点Cutting plane algorithm 割平⾯法Letter DData mining 数据挖掘Data set 数据集Decision Boundary 决策边界Decision stump 决策树桩Decision tree 决策树/判定树Deduction 演绎Deep Belief Network 深度信念⽹络Deep Convolutional Generative Adversarial Network/DCGAN 深度卷积⽣成对抗⽹络Deep learning 深度学习Deep neural network/DNN 深度神经⽹络Deep Q-Learning 深度 Q 学习Deep Q-Network 深度 Q ⽹络Density estimation 密度估计Density-based clustering 密度聚类Differentiable neural computer 可微分神经计算机Dimensionality reduction algorithm 降维算法Directed edge 有向边Disagreement measure 不合度量Discriminative model 判别模型Discriminator 判别器Distance measure 距离度量Distance metric learning 距离度量学习Distribution 分布Divergence 散度Diversity measure 多样性度量/差异性度量Domain adaption 领域⾃适应Downsampling 下采样D-separation (Directed separation)有向分离Dual problem 对偶问题Dummy node 哑结点Dynamic Fusion 动态融合Dynamic programming 动态规划Letter EEigenvalue decomposition 特征值分解Embedding 嵌⼊Emotional analysis 情绪分析Empirical conditional entropy 经验条件熵Empirical entropy 经验熵Empirical error 经验误差Empirical risk 经验风险End-to-End 端到端Energy-based model 基于能量的模型Ensemble learning 集成学习Ensemble pruning 集成修剪Error Correcting Output Codes/ECOC 纠错输出码Error rate 错误率Error-ambiguity decomposition 误差-分歧分解Euclidean distance 欧⽒距离Evolutionary computation 演化计算Expectation-Maximization 期望最⼤化Expected loss 期望损失Exploding Gradient Problem 梯度爆炸问题Exponential loss function 指数损失函数Extreme Learning Machine/ELM 超限学习机Letter FFactorization 因⼦分解False negative 假负类False positive 假正类False Positive Rate/FPR 假正例率Feature engineering 特征⼯程Feature selection 特征选择Feature vector 特征向量Featured Learning 特征学习Feedforward Neural Networks/FNN 前馈神经⽹络Fine-tuning 微调Flipping output 翻转法Fluctuation 震荡Forward stagewise algorithm 前向分步算法Frequentist 频率主义学派Full-rank matrix 满秩矩阵Functional neuron 功能神经元Letter GGain ratio 增益率Game theory 博弈论Gaussian kernel function 
⾼斯核函数Gaussian Mixture Model ⾼斯混合模型General Problem Solving 通⽤问题求解Generalization 泛化Generalization error 泛化误差Generalization error bound 泛化误差上界Generalized Lagrange function ⼴义拉格朗⽇函数Generalized linear model ⼴义线性模型Generalized Rayleigh quotient ⼴义瑞利商Generative Adversarial Networks/GAN ⽣成对抗⽹络Generative Model ⽣成模型Generator ⽣成器Genetic Algorithm/GA 遗传算法Gibbs sampling 吉布斯采样Gini index 基尼指数Global minimum 全局最⼩Global Optimization 全局优化Gradient boosting 梯度提升Gradient Descent 梯度下降Graph theory 图论Ground-truth 真相/真实Letter HHard margin 硬间隔Hard voting 硬投票Harmonic mean 调和平均Hesse matrix 海塞矩阵Hidden dynamic model 隐动态模型Hidden layer 隐藏层Hidden Markov Model/HMM 隐马尔可夫模型Hierarchical clustering 层次聚类Hilbert space 希尔伯特空间Hinge loss function 合页损失函数Hold-out 留出法Homogeneous 同质Hybrid computing 混合计算Hyperparameter 超参数Hypothesis 假设Hypothesis test 假设验证Letter IICML 国际机器学习会议Improved iterative scaling/IIS 改进的迭代尺度法Incremental learning 增量学习Independent and identically distributed/i.i.d. 独⽴同分布Independent Component Analysis/ICA 独⽴成分分析Indicator function 指⽰函数Individual learner 个体学习器Induction 归纳Inductive bias 归纳偏好Inductive learning 归纳学习Inductive Logic Programming/ILP 归纳逻辑程序设计Information entropy 信息熵Information gain 信息增益Input layer 输⼊层Insensitive loss 不敏感损失Inter-cluster similarity 簇间相似度International Conference for Machine Learning/ICML 国际机器学习⼤会Intra-cluster similarity 簇内相似度Intrinsic value 固有值Isometric Mapping/Isomap 等度量映射Isotonic regression 等分回归Iterative Dichotomiser 迭代⼆分器Letter KKernel method 核⽅法Kernel trick 核技巧Kernelized Linear Discriminant Analysis/KLDA 核线性判别分析K-fold cross validation k 折交叉验证/k 倍交叉验证K-Means Clustering K – 均值聚类K-Nearest Neighbours Algorithm/KNN K近邻算法Knowledge base 知识库Knowledge Representation 知识表征Letter LLabel space 标记空间Lagrange duality 拉格朗⽇对偶性Lagrange multiplier 拉格朗⽇乘⼦Laplace smoothing 拉普拉斯平滑Laplacian correction 拉普拉斯修正Latent Dirichlet Allocation 隐狄利克雷分布Latent semantic analysis 潜在语义分析Latent variable 隐变量Lazy learning 懒惰学习Learner 学习器Learning by analogy 类⽐学习Learning rate 学习率Learning Vector Quantization/LVQ 
学习向量量化Least squares regression tree 最⼩⼆乘回归树Leave-One-Out/LOO 留⼀法linear chain conditional random field 线性链条件随机场Linear Discriminant Analysis/LDA 线性判别分析Linear model 线性模型Linear Regression 线性回归Link function 联系函数Local Markov property 局部马尔可夫性Local minimum 局部最⼩Log likelihood 对数似然Log odds/logit 对数⼏率Logistic Regression Logistic 回归Log-likelihood 对数似然Log-linear regression 对数线性回归Long-Short Term Memory/LSTM 长短期记忆Loss function 损失函数Letter MMachine translation/MT 机器翻译Macron-P 宏查准率Macron-R 宏查全率Majority voting 绝对多数投票法Manifold assumption 流形假设Manifold learning 流形学习Margin theory 间隔理论Marginal distribution 边际分布Marginal independence 边际独⽴性Marginalization 边际化Markov Chain Monte Carlo/MCMC 马尔可夫链蒙特卡罗⽅法Markov Random Field 马尔可夫随机场Maximal clique 最⼤团Maximum Likelihood Estimation/MLE 极⼤似然估计/极⼤似然法Maximum margin 最⼤间隔Maximum weighted spanning tree 最⼤带权⽣成树Max-Pooling 最⼤池化Mean squared error 均⽅误差Meta-learner 元学习器Metric learning 度量学习Micro-P 微查准率Micro-R 微查全率Minimal Description Length/MDL 最⼩描述长度Minimax game 极⼩极⼤博弈Misclassification cost 误分类成本Mixture of experts 混合专家Momentum 动量Moral graph 道德图/端正图Multi-class classification 多分类Multi-document summarization 多⽂档摘要Multi-layer feedforward neural networks 多层前馈神经⽹络Multilayer Perceptron/MLP 多层感知器Multimodal learning 多模态学习Multiple Dimensional Scaling 多维缩放Multiple linear regression 多元线性回归Multi-response Linear Regression /MLR 多响应线性回归Mutual information 互信息Letter NNaive bayes 朴素贝叶斯Naive Bayes Classifier 朴素贝叶斯分类器Named entity recognition 命名实体识别Nash equilibrium 纳什均衡Natural language generation/NLG ⾃然语⾔⽣成Natural language processing ⾃然语⾔处理Negative class 负类Negative correlation 负相关法Negative Log Likelihood 负对数似然Neighbourhood Component Analysis/NCA 近邻成分分析Neural Machine Translation 神经机器翻译Neural Turing Machine 神经图灵机Newton method ⽜顿法NIPS 国际神经信息处理系统会议No Free Lunch Theorem/NFL 没有免费的午餐定理Noise-contrastive estimation 噪⾳对⽐估计Nominal attribute 列名属性Non-convex optimization ⾮凸优化Nonlinear model ⾮线性模型Non-metric distance ⾮度量距离Non-negative matrix factorization ⾮负矩阵分解Non-ordinal attribute 
⽆序属性Non-Saturating Game ⾮饱和博弈Norm 范数Normalization 归⼀化Nuclear norm 核范数Numerical attribute 数值属性Letter OObjective function ⽬标函数Oblique decision tree 斜决策树Occam’s razor 奥卡姆剃⼑Odds ⼏率Off-Policy 离策略One shot learning ⼀次性学习One-Dependent Estimator/ODE 独依赖估计On-Policy 在策略Ordinal attribute 有序属性Out-of-bag estimate 包外估计Output layer 输出层Output smearing 输出调制法Overfitting 过拟合/过配Oversampling 过采样Letter PPaired t-test 成对 t 检验Pairwise 成对型Pairwise Markov property 成对马尔可夫性Parameter 参数Parameter estimation 参数估计Parameter tuning 调参Parse tree 解析树Particle Swarm Optimization/PSO 粒⼦群优化算法Part-of-speech tagging 词性标注Perceptron 感知机Performance measure 性能度量Plug and Play Generative Network 即插即⽤⽣成⽹络Plurality voting 相对多数投票法Polarity detection 极性检测Polynomial kernel function 多项式核函数Pooling 池化Positive class 正类Positive definite matrix 正定矩阵Post-hoc test 后续检验Post-pruning 后剪枝potential function 势函数Precision 查准率/准确率Prepruning 预剪枝Principal component analysis/PCA 主成分分析Principle of multiple explanations 多释原则Prior 先验Probability Graphical Model 概率图模型Proximal Gradient Descent/PGD 近端梯度下降Pruning 剪枝Pseudo-label 伪标记Letter QQuantized Neural Network 量⼦化神经⽹络Quantum computer 量⼦计算机Quantum Computing 量⼦计算Quasi Newton method 拟⽜顿法Letter RRadial Basis Function/RBF 径向基函数Random Forest Algorithm 随机森林算法Random walk 随机漫步Recall 查全率/召回率Receiver Operating Characteristic/ROC 受试者⼯作特征Rectified Linear Unit/ReLU 线性修正单元Recurrent Neural Network 循环神经⽹络Recursive neural network 递归神经⽹络Reference model 参考模型Regression 回归Regularization 正则化Reinforcement learning/RL 强化学习Representation learning 表征学习Representer theorem 表⽰定理reproducing kernel Hilbert space/RKHS 再⽣核希尔伯特空间Re-sampling 重采样法Rescaling 再缩放Residual Mapping 残差映射Residual Network 残差⽹络Restricted Boltzmann Machine/RBM 受限玻尔兹曼机Restricted Isometry Property/RIP 限定等距性Re-weighting 重赋权法Robustness 稳健性/鲁棒性Root node 根结点Rule Engine 规则引擎Rule learning 规则学习Letter SSaddle point 鞍点Sample space 样本空间Sampling 采样Score function 评分函数Self-Driving ⾃动驾驶Self-Organizing Map/SOM ⾃组织映射Semi-naive Bayes classifiers 半朴素贝叶斯分类器Semi-Supervised 
Learning 半监督学习semi-Supervised Support Vector Machine 半监督⽀持向量机Sentiment analysis 情感分析Separating hyperplane 分离超平⾯Sigmoid function Sigmoid 函数Similarity measure 相似度度量Simulated annealing 模拟退⽕Simultaneous localization and mapping 同步定位与地图构建Singular Value Decomposition 奇异值分解Slack variables 松弛变量Smoothing 平滑Soft margin 软间隔Soft margin maximization 软间隔最⼤化Soft voting 软投票Sparse representation 稀疏表征Sparsity 稀疏性Specialization 特化Spectral Clustering 谱聚类Speech Recognition 语⾳识别Splitting variable 切分变量Squashing function 挤压函数Stability-plasticity dilemma 可塑性-稳定性困境Statistical learning 统计学习Status feature function 状态特征函Stochastic gradient descent 随机梯度下降Stratified sampling 分层采样Structural risk 结构风险Structural risk minimization/SRM 结构风险最⼩化Subspace ⼦空间Supervised learning 监督学习/有导师学习support vector expansion ⽀持向量展式Support Vector Machine/SVM ⽀持向量机Surrogat loss 替代损失Surrogate function 替代函数Symbolic learning 符号学习Symbolism 符号主义Synset 同义词集Letter TT-Distribution Stochastic Neighbour Embedding/t-SNE T – 分布随机近邻嵌⼊Tensor 张量Tensor Processing Units/TPU 张量处理单元The least square method 最⼩⼆乘法Threshold 阈值Threshold logic unit 阈值逻辑单元Threshold-moving 阈值移动Time Step 时间步骤Tokenization 标记化Training error 训练误差Training instance 训练⽰例/训练例Transductive learning 直推学习Transfer learning 迁移学习Treebank 树库Tria-by-error 试错法True negative 真负类True positive 真正类True Positive Rate/TPR 真正例率Turing Machine 图灵机Twice-learning ⼆次学习Letter UUnderfitting ⽋拟合/⽋配Undersampling ⽋采样Understandability 可理解性Unequal cost ⾮均等代价Unit-step function 单位阶跃函数Univariate decision tree 单变量决策树Unsupervised learning ⽆监督学习/⽆导师学习Unsupervised layer-wise training ⽆监督逐层训练Upsampling 上采样Letter VVanishing Gradient Problem 梯度消失问题Variational inference 变分推断VC Theory VC维理论Version space 版本空间Viterbi algorithm 维特⽐算法Von Neumann architecture 冯 · 诺伊曼架构Letter WWasserstein GAN/WGAN Wasserstein⽣成对抗⽹络Weak learner 弱学习器Weight 权重Weight sharing 权共享Weighted voting 加权投票法Within-class scatter matrix 类内散度矩阵Word embedding 词嵌⼊Word sense disambiguation 词义消歧Letter ZZero-data learning 零数据学习Zero-shot learning 
零次学习Aapproximations近似值arbitrary随意的affine仿射的arbitrary任意的amino acid氨基酸amenable经得起检验的axiom公理,原则abstract提取architecture架构,体系结构;建造业absolute绝对的arsenal军⽕库assignment分配algebra线性代数asymptotically⽆症状的appropriate恰当的Bbias偏差brevity简短,简洁;短暂broader⼴泛briefly简短的batch批量Cconvergence 收敛,集中到⼀点convex凸的contours轮廓constraint约束constant常理commercial商务的complementarity补充coordinate ascent同等级上升clipping剪下物;剪报;修剪component分量;部件continuous连续的covariance协⽅差canonical正规的,正则的concave⾮凸的corresponds相符合;相当;通信corollary推论concrete具体的事物,实在的东西cross validation交叉验证correlation相互关系convention约定cluster⼀簇centroids 质⼼,形⼼converge收敛computationally计算(机)的calculus计算Dderive获得,取得dual⼆元的duality⼆元性;⼆象性;对偶性derivation求导;得到;起源denote预⽰,表⽰,是…的标志;意味着,[逻]指称divergence 散度;发散性dimension尺度,规格;维数dot⼩圆点distortion变形density概率密度函数discrete离散的discriminative有识别能⼒的diagonal对⾓dispersion分散,散开determinant决定因素disjoint不相交的Eencounter遇到ellipses椭圆equality等式extra额外的empirical经验;观察ennmerate例举,计数exceed超过,越出expectation期望efficient⽣效的endow赋予explicitly清楚的exponential family指数家族equivalently等价的Ffeasible可⾏的forary初次尝试finite有限的,限定的forgo摒弃,放弃fliter过滤frequentist最常发⽣的forward search前向式搜索formalize使定形Ggeneralized归纳的generalization概括,归纳;普遍化;判断(根据不⾜)guarantee保证;抵押品generate形成,产⽣geometric margins⼏何边界gap裂⼝generative⽣产的;有⽣产⼒的Hheuristic启发式的;启发法;启发程序hone怀恋;磨hyperplane超平⾯Linitial最初的implement执⾏intuitive凭直觉获知的incremental增加的intercept截距intuitious直觉instantiation例⼦indicator指⽰物,指⽰器interative重复的,迭代的integral积分identical相等的;完全相同的indicate表⽰,指出invariance不变性,恒定性impose把…强加于intermediate中间的interpretation解释,翻译Jjoint distribution联合概率Llieu替代logarithmic对数的,⽤对数表⽰的latent潜在的Leave-one-out cross validation留⼀法交叉验证Mmagnitude巨⼤mapping绘图,制图;映射matrix矩阵mutual相互的,共同的monotonically单调的minor较⼩的,次要的multinomial多项的multi-class classification⼆分类问题Nnasty讨厌的notation标志,注释naïve朴素的Oobtain得到oscillate摆动optimization problem最优化问题objective function⽬标函数optimal最理想的orthogonal(⽮量,矩阵等)正交的orientation⽅向ordinary普通的occasionally偶然的Ppartial 
derivative偏导数property性质proportional成⽐例的primal原始的,最初的permit允许pseudocode伪代码permissible可允许的polynomial多项式preliminary预备precision精度perturbation 不安,扰乱poist假定,设想positive semi-definite半正定的parentheses圆括号posterior probability后验概率plementarity补充pictorially图像的parameterize确定…的参数poisson distribution柏松分布pertinent相关的Qquadratic⼆次的quantity量,数量;分量query疑问的Rregularization使系统化;调整reoptimize重新优化restrict限制;限定;约束reminiscent回忆往事的;提醒的;使⼈联想…的(of)remark注意random variable随机变量respect考虑respectively各⾃的;分别的redundant过多的;冗余的Ssusceptible敏感的stochastic可能的;随机的symmetric对称的sophisticated复杂的spurious假的;伪造的subtract减去;减法器simultaneously同时发⽣地;同步地suffice满⾜scarce稀有的,难得的split分解,分离subset⼦集statistic统计量successive iteratious连续的迭代scale标度sort of有⼏分的squares平⽅Ttrajectory轨迹temporarily暂时的terminology专⽤名词tolerance容忍;公差thumb翻阅threshold阈,临界theorem定理tangent正弦Uunit-length vector单位向量Vvalid有效的,正确的variance⽅差variable变量;变元vocabulary词汇valued经估价的;宝贵的Wwrapper包装分类:。
Multicamera People Tracking with a Probabilistic Occupancy Map
François Fleuret, Jérôme Berclaz, Richard Lengagne, and Pascal Fua, Senior Member, IEEE

Abstract: Given two to four synchronized video streams taken at eye level and from different angles, we show that we can effectively combine a generative model with dynamic programming to accurately follow up to six individuals across thousands of frames in spite of significant occlusions and lighting changes. In addition, we also derive metrically accurate trajectories for each of them. Our contribution is twofold. First, we demonstrate that our generative model can effectively handle occlusions in each time frame independently, even when the only data available comes from the output of a simple background subtraction algorithm and when the number of individuals is unknown a priori. Second, we show that multiperson tracking can be reliably achieved by processing individual trajectories separately over long sequences, provided that a reasonable heuristic is used to rank these individuals and that we avoid confusing them with one another.

Index Terms: Multipeople tracking, multicamera, visual surveillance, probabilistic occupancy map, dynamic programming, Hidden Markov Model.

1 INTRODUCTION

In this paper, we address the problem of keeping track of people who occlude each other using a small number of synchronized videos such as those depicted in Fig. 1, which were taken at head level and from very different angles.
This is important because this kind of setup is very common for applications such as video surveillance in public places.

To this end, we have developed a mathematical framework that allows us to combine a robust approach to estimating the probabilities of occupancy of the ground plane at individual time steps with dynamic programming to track people over time. This results in a fully automated system that can track up to six people in a room for several minutes by using only four cameras, without producing any false positives or false negatives in spite of severe occlusions and lighting variations. As shown in Fig. 2, our system also provides location estimates that are accurate to within a few tens of centimeters, and there is no measurable performance decrease if as many as 20 percent of the images are lost and only a small one if 30 percent are. This involves two algorithmic steps:

1. We estimate the probabilities of occupancy of the ground plane, given the binary images obtained from the input images via background subtraction [7]. At this stage, the algorithm only takes into account images acquired at the same time. Its basic ingredient is a generative model that represents humans as simple rectangles that it uses to create synthetic ideal images that we would observe if people were at given locations. Under this model of the images, given the true occupancy, we approximate the probabilities of occupancy at every location as the marginals of a product law minimizing the Kullback-Leibler divergence from the "true" conditional posterior distribution. This allows us to evaluate the probabilities of occupancy at every location as the fixed point of a large system of equations.
2. We then combine these probabilities with a color and a motion model and use the Viterbi algorithm to accurately follow individuals across thousands of frames [3]. To avoid the combinatorial explosion that would result from explicitly dealing with the joint posterior distribution of the locations of individuals in each frame over a fine discretization, we use a greedy approach: we process trajectories individually over sequences that are long enough so that using a reasonable heuristic to choose the order in which they are processed is sufficient to avoid confusing people with each other.

In contrast to most state-of-the-art algorithms that recursively update estimates from frame to frame and may therefore fail catastrophically if difficult conditions persist over several consecutive frames, our algorithm can handle such situations since it computes the global optima of scores summed over many frames. This is what gives it the robustness that Fig. 2 demonstrates.

In short, we combine a mathematically well-founded generative model that works in each frame individually with a simple approach to global optimization. This yields excellent performance by using basic color and motion models that could be further improved. Our contribution is therefore twofold. First, we demonstrate that a generative model can effectively handle occlusions at each time frame independently, even when the input data is of very poor quality, and is therefore easy to obtain. Second, we show that multiperson tracking can be reliably achieved by processing individual trajectories separately over long sequences.
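The second step can be illustrated with a minimal Viterbi sketch for a single trajectory. This is not the authors' implementation: the three-cell "ground plane", the occupancy values, the transition model (a person may stay or move to an adjacent cell per frame), and all names are assumptions made for this toy example, and the color model is omitted. The point it shows is the one made above: a score summed over many frames and optimized globally can override a misleading frame that a frame-by-frame estimate would follow.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi(log_prior, log_transition, log_emission):
    """Most probable state sequence.

    log_prior[k]: log-probability of starting in state k
    log_transition[j][k]: log-probability of moving from j to k
    log_emission[t][k]: log-score of state k at frame t
    """
    n_states = len(log_prior)
    score = [log_prior[k] + log_emission[0][k] for k in range(n_states)]
    backpointers = []
    for frame in log_emission[1:]:
        new_score, pointers = [], []
        for k in range(n_states):
            # Best predecessor of state k at this frame.
            best_j = max(range(n_states),
                         key=lambda j: score[j] + log_transition[j][k])
            pointers.append(best_j)
            new_score.append(score[best_j] + log_transition[best_j][k] + frame[k])
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical occupancy probabilities for 3 locations over 4 frames.
# Frame 2 is deliberately misleading: it favors location 0.
occupancy = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.1, 0.3],
    [0.1, 0.1, 0.8],
]
# Assumed motion model: stay or move to an adjacent location only.
transition = [
    [0.5, 0.5, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
    [0.0, 0.5, 0.5],
]
prior = [1 / 3, 1 / 3, 1 / 3]

log_emission = [[safe_log(p) for p in frame] for frame in occupancy]
log_transition = [[safe_log(p) for p in row] for row in transition]
log_prior = [safe_log(p) for p in prior]

per_frame = [max(range(3), key=lambda k: frame[k]) for frame in occupancy]
path = viterbi(log_prior, log_transition, log_emission)
print(per_frame)  # [0, 1, 0, 2]
print(path)       # [0, 1, 2, 2]
```

The per-frame estimate jumps from location 0 to location 2 between the last two frames, which the assumed motion model forbids; the globally optimal path instead passes through location 2 at frame 2.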
F.Fleuret,J.Berclaz,and P.Fua are with the Ecole Polytechnique Fe´de´ralede Lausanne,Station14,CH-1015Lausanne,Switzerland.E-mail:{francois.fleuret,jerome.berclaz,pascal.fua}@epfl.ch..R.Lengagne is with GE Security-VisioWave,Route de la Pierre22,1024Ecublens,Switzerland.E-mail:richard.lengagne@.Manuscript received14July2006;revised19Jan.2007;accepted28Mar.2007;published online15May2007.Recommended for acceptance by S.Sclaroff.For information on obtaining reprints of this article,please send e-mail to:tpami@,and reference IEEECS Log Number TPAMI-0521-0706.Digital Object Identifier no.10.1109/TPAMI.2007.1174.0162-8828/08/$25.00ß2008IEEE Published by the IEEE Computer SocietyIn the remainder of the paper,we first briefly review related works.We then formulate our problem as estimat-ing the most probable state of a hidden Markov process and propose a model of the visible signal based on an estimate of an occupancy map in every time frame.Finally,we present our results on several long sequences.2R ELATED W ORKState-of-the-art methods can be divided into monocular and multiview approaches that we briefly review in this section.2.1Monocular ApproachesMonocular approaches rely on the input of a single camera to perform tracking.These methods provide a simple and easy-to-deploy setup but must compensate for the lack of 3D information in a single camera view.2.1.1Blob-Based MethodsMany algorithms rely on binary blobs extracted from single video[10],[5],[11].They combine shape analysis and tracking to locate people and maintain appearance models in order to track them,even in the presence of occlusions.The Bayesian Multiple-BLob tracker(BraMBLe)system[12],for example,is a multiblob tracker that generates a blob-likelihood based on a known background model and appearance models of the tracked people.It then uses a particle filter to implement the tracking for an unknown number of people.Approaches that track in a single view prior to computing correspondences across views 
extend this approach to multicamera setups. However, we view them as falling into the same category because they do not simultaneously exploit the information from multiple views. In [15], the limits of the field of view of each camera are computed in every other camera from motion information. When a person becomes visible in one camera, the system automatically searches for him in other views where he should be visible. In [4], a background/foreground segmentation is performed on calibrated images, followed by human shape extraction from foreground objects and feature point selection and extraction. Feature points are tracked in a single view, and the system switches to another view when the current camera no longer has a good view of the person.

2.1.2 Color-Based Methods

Tracking performance can be significantly increased by taking color into account. As shown in [6], the mean-shift pursuit technique based on a dissimilarity measure of color distributions can accurately track deformable objects in real time and in a monocular context. In [16], the images are segmented pixelwise into different classes, thus modeling people by continuously updated Gaussian mixtures. A standard tracking process is then performed using a Bayesian framework, which helps keep track of people, even when there are occlusions. In such a case, models of persons in front keep being updated, whereas the system stops updating occluded ones, which may cause trouble if their appearances have changed noticeably when they re-emerge.

More recently, multiple humans have been simultaneously detected and tracked in crowded scenes [20] by using Monte-Carlo-based methods to estimate their number and positions. In [23], multiple people are also detected and tracked in front of complex backgrounds by using mixture particle filters guided by people models learned by boosting. In [9], multicue 3D object tracking is addressed by combining particle-filter-based Bayesian tracking and detection using learned spatiotemporal shapes. This approach
leads to impressive results but requires shape, texture, and image depth information as input. Finally, Smith et al. [25] propose a particle-filtering scheme that relies on Markov chain Monte Carlo (MCMC) optimization to handle entrances and departures. It also introduces a finer modeling of interactions between individuals as a product of pairwise potentials.

2.2 Multiview Approaches

Despite the effectiveness of such methods, the use of multiple cameras soon becomes necessary when one wishes to accurately detect and track multiple people and compute their precise 3D locations in a complex environment. Occlusion handling is facilitated by using two sets of stereo color cameras [14]. However, in most approaches that only take a set of 2D views as input, occlusion is mainly handled by imposing temporal consistency in terms of a motion model, be it Kalman filtering or more general Markov models. As a result, these approaches may not always be able to recover if the process starts diverging.

Fig. 1. Images from two indoor and two outdoor multicamera video sequences that we use for our experiments. At each time step, we draw a box around people that we detect and assign to them an ID number that follows them throughout the sequence.

Fig. 2. Cumulative distributions of the position estimate error on a 3,800-frame sequence (see Section 6.4.1 for details).

2.2.1 Blob-Based Methods

In [19], Kalman filtering is applied on 3D points obtained by fusing in a least squares sense the image-to-world projections of points belonging to binary blobs. Similarly, in [1], a Kalman filter is used to simultaneously track in 2D and 3D, and object locations are estimated through trajectory prediction during occlusion.

In [8], a best-hypothesis approach and a multiple-hypotheses approach are compared to find people tracks from 3D locations obtained from foreground binary blobs extracted from multiple calibrated views.

In [21], a recursive Bayesian estimation approach is used to deal with occlusions while tracking multiple people in multiview. The
algorithm tracks objects located in the intersections of 2D visual angles, which are extracted from silhouettes obtained from different fixed views. When occlusion ambiguities occur, multiple occlusion hypotheses are generated, given predicted object states and previous hypotheses, and tested using a branch-and-merge strategy. The proposed framework is implemented using a customized particle filter to represent the distribution of object states.

Recently, Morariu and Camps [17] proposed a method based on dimensionality reduction to learn a correspondence between the appearance of pedestrians across several views. This approach is able to cope with severe occlusion in one view by exploiting the appearance of the same pedestrian in another view and the consistency across views.

2.2.2 Color-Based Methods

Mittal and Davis [18] propose a system that segments, detects, and tracks multiple people in a scene by using a wide-baseline setup of up to 16 synchronized cameras. Intensity information is directly used to perform single-view pixel classification and match similarly labeled regions across views to derive 3D people locations. Occlusion analysis is performed in two ways: First, during pixel classification, the computation of prior probabilities takes occlusion into account. Second, evidence is gathered across cameras to compute a presence likelihood map on the ground plane that accounts for the visibility of each ground plane point in each view.
Ground plane locations are then tracked over time by using a Kalman filter.

In [13], individuals are tracked both in image planes and top view. The 2D and 3D positions of each individual are computed so as to maximize a joint probability defined as the product of a color-based appearance model and 2D and 3D motion models derived from a Kalman filter.

2.2.3 Occupancy Map Methods

Recent techniques explicitly use a discretized occupancy map into which the objects detected in the camera images are back-projected. In [2], the authors rely on a standard detection of stereo disparities, which increase counters associated to square areas on the ground. A mixture of Gaussians is fitted to the resulting score map to estimate the likely location of individuals. This estimate is combined with a Kalman filter to model the motion.

In [26], the occupancy map is computed with a standard visual hull procedure. One originality of the approach is to keep, for each resulting connected component, an upper and lower bound on the number of objects that it can contain. Based on motion consistency, the bounds on the various components are estimated at a certain time frame based on the bounds of the components at the previous time frame that spatially intersect with them.

Although our own method shares many features with these techniques, it differs in two important respects that we will highlight: First, we combine the usual color and motion models with a sophisticated approach based on a generative model to estimating the probabilities of occupancy, which explicitly handles complex occlusion interactions between detected individuals, as will be discussed in Section 5. Second, we rely on dynamic programming to ensure greater stability in challenging situations by simultaneously handling multiple frames.

3 PROBLEM FORMULATION

Our goal is to track an a priori unknown number of people from a few synchronized video streams taken at head level.
In this section, we formulate this problem as one of finding the most probable state of a hidden Markov process, given the set of images acquired at each time step, which we will refer to as a temporal frame. We then briefly outline the computation of the relevant probabilities by using the notations summarized in Tables 1 and 2, which we also use in the following two sections to discuss in more detail the actual computation of those probabilities.

3.1 Computing the Optimal Trajectories

We process the video sequences by batches of $T = 100$ frames, each of which includes $C$ images, and we compute the most likely trajectory for each individual. To achieve consistency over successive batches, we only keep the result on the first 10 frames and slide our temporal window. This is illustrated in Fig. 3.

We discretize the visible part of the ground plane into a finite number $G$ of regularly spaced 2D locations and we introduce a virtual hidden location $H$ that will be used to model entrances and departures from and into the visible area. For a given batch, let $L_t = (L_t^1, \dots, L_t^{N^*})$ be the hidden stochastic processes standing for the locations of individuals, whether visible or not. The number $N^*$ stands for the maximum allowable number of individuals in our world. It is large enough so that conditioning on the number of visible ones does not change the probability of a new individual entering the scene. The $L_t^n$ variables therefore take values in $\{1, \dots, G, H\}$.

Given $I_t = (I_t^1, \dots, I_t^C)$, the images acquired at time $t$ for $1 \le t \le T$, our task is to find the values of $L_1, \dots, L_T$ that maximize

$$P(L_1, \dots, L_T \mid I_1, \dots, I_T). \tag{1}$$

As will be discussed in Section 4.1, we compute this maximum a posteriori in a greedy way, processing one individual at a time, including the hidden ones who can move into the visible scene or not. For each one, the algorithm performs the computation, under the constraint that no individual can be at a visible location occupied by an individual already processed.

In theory, this approach could lead to
undesirable local minima, for example, by connecting the trajectories of two separate people. However, this does not happen often because our batches are sufficiently long. To further reduce the chances of this, we process individual trajectories in an order that depends on a reliability score so that the most reliable ones are computed first, thereby reducing the potential for confusion when processing the remaining ones. This order also ensures that if an individual remains in the hidden location, then all the other people present in the hidden location will also stay there and, therefore, do not need to be processed.

FLEURET ET AL.: MULTICAMERA PEOPLE TRACKING WITH A PROBABILISTIC OCCUPANCY MAP

Our experimental results show that our method does not suffer from the usual weaknesses of greedy algorithms such as a tendency to get caught in bad local minima. We therefore believe that it compares very favorably to stochastic optimization techniques in general and more specifically particle filtering, which usually requires careful tuning of metaparameters.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 30, NO. 2, FEBRUARY 2008

TABLE 1. Notations (Deterministic Quantities)

TABLE 2. Notations (Random Quantities)

Fig. 3. Video sequences are processed by batch of 100 frames. Only the first 10 percent of the optimization result is kept and the rest is discarded. The temporal window is then slid forward and the optimization is repeated on the new window.

3.2 Stochastic Modeling

We will show in Section 4.2 that since we process individual trajectories, the whole approach only requires us to define a valid motion model $P(L_{t+1}^n \mid L_t^n = k)$ and a sound appearance model $P(I_t \mid L_t^n = k)$.

The motion model $P(L_{t+1}^n \mid L_t^n = k)$, which will be introduced in Section 4.3, is a distribution into a disc of limited radius and center $k$, which corresponds to a loose bound on the maximum speed of a walking human. Entrance into the scene and departure from it are naturally modeled, thanks to the hidden location
$H$, for which we extend the motion model. The probabilities to enter and to leave are similar to the transition probabilities between different ground plane locations.

In Section 4.4, we will show that the appearance model $P(I_t \mid L_t^n = k)$ can be decomposed into two terms. The first, described in Section 4.5, is a very generic color-histogram-based model for each individual. The second, described in Section 5, approximates the marginal conditional probabilities of occupancy of the ground plane, given the results of a background subtraction algorithm, in all views acquired at the same time. This approximation is obtained by minimizing the Kullback-Leibler divergence between a product law and the true posterior. We show that this is equivalent to computing the marginal probabilities of occupancy so that under the product law, the images obtained by putting rectangles of human sizes at occupied locations are likely to be similar to the images actually produced by the background subtraction.

This represents a departure from more classical approaches to estimating probabilities of occupancy that rely on computing a visual hull [26]. Such approaches tend to be pessimistic and do not exploit trade-offs between the presence of people at different locations. For instance, if due to noise in one camera, a person is not seen in a particular view, then he would be discarded, even if he were seen in all others. By contrast, in our probabilistic framework, sufficient evidence might be present to detect him. Similarly, the presence of someone at a specific location creates an occlusion that hides the presence behind, which is not accounted for by the hull techniques but is by our approach.

Since these marginal probabilities are computed independently at each time step, they say nothing about identity or correspondence with past frames. The appearance similarity is entirely conveyed by the color histograms, which has experimentally proved sufficient for our purposes.

4 COMPUTATION OF THE TRAJECTORIES

In Section 4.1, we break
the global optimization of several people's trajectories into the estimation of optimal individual trajectories. In Section 4.2, we show how this can be performed using the classical Viterbi algorithm based on dynamic programming. This requires a motion model given in Section 4.3 and an appearance model described in Section 4.4, which combines a color model given in Section 4.5 and a sophisticated estimation of the ground plane occupancy detailed in Section 5.

We partition the visible area into a regular grid of $G$ locations, as shown in Figs. 5c and 6, and from the camera calibration, we define for each camera $c$ a family of rectangular shapes $A_1^c, \dots, A_G^c$, which correspond to crude human silhouettes of height 175 cm and width 50 cm located at every position on the grid.

4.1 Multiple Trajectories

Recall that we denote by $L^n = (L_1^n, \dots, L_T^n)$ the trajectory of individual $n$. Given a batch of $T$ temporal frames $I = (I_1, \dots, I_T)$, we want to maximize the posterior conditional probability:

$$P(L^1 = l^1, \dots, L^{N^*} = l^{N^*} \mid I) = P(L^1 = l^1 \mid I) \prod_{n=2}^{N^*} P\left(L^n = l^n \mid I, L^1 = l^1, \dots, L^{n-1} = l^{n-1}\right). \tag{2}$$

Simultaneous optimization of all the $L^i$s would be intractable. Instead, we optimize one trajectory after the other, which amounts to looking for

$$\hat{l}^1 = \arg\max_{l} P(L^1 = l \mid I), \tag{3}$$

$$\hat{l}^2 = \arg\max_{l} P(L^2 = l \mid I, L^1 = \hat{l}^1), \tag{4}$$

$$\vdots$$

$$\hat{l}^{N^*} = \arg\max_{l} P(L^{N^*} = l \mid I, L^1 = \hat{l}^1, L^2 = \hat{l}^2, \dots). \tag{5}$$

Note that under our model, conditioning one trajectory, given other ones, simply means that it will go through no already occupied location. In other words,

$$P(L^n = l \mid I, L^1 = \hat{l}^1, \dots, L^{n-1} = \hat{l}^{n-1}) = P(L^n = l \mid I, \forall k < n, \forall t, L_t^n \neq \hat{l}_t^k), \tag{6}$$

which is $P(L^n = l \mid I)$ with a reduced set of the admissible grid locations.

Such a procedure is recursively correct: If all trajectories estimated up to step $n$ are correct, then the conditioning only improves the estimate of the optimal remaining trajectories. This would suffice if the image data were informative enough so that locations could be unambiguously associated to individuals. In practice, this is obviously rarely the case. Therefore, this greedy approach to
optimization has undesired side effects. For example, due to partly missing localization information for a given trajectory, the algorithm might mistakenly start following another person's trajectory. This is especially likely to happen if the tracked individuals are located close to each other.

To avoid this kind of failure, we process the images by batches of $T = 100$ and first extend the trajectories that have been found with high confidence, as defined below, in the previous batches. We then process the lower confidence ones. As a result, a trajectory that was problematic in the past and is likely to be problematic in the current batch will be optimized last and, thus, prevented from "stealing" somebody else's location. Furthermore, this approach increases the spatial constraints on such a trajectory when we finally get around to estimating it.

We use as a confidence score the concordance of the estimated trajectories in the previous batches and the localization cue provided by the estimation of the probabilistic occupancy map (POM) described in Section 5. More precisely, the score is the number of time frames where the estimated trajectory passes through a local maximum of the estimated probability of occupancy. When the POM does not detect a person on a few frames, the score will naturally decrease, indicating a deterioration of the localization information. Since there is a high degree of overlap between successive batches, the challenging segment of a trajectory, which is due to the failure of the background subtraction or a change in illumination, for instance, is met in several batches before it actually happens during the 10 kept frames. Thus, the heuristic would have ranked the corresponding individual among the last ones to be processed when such a problem occurs.

4.2 Single Trajectory

Let us now consider only the trajectory $L^n = (L_1^n, \dots, L_T^n)$ of individual $n$ over $T$ temporal frames. We are looking for the values $(l_1^n, \dots, l_T^n)$ in the
subset of free locations of $\{1, \dots, G, H\}$. The initial location $l_1^n$ is either a known visible location if the individual is visible in the first frame of the batch or $H$ if he is not. We therefore seek to maximize

$$P(L_1^n = l_1^n, \dots, L_T^n = l_T^n \mid I_1, \dots, I_T) = \frac{P(I_1, L_1^n = l_1^n, \dots, I_T, L_T^n = l_T^n)}{P(I_1, \dots, I_T)}. \tag{7}$$

Since the denominator is constant with respect to $l^n$, we simply maximize the numerator, that is, the probability of both the trajectories and the images. Let us introduce the maximum of the probability of both the observations and the trajectory ending up at location $k$ at time $t$:

$$\Phi_t(k) = \max_{l_1^n, \dots, l_{t-1}^n} P(I_1, L_1^n = l_1^n, \dots, I_t, L_t^n = k). \tag{8}$$

We model jointly the processes $L_t^n$ and $I_t$ with a hidden Markov model, that is

$$P(L_{t+1}^n \mid L_t^n, L_{t-1}^n, \dots) = P(L_{t+1}^n \mid L_t^n) \tag{9}$$

and

$$P(I_t, I_{t-1}, \dots \mid L_t^n, L_{t-1}^n, \dots) = \prod_t P(I_t \mid L_t^n). \tag{10}$$

Under such a model, we have the classical recursive expression

$$\Phi_t(k) = \underbrace{P(I_t \mid L_t^n = k)}_{\text{Appearance model}} \; \max_{\tau} \underbrace{P(L_t^n = k \mid L_{t-1}^n = \tau)}_{\text{Motion model}} \; \Phi_{t-1}(\tau) \tag{11}$$

to perform a global search with dynamic programming, which yields the classic Viterbi algorithm. This is straightforward, since the $L_t^n$s are in a finite set of cardinality $G + 1$.

4.3 Motion Model

We chose a very simple and unconstrained motion model:

$$P(L_t^n = k \mid L_{t-1}^n = \tau) = \begin{cases} \frac{1}{Z} \, e^{-\rho \|k - \tau\|} & \text{if } \|k - \tau\| \le c, \\ 0 & \text{otherwise,} \end{cases} \tag{12}$$

where the constant $\rho$ tunes the average human walking speed, and $c$ limits the maximum allowable speed. This probability is isotropic, decreases with the distance from location $k$, and is zero for $\|k - \tau\|$ greater than a constant maximum distance. We use a very loose maximum distance $c$ of one square of the grid per frame, which corresponds to a speed of almost 12 mph.

We also define explicitly the probabilities of transitions to the parts of the scene that are connected to the hidden location $H$. This
is a single door in the indoor sequences and all the contours of the visible area in the outdoor sequences in Fig. 1. Thus, entrance and departure of individuals are taken care of naturally by the estimation of the maximum a posteriori trajectories. If there is enough evidence from the images that somebody enters or leaves the room, then this procedure will estimate that the optimal trajectory does so, and a person will be added to or removed from the visible area.

4.4 Appearance Model

From the input images $I_t$, we use background subtraction to produce binary masks $B_t$ such as those in Fig. 4. We denote as $T_t$ the colors of the pixels inside the blobs and treat the rest of the images as background, which is ignored.

Let $X_k^t$ be a Boolean random variable standing for the presence of an individual at location $k$ of the grid at time $t$. In Appendix B, we show that

$$\overbrace{P(I_t \mid L_t^n = k)}^{\text{Appearance model}} \propto \underbrace{P(L_t^n = k \mid X_k^t = 1, T_t)}_{\text{Color model}} \; \underbrace{P(X_k^t = 1 \mid B_t)}_{\text{Ground plane occupancy}}. \tag{13}$$

The ground plane occupancy term will be discussed in Section 5, and the color model term is computed as follows.

4.5 Color Model

We assume that if someone is present at a certain location $k$, then his presence influences the color of the pixels located at the intersection of the moving blobs and the rectangle $A_k^c$ corresponding to the location $k$. We model that dependency as if the pixels were independent and identically distributed and followed a density in the red, green, and blue (RGB) space associated to the individual. This is far simpler than the color models used in either [18] or [13], which split the body area in several subparts with dedicated color distributions, but has proved sufficient in practice.

If an individual $n$ was present in the frames preceding the current
batch, then we have an estimation for any camera $c$ of his color distribution, since we have previously collected the pixels in all frames at the locations

Fig. 4. The color model relies on a stochastic modeling of the color of the pixels $T_t^c(k)$ sampled in the intersection of the binary image $B_t^c$ produced by the background subtraction and the rectangle $A_k^c$ corresponding to the location $k$.
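The single-trajectory search of Section 4.2 can be illustrated with a short sketch. The code below is a minimal Viterbi recursion in the spirit of Eq. (11), with a motion model shaped like Eq. (12). It is not the authors' implementation: it assumes a one-dimensional row of grid locations instead of the 2D ground plane, normalizes the motion model separately for each source location, and omits the hidden location H, the occupancy and color terms, and the sliding batch window. The names `motion_model` and `viterbi`, the parameter values `rho` and `c`, and the appearance scores are hypothetical and chosen only for illustration.

```python
import math


def motion_model(k, tau, rho=1.0, c=1):
    """Unnormalized transition weight from location tau to location k.

    Shaped like Eq. (12): exp(-rho * distance) within distance c, zero beyond.
    Locations are indices on a 1D row (an assumption made for brevity).
    """
    d = abs(k - tau)
    return math.exp(-rho * d) if d <= c else 0.0


def viterbi(appearance, start, rho=1.0, c=1):
    """Return the most probable location sequence for one individual.

    appearance[t][k] plays the role of P(I_t | L_t = k); `start` is the known
    location in the first frame. Scores are kept in the log domain.
    """
    T, G = len(appearance), len(appearance[0])
    neg_inf = float("-inf")

    def log(x):
        return math.log(x) if x > 0 else neg_inf

    # Z[tau] normalizes the transitions leaving tau so that they sum to one.
    Z = [sum(motion_model(k, tau, rho, c) for k in range(G)) for tau in range(G)]

    # phi[k] stores log Phi_t(k); the trajectory is pinned to `start` at t = 0.
    phi = [log(appearance[0][k]) if k == start else neg_inf for k in range(G)]
    back = []  # back[t - 1][k] is the best predecessor of k at time t

    for t in range(1, T):
        new_phi, ptr = [], []
        for k in range(G):
            best_tau, best = 0, neg_inf
            for tau in range(G):
                score = phi[tau] + log(motion_model(k, tau, rho, c) / Z[tau])
                if score > best:
                    best_tau, best = tau, score
            new_phi.append(log(appearance[t][k]) + best)
            ptr.append(best_tau)
        phi = new_phi
        back.append(ptr)

    # Backtrack from the best final location.
    k = max(range(G), key=lambda j: phi[j])
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    return path[::-1]
```

Working with log scores is a design choice of this sketch: products of many small probabilities over a 100-frame batch would otherwise underflow. Restricting the admissible locations, as in Eq. (6), could be imitated by setting the appearance score of already occupied locations to zero.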
Algorithms for bigram and trigram word clustering
Speech Communication 24 (1998) 19–37

Sven Martin *, Jörg Liermann, Hermann Ney 2

Lehrstuhl für Informatik VI, RWTH Aachen, University of Technology, Ahornstraße 55, D-52056 Aachen, Germany

Received 5 June 1996; revised 15 January 1997; accepted 23 September 1997

* Corresponding author. Email: martin@informatik.rwth-aachen.de.
1 This paper is based on a communication presented at the ESCA Conference EUROSPEECH'95 and has been recommended by the EUROSPEECH'95 Scientific Committee.
2 Email: ney@informatik.rwth-aachen.de.
0167-6393/98/$19.00 © 1998 Elsevier Science B.V. All rights reserved. PII S0167-6393(97)00062-9

Abstract

In this paper, we describe an efficient method for obtaining word classes for class language models. The method employs an exchange algorithm using the criterion of perplexity improvement. The novel contributions of this paper are the extension of the class bigram perplexity criterion to the class trigram perplexity criterion, the description of an efficient implementation for speeding up the clustering process, the detailed computational complexity analysis of the clustering algorithm, and, finally, experimental results on large text corpora of about 1, 4, 39 and 241 million words including examples of word classes, test corpus perplexities in comparison to word language models, and speech recognition results.

Keywords: Stochastic language modeling; Statistical clustering; Word equivalence classes; Wall Street Journal corpus

1. Introduction

The need for a stochastic language model in speech recognition arises from Bayes' decision rule for minimum error rate (Bahl et al., 1983). The word sequence $w_1 \dots w_N$ to be recognized from the sequence of acoustic observations $x_1 \dots x_T$ is determined as that word sequence $w_1 \dots w_N$ for which the posterior probability $\Pr(w_1 \dots w_N \mid x_1 \dots x_T)$ attains its maximum. This rule can be rewritten in the form

$$\arg\max_{w_1 \dots w_N} \left\{ \Pr(w_1 \dots w_N) \cdot \Pr(x_1 \dots x_T \mid w_1 \dots w_N) \right\},$$

where $\Pr(x_1 \dots x_T \mid w_1 \dots w_N)$ is the conditional probability of, given the word sequence $w_1 \dots w_N$, observing the sequence of acoustic measurements $x_1 \dots x_T$ and where $\Pr(w_1 \dots w_N)$ is the prior probability of producing the word sequence $w_1 \dots w_N$.

The task of the stochastic language model is to provide estimates of these prior probabilities $\Pr(w_1 \dots w_N)$. Using the definition of conditional probabilities, we obtain the decomposition:

$$\Pr(w_1 \dots w_N) = \prod_{n=1}^{N} \Pr(w_n \mid w_1 \dots w_{n-1}).$$

For large vocabulary speech recognition, these conditional probabilities are typically used in the following way (Bahl et al., 1983). The dependence of the conditional probability of observing a word $w_n$ at a position $n$ is assumed to be restricted to its immediate $(m-1)$ predecessor words $w_{n-m+1} \dots w_{n-1}$. The resulting model is that of a Markov chain and is referred to as an $m$-gram model. For $m = 2$ and $m = 3$, we obtain the widely used bigram and trigram models, respectively. These bigram and trigram models are estimated from a text corpus during a training phase. But even for these restricted models, most of the possible events, i.e., word pairs and word triples, are never seen in training because there are so many of them. Therefore, in order to allow for events not seen in training, the probability distributions obtained in these $m$-gram approaches are smoothed with more general distributions. Usually, these are also $m$-grams with a smaller value for $m$ or a more sophisticated approach like a singleton distribution (Jelinek, 1991; Ney et al., 1994; Ney et al., 1997).

In this paper, we try a different approach for smoothing by using word equivalence classes, or word classes for short. Here, each word belongs to exactly one word class. If a certain word $m$-gram did not appear in the training corpus, it is still possible that the $m$-gram of the word classes corresponding to these words did occur and thus a word class based $m$-gram language model, or class $m$-gram model for short, can be estimated. More generally, as the number of word classes is smaller than the number of words, the number of model parameters is reduced so that each parameter can be estimated more reliably. On the other hand, reducing the number of model parameters makes the model coarser and thus the
prediction of the next word less precise. So there has to be a tradeoff between these two extremes.

Typically, word classes are based on syntactic semantic concepts and are defined by linguistic experts. In this case, they are called parts of speech (POS). Generalizing the concept of word similarities, we can also define word classes by using a statistical criterion, which in most cases, but not necessarily, is maximum likelihood or, equivalently, perplexity (Jelinek, 1991; Brown et al., 1992; Kneser and Ney, 1993; Ney et al., 1994). With the latter two approaches, word classes are defined using a clustering algorithm based on minimizing the perplexity of a class bigram language model on the training corpus, which we will call bigram clustering for short. The contributions of this paper are:
• the extension of the clustering algorithm from the bigram criterion to the trigram criterion;
• the detailed analysis of the computational complexity of both bigram and trigram clustering algorithms;
• the design and discussion of an efficient implementation of both clustering algorithms;
• systematic tests using the 39-million word Wall Street Journal corpus concerning perplexity and clustering times for various numbers of word classes and initialization methods;
• speech recognition results using the North American Business corpus.

Table 1
List of symbols
$W$: vocabulary size
$u, v, w, x$: words in a running text; usually $w$ is the word under discussion, $x$ its successor, $v$ its predecessor and $u$ the predecessor to $v$
$w_n$: word in text corpus position $n$
$S(w)$: set of successor words to word $w$ in the training corpus
$P(w)$: set of predecessor words to word $w$ in the training corpus
$S(v, w)$: set of successor words to bigram $(v, w)$ in the training corpus
$P(v, w)$: set of predecessor words to bigram $(v, w)$ in the training corpus
$G$: number of word classes
$G: w \to g_w$: class mapping function
$g, k$: word classes
$N$: training corpus size
$B$: number of distinct word bigrams in the training corpus
$T$: number of distinct word trigrams in the training corpus
$N(\cdot)$: number of occurrences in the training corpus of the event in parentheses
$F_{bi}(G)$: log-likelihood for a class bigram model
$F_{tri}(G)$: log-likelihood for a class trigram model
$PP$: perplexity
$I$: number of iterations of the clustering algorithm
$G(\cdot, w) = \sum_{g: N(g, w) > 0} 1$: number of seen predecessor word classes to word $w$
$G(w, \cdot) = \sum_{g: N(w, g) > 0} 1$: number of seen successor word classes to word $w$
$W^{-1} \cdot \sum_w G(\cdot, w)$: average number of seen predecessor word classes
$W^{-1} \cdot \sum_w G(w, \cdot)$: average number of seen successor word classes
$G(\cdot, \cdot, w) = \sum_{g_1, g_2: N(g_1, g_2, w) > 0} 1$: number of seen word class bigrams preceding word $w$
$G(\cdot, w, \cdot) = \sum_{g_1, g_2: N(g_1, w, g_2) > 0} 1$: number of seen word class pairs embracing word $w$
$G(w, \cdot, \cdot) = \sum_{g_1, g_2: N(w, g_1, g_2) > 0} 1$: number of seen word class bigrams succeeding word $w$
$b$: absolute discounting value for smoothing
$N_r(g)$: number of distinct words appearing $r$ times in word class $g$
$G_r(g_v, \cdot)$: number of distinct word classes seen $r$ times right after word class $g_v$
$G_r(\cdot, g_w)$: number of distinct word classes seen $r$ times right before word class $g_w$
$G_r(\cdot, \cdot)$: number of distinct word class bigrams seen $r$ times
$b(g_w)$: generalized distribution for smoothing

The original exchange algorithm presented in this paper was published in (Kneser and Ney, 1993) with good results on the LOB corpus. There is a different
approach described in (Brown et al., 1992) employing a bottom-up algorithm. There are also approaches based on simulated annealing (Jardino and Adda, 1994). Word classes can also be derived from an automated semantic analysis (Bellegarda et al., 1996), or by morphological features (Lafferty and Mercer, 1993).

The organization of this paper is as follows: Section 2 gives a definition of class models, explains the outline of the clustering algorithm and the extension to a trigram based statistical clustering criterion. Section 3 presents an efficient implementation of the clustering algorithm. Section 4 analyses the computational complexity of this efficient implementation. Section 5 reports on text corpus experiments concerning the performance of the clustering algorithm in terms of CPU time, resulting word classes and training and test perplexities. Section 6 shows the results for the speech recognition experiments. Section 7 discusses the results and their usefulness to language models. In this paper, we introduce a large number of symbols and quantities; they are summarized in Table 1.

2. Class models and clustering algorithm

In this section, we will present our class bigram and trigram models and we will derive their log likelihood function, which serves as our statistical criterion for obtaining word classes. With our approach, word classes result from a clustering algorithm, which exchanges a word between a fixed number of word classes and assigns it to the word class where it optimizes the log likelihood. We will discuss alternative strategies for finding word classes. We will also describe smoothing methods for the class models trained, which are necessary to avoid zero probabilities on test corpora.

2.1. Class bigram models

We partition the vocabulary of size $W$ into a fixed number $G$ of word classes. The partition is represented by the so-called class or category mapping function $G: w \to g_w$ mapping each word $w$ of the vocabulary to its word class
$g_w$. Assigning a word to only one word class is a possible drawback which is justified by the simplicity and efficiency of the clustering process. For the rest of this paper, we will use the letters $g$ and $k$ for arbitrary word classes. For a word bigram $(v, w)$ we use $(g_v, g_w)$ to denote the corresponding class bigram.

For class models, we have two types of probability distributions:
• a transition probability function $p_1(g_w \mid g_v)$ which represents the first-order Markov chain probability for predicting the word class $g_w$ from its predecessor word class $g_v$;
• a membership probability function $p_0(w \mid g)$ estimating the word $w$ from word class $g$.

Since a word belongs to exactly one word class, we have

$$p_0(w \mid g) \begin{cases} > 0 & \text{if } g = g_w, \\ = 0 & \text{if } g \neq g_w. \end{cases}$$

Therefore, we can use the somewhat sloppy notation $p_0(w \mid g_w)$.

For a class bigram model, we have then:

$$p(w \mid v) = p_0(w \mid g_w) \cdot p_1(g_w \mid g_v). \tag{1}$$

Note that this model is a proper probability function, and that we make an independency assumption between the prediction of a word from its word class and the prediction of a word class from its predecessor word classes. Such a model leads to a drastic reduction in the number of free parameters: $G \cdot (G - 1)$ probabilities for the table $p_1(g_w \mid g_v)$, $(W - G)$ probabilities for the table $p_0(w \mid g_w)$, and $W$ indices for the mapping $G: w \to g_w$.

For maximum likelihood estimation, we construct the log likelihood function using Eq. (1):

$$F_{bi}(G) = \sum_{n=1}^{N} \log \Pr(w_n \mid w_1 \ldots w_{n-1}) = \sum_{v,w} N(v, w) \cdot \log p(w \mid v)$$
$$= \sum_{w} N(w) \cdot \log p_0(w \mid g_w) + \sum_{g_v, g_w} N(g_v, g_w) \cdot \log p_1(g_w \mid g_v) \tag{2}$$

with $N(\cdot)$ being the number of occurrences of the event given in the parentheses in the training data. To construct a class bigram model, we first hypothesize a mapping function $G$. Then, for this hypothesized mapping function $G$, the probabilities $p_0(w \mid g_w)$ and $p_1(g_w \mid g_v)$ in Eq. (2) can be estimated by adding the Lagrange multipliers for the normalization constraints and taking the derivatives. This results in relative frequencies, following Ney et
al. (1994):

$$p_0(w \mid g_w) = \frac{N(w)}{N(g_w)}, \tag{3}$$

$$p_1(g_w \mid g_v) = \frac{N(g_v, g_w)}{N(g_v)}. \tag{4}$$

Using the estimates given by Eqs. (3) and (4), we can now express the log likelihood function $F_{bi}(G)$ for a mapping $G$ in terms of the counts:

$$F_{bi}(G) = \sum_{v,w} N(v, w) \cdot \log p(w \mid v)$$
$$= \sum_{w} N(w) \cdot \log \frac{N(w)}{N(g_w)} + \sum_{g_v, g_w} N(g_v, g_w) \cdot \log \frac{N(g_v, g_w)}{N(g_v)}$$
$$= \sum_{g_v, g_w} N(g_v, g_w) \cdot \log N(g_v, g_w) - 2 \cdot \sum_{g} N(g) \cdot \log N(g) + \sum_{w} N(w) \cdot \log N(w) \tag{5}$$
$$= \sum_{w} N(w) \log N(w) + \sum_{g_v, g_w} N(g_v, g_w) \log \frac{N(g_v, g_w)}{N(g_v)\, N(g_w)}. \tag{6}$$

In (Brown et al., 1992) the second sum of Eq. (6) is interpreted as the mutual information between the word classes $g_v$ and $g_w$. Note, however, that the derivation given here is based on the maximum likelihood criterion only.

2.2. Class trigram models

Constructing the log likelihood function for the class trigram model

$$p(w \mid u, v) = p_0(w \mid g_w) \cdot p_2(g_w \mid g_u, g_v) \tag{7}$$

results in

$$F_{tri}(G) = \sum_{w} N(w) \cdot \log p_0(w \mid g_w) + \sum_{g_u, g_v, g_w} N(g_u, g_v, g_w) \cdot \log p_2(g_w \mid g_u, g_v). \tag{8}$$

Taking the derivatives of Eq. (8) for maximum likelihood parameter estimation also results in relative frequencies

$$p_2(g_w \mid g_u, g_v) = \frac{N(g_u, g_v, g_w)}{N(g_u, g_v)} \tag{9}$$

and, using Eqs. (3), (7)–(9):

$$F_{tri}(G) = \sum_{u,v,w} N(u, v, w) \cdot \log p(w \mid u, v)$$
$$= \sum_{w} N(w) \cdot \log \frac{N(w)}{N(g_w)} + \sum_{g_u, g_v, g_w} N(g_u, g_v, g_w) \cdot \log \frac{N(g_u, g_v, g_w)}{N(g_u, g_v)}$$
$$= \sum_{g_u, g_v, g_w} N(g_u, g_v, g_w) \cdot \log N(g_u, g_v, g_w) - \sum_{g_u, g_v} N(g_u, g_v) \cdot \log N(g_u, g_v) - \sum_{g_w} N(g_w) \cdot \log N(g_w) + \sum_{w} N(w) \cdot \log N(w)$$
$$= \sum_{w} N(w) \log N(w) + \sum_{g_u, g_v, g_w} N(g_u, g_v, g_w) \log \frac{N(g_u, g_v, g_w)}{N(g_u, g_v)\, N(g_w)}. \tag{10}$$

2.3. Exchange algorithm

To find the unknown mapping $G: w \to g_w$, we will show now how to apply a clustering algorithm. The goal of this algorithm is to find a class mapping function $G$ such that the perplexity of the class model is minimized over the training corpus. We use an exchange algorithm similar to the exchange algorithms used in conventional clustering (ISODATA (Duda and Hart, 1973, pp. 227–228)), where an
observation vector is exchanged from one cluster to another cluster in order to improve the criterion. In the case of language modeling, the optimization criterion is the log-likelihood, i.e., Eq. (5) for the class bigram model and Eq. (10) for the class trigram model. The algorithm employs a technique of local optimization by looping through each element of the set, moving it tentatively to each of the $G$ word classes and assigning it to that word class resulting in the lowest perplexity. The whole procedure is repeated until a stopping criterion is met. The outline of our algorithm is depicted in Fig. 1.

We will use the term "to remove" for taking a word out of the word class to which it has been assigned in the previous iteration, the term "to move" for inserting a word into a word class, and the term "to exchange" for a combination of a removal followed by a move.

For initialization, we use the following method: we consider the most frequent $(G - 1)$ words, and each of these words defines its own word class. The remaining words are assigned to an additional word class. As a side effect, all the words with a zero unigram count $N(w)$ are assigned to this word class and remain there, because exchanging them has no effect on the training corpus perplexity. The stopping criterion is a prespecified number of iterations. In addition, the algorithm stops if no words are exchanged any more.

Fig. 1. Outline of the exchange algorithm for word clustering.

Thus, in this method, we exploit the training corpus in two ways:
1. in order to find the optimal partitioning;
2. in order to evaluate the perplexity.

An alternative approach would be to use two different data sets for these two tasks, or to simulate unseen events using leaving-one-out. That would result in an upper bound and possibly in more robust word classes, but at the cost of higher mathematical and computational expenses. Kneser and Ney (1993) employ leaving-one-out for clustering. However, the improvement was
not very significant, and so we will use the simpler original method here. An efficient implementation of this clustering algorithm will be presented in Section 3.

2.4. Comparison with alternative optimization strategies

It is interesting to compare the exchange algorithm for word clustering with two other approaches described in the literature, namely simulated annealing (Jardino and Adda, 1993) and bottom-up clustering (Brown et al., 1992).

In simulated annealing, the baseline optimization strategy is similar to the strategy of the exchange algorithm. The important difference is that, following the simulated annealing concept, we accept temporary degradations of the optimization criterion. The decision of whether to accept a degradation or not is made dependent on the so-called cooling parameter. This approach is usually referred to as the Metropolis algorithm. Another difference is that the words to be exchanged from one word class to another and the target word classes are selected by the so-called Monte Carlo method. Using the correct cooling parameter, simulated annealing converges to the global optimum. In our own experimental tests (unpublished results), we found only a marginal improvement in the perplexity criterion at dramatically increased computational costs. In (Jardino, 1996), simulated annealing is applied to a large training corpus from the Wall Street Journal, but no CPU times are given. In addition, in (Jardino and Adda, 1994), the authors introduce a modification of the clustering model allowing several word classes for each word, at least in principle. This modification, however, is more related to the definition of the clustering model and not that much to the optimization strategy. In this paper, we do not consider such types of stochastic class mappings.

The other optimization strategy, bottom-up clustering, as presented in (Brown et al., 1992), is also based on the perplexity criterion given by Eq. (6). However, instead of the exchange algorithm, the
authors use the well-known hierarchical bottom-up clustering algorithm as described in (Duda and Hart, 1973, pp. 230 and 235). The typical iteration step here is to reduce the number of word classes by one. This is achieved by merging that pair of word classes for which the perplexity degradation is the smallest. This process is repeated until the desired number of word classes has been obtained. The iteration process is initialized by defining a separate word class for each word. In (Brown et al., 1992), the authors describe special methods to keep the computational complexity of the algorithm as small as possible. Obviously, like the exchange algorithm, this bottom-up clustering strategy achieves only a local optimum. As reported in (Brown et al., 1992), the exchange algorithm can be used to improve the results obtained by bottom-up clustering. From this result and our own experimental results for the various initialization methods of the exchange algorithm (see Section 5.4), we may conclude that there is no basic performance difference between bottom-up clustering and exchange clustering.

2.5. Smoothing methods

On the training corpus, Eqs. (3), (4) and (9) are well-defined. However, even though the parameter estimation for class models is more robust than for word models, some of the class bigrams or trigrams in a test corpus may have zero frequencies in the training corpus, resulting in zero probabilities. To avoid this, smoothing must be used on the test corpus. However, for the clustering process on the training corpus, the unsmoothed relative frequencies of Eqs. (3), (4) and (9) are still used.

To smooth the transition probability, we use the method of absolute interpolation with a singleton generalized distribution (Ney et al., 1995, 1997):

$$p_1(g_w \mid g_v) = \max\left(0, \frac{N(g_v, g_w) - b}{N(g_v)}\right) + \bigl(G - G_0(g_v, \cdot)\bigr) \cdot \frac{b}{N(g_v)} \cdot \beta(g_w),$$

$$b = \frac{G_1(\cdot, \cdot)}{G_1(\cdot, \cdot) + 2 \cdot G_2(\cdot, \cdot)},$$

$$\beta(g_w) = \frac{G_1(\cdot, g_w)}{G_1(\cdot, \cdot)},$$

with $b$ standing for the history-independent
discounting value, $G_r(g_v, \cdot)$ for the number of word classes seen $r$ times right after word class $g_v$, $G_r(\cdot, g_w)$ for the number of word classes seen $r$ times right before word class $g_w$, and $G_r(\cdot, \cdot)$ for the number of distinct word class bigrams seen $r$ times in the training corpus. $\beta(g_w)$ is the so-called singleton generalized distribution (Ney et al., 1995, 1997). The same method is used for the class trigram model.

To smooth the membership distribution, we use the method of absolute discounting with backing off (Ney et al., 1995, 1997):

$$p_0(w \mid g_w) = \begin{cases} \dfrac{N(w) - b_{g_w}}{N(g_w)} & \text{if } N(w) > 0, \\[2ex] \dfrac{b_{g_w}}{N(g_w)} \cdot \displaystyle\sum_{r > 0} N_r(g_w) \cdot \dfrac{1}{N_0(g_w)} & \text{if } N(w) = 0, \end{cases}$$

$$b_{g_w} = \frac{N_1(g_w)}{N_1(g_w) + 2 \cdot N_2(g_w)},$$

$$N_r(g_w) := \sum_{w':\; g_{w'} = g_w,\; N(w') = r} 1,$$

with $b_{g_w}$ standing for the word class dependent discounting value and $N_r(g_w)$ for the number of words appearing $r$ times and belonging to word class $g_w$. The reason for a different smoothing method for the membership distribution is that no singleton generalized distribution can be constructed from unigram counts. Without singletons, backing off works better than interpolation (Ney et al., 1997). However, no smoothing is applied to word classes with no unseen words. With our clustering algorithm, there is only one word class containing unseen words. Therefore, the effect of the kind of smoothing used for the membership distribution is negligible. Thus, for the sake of consistency, absolute interpolation could be used to smooth both distributions.

3. Efficient clustering implementation

A straightforward implementation of our clustering algorithm presented in Section 2.3 is time consuming and prohibitive even for a small number of word classes $G$. In this section, we will present our techniques to improve computational performance in order to obtain word classes for large numbers of word classes. A detailed complexity analysis of the resulting algorithm will be presented in Section 4.

3.1. Bigram clustering

We will use the log-likelihood Eq. (5) as the criterion for bigram
clustering, which is equivalent to the perplexity criterion. The exchange of a word between word classes is entirely described by altering the affected counts of this formula.

3.1.1. Efficient method for count generation

All the counts of Eq. (5) are computed once, stored in tables and updated after a word exchange. As we will see later, we need additional counts

$$N(w, g) = \sum_{x:\; g_x = g} N(w, x), \tag{11}$$

$$N(g, w) = \sum_{v:\; g_v = g} N(v, w) \tag{12}$$

describing how often a word class $g$ appears right after and right before, respectively, a word $w$. These counts are recounted anew for each word currently under consideration, because updating them, if necessary, would require the same effort as recounting, and would require more memory because of the large tables.

Fig. 2. Efficient procedure for count generation.

For a fixed word $w$ in Eqs. (11) and (12), we need to know the predecessor and the successor words, which are stored as lists for each word $w$, and the corresponding bigram counts. However, we observe that if word $v$ precedes $w$, then $w$ succeeds $v$. Consequently, the bigram $(v, w)$ is stored twice, once in the list of successors to $v$, and once in the list of predecessors to $w$, thus resulting in high memory consumption. However, dropping one type of list would result in a high search effort. Therefore we keep both lists, but with bigram counts stored only in the list of successors. Using four bytes for the counts and two bytes for the word indexes, we reduce the memory requirements by 1/3 (8 instead of 12 bytes per bigram) at the cost of a minor search effort for obtaining the count $N(v, w)$ from the list of successors to $v$ by binary search. The count generation procedure for Eqs. (11) and (12) is depicted in Fig. 2.

3.1.2. Baseline perplexity recomputation

We will examine how the counts in Eq. (5) must be updated in a word exchange. We observe that removing a word $w$ from word class $g_w$ and moving it to a word class $k$ only affects those counts of Eq. (5) that involve $g_w$ or $k$; all the other counts, and, consequently, their
contributions to the perplexity remain unchanged. Thus, to compute the change in perplexity, we recompute only those terms in Eq. (5) which involve the affected counts.

We consider in detail how to remove a word from word class $g_w$. Moving a word to a word class $k$ is similar. First, we have to reduce the word class unigram count:

$$N(g_w) := N(g_w) - N(w).$$

Then, we have to decrement the transition counts from an arbitrary word class $g \neq g_w$ to $g_w$ and from $g_w$ to a word class $g \neq g_w$ by the number of times $w$ appears right after or right before $g$, respectively:

$$\forall g \neq g_w: \quad N(g, g_w) := N(g, g_w) - N(g, w), \tag{13}$$

$$\forall g \neq g_w: \quad N(g_w, g) := N(g_w, g) - N(w, g). \tag{14}$$

Changing the self-transition count $N(g_w, g_w)$ is a bit more complicated. We have to reduce this count by the number of times $w$ appears right before or right after another word of $g_w$. However, if $w$ follows itself in the corpus, $N(w, w)$ is considered in both Eqs. (11) and (12). Therefore, it is subtracted twice from the transition count and must be added once for compensation:

$$N(g_w, g_w) := N(g_w, g_w) - N(g_w, w) - N(w, g_w) + N(w, w). \tag{15}$$

Finally, we have to update the counts $N(g_w, w)$ and $N(w, g_w)$:

$$N(g_w, w) := N(g_w, w) - N(w, w),$$

$$N(w, g_w) := N(w, g_w) - N(w, w).$$

We can view Eq. (15) as an application of the inclusion/exclusion principle from combinatorics (Takács, 1984). If two subsets $A$ and $B$ of a set $C$ are to be removed from $C$, the intersection of $A$ and $B$ can only be removed once. Fig. 3 gives an interpretation of this principle applied to our problem of count updating. Viewing these updates in terms of the inclusion/exclusion principle will help to understand the mathematically more complicated update formulae for trigram clustering.
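The removal step of Section 3.1.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names (`count_corpus`, `class_counts`, `log_likelihood`, `word_class_counts`, `remove_word`) and the toy corpus are hypothetical, the corpus is treated as circular so that predecessor and successor totals agree, and the counts of Eqs. (11) and (12) are gathered by a naive scan over all bigrams rather than from the successor and predecessor lists described in Section 3.1.1.

```python
from collections import Counter
from math import log, isclose

def count_corpus(words):
    """Word unigram and bigram counts. Assumption: the corpus is circular
    (last word followed by the first) so edge effects vanish."""
    return Counter(words), Counter(zip(words, words[1:] + words[:1]))

def class_counts(unigrams, bigrams, cls, skip=None):
    """Recount N(g) and N(g_v, g_w) from scratch, optionally ignoring one word."""
    uni, bi = Counter(), Counter()
    for w, n in unigrams.items():
        if w != skip:
            uni[cls[w]] += n
    for (v, w), n in bigrams.items():
        if skip not in (v, w):
            bi[cls[v], cls[w]] += n
    return uni, bi

def log_likelihood(unigrams, uni, bi):
    """Eq. (5): the bigram log likelihood written purely in terms of counts."""
    xlogx = lambda n: n * log(n) if n > 0 else 0.0
    return (sum(xlogx(n) for n in bi.values())
            - 2 * sum(xlogx(n) for n in uni.values())
            + sum(xlogx(n) for n in unigrams.values()))

def word_class_counts(bigrams, cls, w):
    """Eqs. (11) and (12): N(w, g) and N(g, w) for one fixed word w (naive scan)."""
    n_wg, n_gw = Counter(), Counter()
    for (v, x), n in bigrams.items():
        if v == w:
            n_wg[cls[x]] += n   # class of the word following w
        if x == w:
            n_gw[cls[v]] += n   # class of the word preceding w
    return n_wg, n_gw

def remove_word(uni, bi, unigrams, bigrams, cls, w):
    """Take w out of its class g_w, updating the class counts in place."""
    gw = cls[w]
    n_wg, n_gw = word_class_counts(bigrams, cls, w)
    n_ww = bigrams.get((w, w), 0)
    uni[gw] -= unigrams[w]
    for g in set(n_wg) | set(n_gw):
        if g != gw:
            bi[g, gw] -= n_gw[g]                    # Eq. (13)
            bi[gw, g] -= n_wg[g]                    # Eq. (14)
    bi[gw, gw] -= n_gw[gw] + n_wg[gw] - n_ww        # Eq. (15), inclusion/exclusion

corpus = "a b a a c b a c c b a a".split()
cls = {"a": 0, "b": 0, "c": 1}                      # hypothetical two-class mapping
unigrams, bigrams = count_corpus(corpus)
uni, bi = class_counts(unigrams, bigrams, cls)

# Eq. (5) against the direct form of Eq. (2) with the estimates (3) and (4)
f_bi = log_likelihood(unigrams, uni, bi)
direct = sum(n * log(unigrams[w] / uni[cls[w]] * bi[cls[v], cls[w]] / uni[cls[v]])
             for (v, w), n in bigrams.items())

# Incremental removal of "a" against a full recount that ignores "a"
remove_word(uni, bi, unigrams, bigrams, cls, "a")
expected_uni, expected_bi = class_counts(unigrams, bigrams, cls, skip="a")
nonzero = lambda c: {k: n for k, n in c.items() if n}
print(isclose(f_bi, direct),
      nonzero(uni) == nonzero(expected_uni),
      nonzero(bi) == nonzero(expected_bi))
```

The word "a" follows itself in the toy corpus, so the compensation term $N(w, w)$ of Eq. (15) is actually exercised; dropping it makes the incremental counts disagree with the recount.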
Common vocabulary for MCM (Mathematical Contest in Modeling) papers

exclusively: solely, only
undoubtedly: without any doubt
notable: worth noting
tremendous / significant: very large
notion: concept
definition: (noun); verb form: define
interpret ... as ...: understand ... as ...
invoke (+ a model): cite, draw on
equation: a statement of equality
function: dependent variable (remember to state what each symbol means)
matrix: rectangular array of numbers
constant: a fixed quantity. Example: It requires I to be a constant for ... to be true.
algorithm: computational procedure. Related: a general algorithm; simplify the algorithm. Example: We have produced a general algorithm to solve this type of problem.
derivative: result of differentiation
antiderivative: indefinite integral
optimal results: best results
investigate the problem from different points of view: examine the problem; noun form: investigation
survey: a study or poll
subproblem: secondary problem; contrast: major problem
metric: standard of measurement, indicator
digit: numeral. Example: delete some digits.
element / component: constituent part

Problem-solving approach
seek / explore. Example: explore different ideas.
We seek to devise a new model for solving the problem by exploring the new direction suggested by their investigations.

Designing a solution
design / devise; also develop / establish / conduct
Based on our analysis, we design a model for the problem using integer linear programming. We then devise a polynomial-time approximation algorithm to produce near-optimal solutions.
We conduct sensitivity analysis on ... to find ...
xxx analysis is also performed.

Solving
tackle / solve
We tackle the problem using the new technique we developed in the previous section.
While it is difficult to solve the problem completely, we are able to solve a major subproblem.

Plans and proposals
approach / propose
We approach the problem using the proposed method.
We propose a new approach to tackling the problem.

Phrases
Based on ...: taking ... as the basis
According to ...: in accordance with ...
Divide ... into ...; subdivide into: split further
... is applied to ...: the ... model is used to ...; we apply our model to ...: we use our model on ...
The model proves to be efficient in other sports: the model is shown to work in other settings
..., which indicates that ...: ... reflects ...
..., which led to the change of ...: ... caused ... to change
We ... only to find that ...: we ... but merely found that ...
... doesn't matter: ... is irrelevant
Take ... as an example / as a case study: give an example
formulate and justify the assumptions: state the assumptions and argue for them
design / establish a model: build a model
devise an algorithm: design a computational procedure
carry out numerical simulations: run numerical simulations
for our problem a relationship exists that ...: in our problem there is a relation such that ...
we will assume / suppose that ...: we assume ...
compare with different approaches: compare against other methods

Sample sentences
There are at least two notions of where the sweet spot should be: an impact location on the bat that either
· minimizes the discomfort to the hands, or
· maximizes the outgoing velocity of the ball.
We focus exclusively on the second definition.
We interpret the error of +2 as a normal distribution with standard deviation of 1.
Research Statement
Parikshit Gopalan

My research focuses on fundamental algebraic problems such as polynomial reconstruction and interpolation arising from various areas of theoretical computer science. My main algorithmic contributions include the first algorithm for list-decoding a well-known family of codes called Reed-Muller codes [13], and the first algorithms for agnostically learning parity functions [3] and decision trees [11] under the uniform distribution. On the complexity-theoretic side, my contributions include the best-known hardness results for reconstructing low-degree multivariate polynomials from noisy data [12] and the discovery of a connection between representations of Boolean functions by polynomials and communication complexity [2].

1 Introduction

Many important recent developments in theoretical computer science, such as probabilistic proof checking, deterministic primality testing and advancements in algorithmic coding theory, share a common feature: the extensive use of techniques from algebra. My research has centered around the application of these methods to problems in Coding theory, Computational learning, Hardness of approximation and Boolean function complexity.

While at first glance these might seem like four research areas that are not immediately related, there are several beautiful connections between these areas. Perhaps the best illustration of these links is the noisy parity problem, where the goal is to recover a parity function from a corrupted set of evaluations. The seminal Goldreich-Levin algorithm solves a version of this problem; this result initiated the study of list-decoding algorithms for error-correcting codes [5]. An alternate solution is the Kushilevitz-Mansour algorithm [19], which is a crucial component in algorithms for learning decision trees and DNFs [17]. Håstad's ground-breaking work on the hardness of this problem has revolutionized our understanding of inapproximability [16]. All these results rely on insights into the Fourier structure
of Boolean functions.

As I illustrate below, my research has contributed to a better understanding of these connections, and yielded progress on some important open problems in these areas.

2 Coding Theory

The broad goal of coding theory is to enable meaningful communication in the presence of noise, by suitably encoding the messages. The natural algorithmic problem associated with this task is that of decoding, or recovering the transmitted message from a corrupted encoding. The last twenty years have witnessed a revolution with the discovery of several powerful decoding algorithms for well-known families of error-correcting codes. A key role has been played by the notion of list-decoding, a relaxation of the classical decoding problem where we are willing to settle for a small list of candidate transmitted messages rather than insisting on a unique answer. This relaxation allows one to break the classical half-the-minimum-distance barrier for decoding error-correcting codes. We now know powerful list-decoding algorithms for several important code families; these algorithms have also made a huge impact on complexity theory [5,15,23].

List-Decoding Reed-Muller Codes: In recent work with Klivans and Zuckerman, we give the first such list-decoding algorithm for a well-studied family of codes known as Reed-Muller codes, obtained from low-degree polynomials over the finite field F_2 [13]. The highlight of this work is that our algorithm is able to tolerate error-rates which are much higher than what is known as the Johnson bound in coding theory. Our results imply new combinatorial bounds on the error-correcting capability of these codes. While Reed-Muller codes have been studied extensively in both coding theory and computer science communities, our result is the first to show that they are resilient to remarkably high error-rates. Our algorithm is based on a novel view of the Goldreich-Levin algorithm as a reduction from list-decoding to unique-decoding; our view readily extends to polynomials
of arbitrary degree over any field. Our result complements recent work on the Gowers norm, showing that Reed-Muller codes are testable up to large distances [21].

Hardness of Polynomial Reconstruction: In the polynomial reconstruction problem, one is asked to recover a low-degree polynomial from its evaluations at a set of points, where some of the values could be incorrect. The reconstruction problem is ubiquitous in both coding theory and computational learning. Both the noisy parity problem and the Reed-Muller decoding problem are instances of this problem. In joint work with Khot and Saket, we address the complexity of this problem and establish the first hardness results for multivariate polynomials of arbitrary degree [12]. Previously, the only hardness known was for degree 1, which follows from the celebrated work of Håstad [16]. Our work introduces a powerful new algebraic technique called global folding, which allows one to bypass a module called consistency testing that is crucial to most hardness results. I believe this technique will find other applications.

Average-Case Hardness of NP: Algorithmic advances in decoding of error-correcting codes have helped us gain a deeper understanding of the connections between worst-case and average-case complexity [23,24]. In recent work with Guruswami, we use this paradigm to explore the average-case complexity of problems in NP against algorithms in P [8]. We present the first hardness amplification result in this setting by giving a construction of an error-correcting code where most of the symbols can be recovered correctly from a corrupted codeword by a deterministic algorithm that probes very few locations in the codeword. The novelty of our work is that our decoder is deterministic, whereas previous algorithms for this task were all randomized.

3 Computational Learning

Computational learning aims to understand the algorithmic issues underlying how we learn from examples, and to explore how the complexity of learning is influenced by factors such
as the ability to ask queries and the possibility of incorrect answers. Learning algorithms for a class of concepts typically rely on understanding the structure of that concept class, which naturally ties learning to Boolean function complexity. Learning in the presence of noise has several connections to decoding from errors. My work in this area addresses the learnability of basic concept classes such as decision trees, parities and halfspaces.

Learning Decision Trees Agnostically: The problem of learning decision trees is one of the central open problems in computational learning. Decision trees are also a popular hypothesis class in practice. In recent work with Kalai and Klivans, we give a query algorithm for learning decision trees with respect to the uniform distribution on inputs in the agnostic model: given black-box access to an arbitrary Boolean function, our algorithm finds a hypothesis that agrees with it on almost as many inputs as the best decision tree [11]. Equivalently, we can learn decision trees even when the data is corrupted adversarially; this is the first polynomial-time algorithm for learning decision trees in a harsh noise model. Previous decision-tree learning algorithms applied only to the noiseless setting. Our algorithm can be viewed as the agnostic analog of the Kushilevitz-Mansour algorithm [19]. The core of our algorithm is a procedure to implicitly solve a convex optimization problem in high dimensions using approximate gradient projection.

The Noisy Parity Problem: The noisy parity problem has come to be widely regarded as a hard problem. In work with Feldman et al., we present evidence supporting this belief [3]. We show that in the setting of learning from random examples (without queries), several outstanding open problems such as learning juntas, decision trees and DNFs reduce to restricted versions of the problem of learning parities with random noise. Our result shows that in some sense, noisy parity captures the gap between learning from random examples
and learning with queries, as it is believed to be hard in the former setting and is known to be easy in the latter. On the positive side, we present the first non-trivial algorithm for the noisy parity problem under the uniform distribution in the adversarial noise model. Our result shows that, somewhat surprisingly, adversarial noise is no harder to handle than random noise.

Hardness of Learning Halfspaces: The problem of learning halfspaces is a fundamental problem in computational learning. One could hope to design algorithms that are robust even in the presence of a few incorrectly labeled points. Indeed, such algorithms are known in the setting where the noise is random. In work with Feldman et al., we show that the setting of adversarial errors might be intractable: given a set of points where 99% are correctly labeled by some halfspace, it is NP-hard to find a halfspace that correctly labels even 51% of the points [3].

4 Prime versus Composite problems

My thesis work focuses on new aspects of an old and famous problem: the difference between primes and composites. Beyond basic problems like primality and factoring, there are many other computational issues that are not yet well understood. For instance, in circuit complexity, we have excellent lower bounds for small-depth circuits with mod 2 gates, but the same problem for circuits with mod 6 gates is wide open. Likewise in combinatorics, set systems where sizes of the sets need to satisfy certain modular conditions are well studied. Again the prime case is well understood, but little is known for composites. In all these problems, the algebraic techniques that work well in the prime case break down for composites.

Boolean function complexity: Perhaps the simplest class of circuits for which we have been unable to show lower bounds is small-depth circuits with And, Or and Mod m gates where m is composite; indeed this is one of the frontier open problems in circuit complexity. When m is prime, such bounds were proved by Razborov and
Smolensky [20,22]. One reason for this gap is that we do not fully understand the computational power of polynomials over composites; Barrington et al. were the first to show that such polynomials are surprisingly powerful [1]. In joint work with Bhatnagar and Lipton, we solve an important special case: when the polynomials are symmetric in their variables [2]. We show an equivalence between computing Boolean functions by symmetric polynomials over composites and multi-player communication protocols, which enables us to apply techniques from communication complexity and number theory to this problem. We use these techniques to show tight degree bounds for various classes of functions where no bounds were known previously. Our viewpoint simplifies previously known results in this area, and reveals new connections to well-studied questions about Diophantine equations.

Explicit Ramsey Graphs: A basic open problem regarding polynomials over composites is: can asymmetry in the variables help us compute a symmetric function with low degree? I show a connection between this question and an important open problem in combinatorics, which is to explicitly construct Ramsey graphs, or graphs with no large cliques and independent sets [6]. While good Ramsey graphs are known to exist by probabilistic arguments, explicit constructions have proved elusive. I propose a new algebraic framework for constructing Ramsey graphs and show how several known constructions can all be derived from this framework in a unified manner. I show that all known constructions rely on symmetric polynomials, and that such constructions cannot yield better Ramsey graphs. Thus the question of symmetry versus asymmetry of variables is precisely the barrier to better constructions by such techniques.

Interpolation over Composites: A basic problem in computational algebra is polynomial interpolation, which is to recover a polynomial from its evaluations. Interpolation and related algorithmic tasks which are easy for primes become much
harder, even intractable, over composites. This difference stems from the fact that over primes, the number of roots of a polynomial is bounded by the degree, but no such theorem holds for composites. In lieu of this theorem I present an algorithmic bound: I show how to compute a bound on the degree of a polynomial given its zero set [7]. I use this to give the first optimal algorithms for interpolation, learning and zero-testing over composites. These algorithms are based on new structural results about the zeroes of polynomials. These results were subsequently useful in ruling out certain approaches for better Ramsey constructions [6].

5 Other Research Highlights

My other research work spans areas of theoretical computer science ranging from algorithms for massive data sets to computational complexity. I highlight some of this work below.

Data Stream Algorithms: Algorithmic problems arising from complex networks like the Internet typically involve huge volumes of data. This has led to increased interest in highly efficient algorithmic models like sketching and streaming, which can meaningfully deal with such massive data sets. A large body of work on streaming algorithms focuses on estimating how sorted the input is. This is motivated by the realization that sorting the input is intractable in the one-pass data stream model. In joint work with Krauthgamer, Jayram and Kumar, we presented the first sub-linear space data stream algorithms to estimate two well-studied measures of sortedness: the distance from monotonicity (or Ulam distance for permutations), and the length of the Longest Increasing Subsequence, or LIS. In more recent work with Anna Gál, we prove optimal lower bounds for estimating the length of the LIS in the data-stream model [4]. This is established by proving a direct-sum theorem for the communication complexity of a related problem. The novelty of our techniques is the model of communication that they address. As a corollary, we obtain a separation between two models of
communication that are commonly studied in relation to data stream algorithms.

Structural Properties of SAT solutions: The solution space of random SAT formulae has been studied with a view to better understanding connections between computational hardness and phase transitions from satisfiable to unsatisfiable. Recent algorithmic approaches rely on connectivity properties of the space and break down in the absence of connectivity. In joint work with Kolaitis, Maneva and Papadimitriou, we consider the problem: Given a Boolean formula, do its solutions form a connected subset of the hypercube? We classify the worst-case complexity of various connectivity properties of the solution space of SAT formulae in Schaefer's framework [14]. We show that the jump in the computational hardness is accompanied by a jump in the diameter of the solution space from linear to exponential.

Complexity of Modular Counting Problems: In joint work with Guruswami and Lipton, we address the complexity of counting the roots of a multivariate polynomial over a finite field F_q modulo some number r [9]. We establish a dichotomy showing that the problem is easy when r is a power of the characteristic of the field and intractable otherwise. Our results give several examples of problems whose decision versions are easy, but the modular counting version is hard.

6 Future Research Directions

My broad research goal is to gain a complete understanding of the complexity of problems arising in coding theory, computational learning and related areas; I believe that the right tools for this will come from Boolean function complexity and hardness of approximation. Below I outline some of the research directions I would like to pursue in the future.

List-decoding algorithms have allowed us to break the unique-decoding barrier for error-correcting codes. It is natural to ask if one can perhaps go beyond the list-decoding radius and solve the problem of finding the codeword nearest to a received word at even higher error rates.
On the negative side, we do not currently know any examples of codes where one can do this. But I think that recent results on Reed-Muller codes do offer some hope [13, 21]. Algorithms for solving the nearest codeword problem, if they exist, could also have exciting implications in computational learning. There are concept classes which are well-approximated by low-degree polynomials over finite fields lying just beyond the threshold of what is currently known to be learnable efficiently [20, 22]. Decoding algorithms for Reed-Muller codes that can tolerate very high error rates might present an approach to learning such concept classes.

One of the challenges in algorithmic coding theory is to determine whether known algorithms for list-decoding Reed-Solomon codes [15] and Reed-Muller codes [13, 23] are optimal. This raises both computational and combinatorial questions. I believe that my work with Khot et al. represents a good first step towards understanding the complexity of the decoding/reconstruction problem for multivariate polynomials. Proving similar results for univariate polynomials is an excellent challenge which seems to require new ideas in hardness of approximation.

There is a large body of work proving strong NP-hardness results for problems in computational learning. However, all such results only address the proper learning scenario where the learning algorithm is restricted to produce a hypothesis from some particular class H which is typically the same as the concept class C. In contrast, known learning algorithms are mostly improper algorithms which could use more complicated hypotheses. For hardness results that are independent of the hypothesis H used by the algorithm, one currently has to resort to cryptographic assumptions. In ongoing work with Guruswami and Raghavendra, we are investigating the possibility of proving NP-hardness for improper learning.

Finally, I believe that there are several interesting directions to explore in the agnostic learning model. An exciting
insight in this area comes from the work of Kalai et al., who show that ℓ1 regression is a powerful tool for noise-tolerant learning [18]. A powerful paradigm in computational learning is to prove that the concept has some kind of polynomial approximation and then recover the approximation. Algorithms based on ℓ1 regression require a weaker polynomial approximation in comparison with previous algorithms (which use ℓ2 regression), but use more powerful machinery for the recovery step. Similar ideas might allow us to extend the boundaries of efficient learning even in the noiseless model; this is a possibility I am currently exploring.

Having worked in areas ranging from data stream algorithms to Boolean function complexity, I view myself as both an algorithm designer and a complexity theorist. I have often found that working on one aspect of a problem gives insights into the other; indeed much of my work has originated from such insights ([12] and [13], [10] and [4], [6] and [7]). I find that this is increasingly the case across several areas in theoretical computer science. My aim is to maintain this balance between upper and lower bounds in my future work.

References

[1] D. A. Barrington, R. Beigel, and S. Rudich. Representing Boolean functions as polynomials modulo composite numbers. Computational Complexity, 4:367–382, 1994.
[2] N. Bhatnagar, P. Gopalan, and R. J. Lipton. Symmetric polynomials over Z_m and simultaneous communication protocols. Journal of Computer & System Sciences (special issue for FOCS'03), 72(2):450–459, 2003.
[3] V. Feldman, P. Gopalan, S. Khot, and A. K. Ponnuswami. New results for learning noisy parities and halfspaces. In Proc. 47th IEEE Symp. on Foundations of Computer Science (FOCS'06), 2006.
[4] A. Gál and P. Gopalan. Lower bounds on streaming algorithms for approximating the length of the longest increasing subsequence. In Proc. 48th IEEE Symp. on Foundations of Computer Science (FOCS'07), 2007.
[5] O. Goldreich and L. Levin. A hard-core predicate for all one-way functions. In Proc. 21st ACM Symposium on the Theory of
Computing (STOC'89), pages 25–32, 1989.
[6] P. Gopalan. Constructing Ramsey graphs from Boolean function representations. In Proc. 21st IEEE Symposium on Computational Complexity (CCC'06), 2006.
[7] P. Gopalan. Query-efficient algorithms for polynomial interpolation over composites. In Proc. 17th ACM-SIAM Symposium on Discrete Algorithms (SODA'06), 2006.
[8] P. Gopalan and V. Guruswami. Deterministic hardness amplification via local GMD decoding. Submitted to 23rd IEEE Symp. on Computational Complexity (CCC'08), 2008.
[9] P. Gopalan, V. Guruswami, and R. J. Lipton. Algorithms for modular counting of roots of multivariate polynomials. In Latin American Symposium on Theoretical Informatics (LATIN'06), 2006.
[10] P. Gopalan, T. S. Jayram, R. Krauthgamer, and R. Kumar. Estimating the sortedness of a data stream. In Proc. 18th ACM-SIAM Symposium on Discrete Algorithms (SODA'07), 2007.
[11] P. Gopalan, A. T. Kalai, and A. R. Klivans. Agnostically learning decision trees. In Proc. 40th ACM Symp. on Theory of Computing (STOC'08), 2008.
[12] P. Gopalan, S. Khot, and R. Saket. Hardness of reconstructing multivariate polynomials over finite fields. In Proc. 48th IEEE Symp. on Foundations of Computer Science (FOCS'07), 2007.
[13] P. Gopalan, A. R. Klivans, and D. Zuckerman. List-decoding Reed-Muller codes over small fields. In Proc. 40th ACM Symp. on Theory of Computing (STOC'08), 2008.
[14] P. Gopalan, P. G. Kolaitis, E. N. Maneva, and C. H. Papadimitriou. Computing the connectivity properties of the satisfiability solution space. In Proc. 33rd Intl. Colloquium on Automata, Languages and Programming (ICALP'06), 2006.
[15] V. Guruswami and M. Sudan. Improved decoding of Reed-Solomon and Algebraic-Geometric codes. IEEE Transactions on Information Theory, 45(6):1757–1767, 1999.
[16] J. Håstad. Some optimal inapproximability results. J. ACM, 48(4):798–859, 2001.
[17] J. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55:414–440, 1997.
[18] A. T. Kalai, A. R. Klivans, Y. Mansour, and R. A. Servedio. Agnostically learning halfspaces. In Proc. 46th IEEE Symp. on Foundations of Computer Science, pages 11–20, 2005.
[19] E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. SIAM Journal of Computing, 22(6):1331–1348, 1993.
[20] A. Razborov. Lower bounds for the size of circuits of bounded depth with basis {∧, ⊕}. Mathematical Notes of the Academy of Science of the USSR, (41):333–338, 1987.
[21] A. Samorodnitsky. Low-degree tests at large distances. In Proc. 39th ACM Symposium on the Theory of Computing (STOC'07), pages 506–515, 2007.
[22] R. Smolensky. Algebraic methods in the theory of lower bounds for Boolean circuit complexity. In Proc. 19th Annual ACM Symposium on Theory of Computing (STOC'87), pages 77–82, 1987.
[23] M. Sudan, L. Trevisan, and S. P. Vadhan. Pseudorandom generators without the XOR lemma. J. Comput. Syst. Sci., 62(2):236–266, 2001.
[24] L. Trevisan. List-decoding using the XOR lemma. In Proc. 44th IEEE Symposium on Foundations of Computer Science (FOCS'03), pages 126–135, 2003.
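The remark in the paragraph on interpolation over composites, that the number of roots of a polynomial is bounded by its degree over primes but not over composites, can be checked by brute force on small moduli. The following is only an illustrative sketch: the helper name `roots_mod` and the chosen polynomials are assumptions made for this example and do not come from the cited papers.

```python
def roots_mod(coeffs, m):
    """Return every x in {0, ..., m-1} with p(x) congruent to 0 modulo m.

    coeffs lists the coefficients from the constant term upward,
    so [-1, 0, 1] encodes x^2 - 1.
    """
    def evaluate(x):
        # Horner's rule, reducing modulo m at every step.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % m
        return acc

    return [x for x in range(m) if evaluate(x) == 0]


quadratic = [-1, 0, 1]            # x^2 - 1, degree 2
print(roots_mod(quadratic, 7))    # prime modulus: [1, 6], at most 2 roots
print(roots_mod(quadratic, 8))    # composite modulus: [1, 3, 5, 7], 4 roots
print(roots_mod([0, -1, 1], 6))   # x^2 - x modulo 6: [0, 1, 3, 4]
```

A degree-2 polynomial with four roots modulo 8 or modulo 6 shows why the usual root-counting argument behind interpolation is unavailable over composite moduli.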
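2024 Huawei HCIA-AI Exam Review Question Bank (with Answers)

I. Single-choice questions

1. Which of the following is NOT a key feature of MindSpore's all-scenario deployment and collaboration?
A. A unified model IR provides a consistent deployment experience.
B. Device-cloud collaborative Federal Meta Learning breaks the device-cloud boundary and enables multi-device collaborative models.
C. Data plus computation as a whole graph onto the Ascend chip.
D. Software-hardware co-designed graph optimization hides the differences between scenarios.
Answer: C

2. In a generative adversarial network, where should labeled data be placed?
A. As the output of the generator
B. As the input of the discriminator
C. As the output of the discriminator
D. As the input of the generator
Answer: B

3. Which of the following is NOT a method supported by TensorFlow 2.0 for creating a tensor?
A. zeros
B. fill
C. create
D. constant
Answer: C

4. Which of the following is a feature that HiAI 3.0 improves on relative to 2.0?
A. Single device
B. Distributed
C. Multi-device
D. Device-cloud collaboration
Answer: B

5. Which of the following is NOT a common Tensor operation in MindSpore?
A. asnumpy()
B. dim()
C. for()
D. size()
Answer: C

6. The optimizer is an important part of training a neural network. Which of the following is NOT a purpose of using an optimizer?
A. Speeding up convergence of the algorithm
B. Reducing the difficulty of setting parameters by hand
C. Avoiding overfitting
D. Avoiding local extrema
Answer: C

7. K-fold cross-validation means splitting the test dataset into K subsets.
A. TRUE
B. FALSE
Answer: B

8. Machine learning is a part of deep learning. Artificial intelligence is also a part of deep learning.
A. True
B. False
Answer: B

9. In a neural network, which method is used to update the parameters during training so as to minimize the loss function?
A. Forward propagation
B. Pooling
C. Convolution
D. Backpropagation
Answer: D

10. Which of the following is NOT a feature of TensorFlow 2.0?
A. Multi-core CPU acceleration
B. Distributed
C. Multi-language
D. Multi-platform
Answer: A

11. Which of the following statements about classification and regression models in machine learning is correct?
A. For both regression and classification problems, the most commonly used evaluation metrics are accuracy and recall.
B. A prediction problem whose output is a finite set of discrete values is a regression problem, and one whose output is continuous is a classification problem.
C. Both regression and classification problems can suffer from overfitting.
D. Logistic regression is a typical regression model.
Answer: C

12. Data management in the ModelArts platform does not support video data formats.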
Microeconometrics using stata
Microeconometrics Using StataContentsList of tables xxxv List of figures xxxvii Preface xxxix 1Stata basics1............................................................................................1.1Interactive use 1..............................................................................................1.2 Documentation 2..........................................................................1.2.1Stata manuals 2...........................................................1.2.2Additional Stata resources 3.......................................................................1.2.3The help command 3................................1.2.4The search, findit, and hsearch commands 41.3 Command syntax and operators 5...................................................................................................................................1.3.1Basic command syntax 5................................................1.3.2 Example: The summarize command 61.3.3Example: The regress command 7..............................................................................1.3.4Abbreviations, case sensitivity, and wildcards 9................................1.3.5Arithmetic, relational, and logical operators 9.........................................................................1.3.6Error messages 10........................................................................................1.4 Do-files and log files 10.............................................................................1.4.1Writing a do-file 101.4.2Running do-files 11.........................................................................................................................................................................1.4.3Log files 12..................................................................1.4.4 A three-step process 131.4.5Comments and long lines 13......................................................................................................1.4.6Different 
implementations of Stata 141.5Scalars and matrices (15)1.5.1Scalars (15)1.5.2Matrices (15)1.6 Using results from Stata commands (16)1.6.1Using results from the r-class command summarize (16)1.6.2Using results from the e-class command regress (17)1.7 Global and local macros (19)1.7.1Global macros (19)1.7.2Local macros (20)1.7.3Scalar or macro? (21)1.8 Looping commands (22)1.8.1The foreach loop (23)1.8.2The forvalues loop (23)1.8.3The while loop (24)1.8.4The continue command (24)1.9 Some useful commands (24)1.10 Template do-file (25)1.11 User-written commands (25)1.12 Stata resources (26)1.13 Exercises (26)2 Data management and graphics292.1Introduction (29)2.2 Types of data (29)2.2.1Text or ASCII data (30)2.2.2Internal numeric data (30)2.2.3String data (31)2.2.4Formats for displaying numeric data (31)2.3Inputting data (32)2.3.1General principles (32)2.3.2Inputting data already in Stata format (33)2.3.3Inputting data from the keyboard (34)2.3.4Inputting nontext data (34)2.3.5Inputting text data from a spreadsheet (35)2.3.6Inputting text data in free format (36)2.3.7Inputting text data in fixed format (36)2.3.8Dictionary files (37)2.3.9Common pitfalls (37)2.4 Data management (38)2.4.1PSID example (38)2.4.2Naming and labeling variables (41)2.4.3Viewing data (42)2.4.4Using original documentation (43)2.4.5Missing values (43)2.4.6Imputing missing data (45)2.4.7Transforming data (generate, replace, egen, recode) (45)The generate and replace commands (46)The egen command (46)The recode command (47)The by prefix (47)Indicator variables (47)Set of indicator variables (48)Interactions (49)Demeaning (50)2.4.8Saving data (51)2.4.9Selecting the sample (51)2.5 Manipulating datasets (53)2.5.1Ordering observations and variables (53)2.5.2Preserving and restoring a dataset (53)2.5.3Wide and long forms for a dataset (54)2.5.4Merging datasets (54)2.5.5Appending datasets (56)2.6 Graphical display of data (57)2.6.1Stata graph commands (57)Example graph commands (57)Saving and exporting 
graphs (58)Learning how to use graph commands (59)2.6.2Box-and-whisker plot (60)2.6.3Histogram (61)2.6.4Kernel density plot (62)2.6.5Twoway scatterplots and fitted lines (64)2.6.6Lowess, kernel, local linear, and nearest-neighbor regression652.6.7Multiple scatterplots (67)2.7 Stata resources (68)2.8Exercises (68)3Linear regression basics713.1Introduction (71)3.2 Data and data summary (71)3.2.1Data description (71)3.2.2Variable description (72)3.2.3Summary statistics (73)3.2.4More-detailed summary statistics (74)3.2.5Tables for data (75)3.2.6Statistical tests (78)3.2.7Data plots (78)3.3Regression in levels and logs (79)3.3.1Basic regression theory (79)3.3.2OLS regression and matrix algebra (80)3.3.3Properties of the OLS estimator (81)3.3.4Heteroskedasticity-robust standard errors (82)3.3.5Cluster–robust standard errors (82)3.3.6Regression in logs (83)3.4Basic regression analysis (84)3.4.1Correlations (84)3.4.2The regress command (85)3.4.3Hypothesis tests (86)3.4.4Tables of output from several regressions (87)3.4.5Even better tables of regression output (88)3.5Specification analysis (90)3.5.1Specification tests and model diagnostics (90)3.5.2Residual diagnostic plots (91)3.5.3Influential observations (92)3.5.4Specification tests (93)Test of omitted variables (93)Test of the Box–Cox model (94)Test of the functional form of the conditional mean (95)Heteroskedasticity test (96)Omnibus test (97)3.5.5Tests have power in more than one direction (98)3.6Prediction (100)3.6.1In-sample prediction (100)3.6.2Marginal effects (102)3.6.3Prediction in logs: The retransformation problem (103)3.6.4Prediction exercise (104)3.7 Sampling weights (105)3.7.1Weights (106)3.7.2Weighted mean (106)3.7.3Weighted regression (107)3.7.4Weighted prediction and MEs (109)3.8 OLS using Mata (109)3.9Stata resources (111)3.10 Exercises (111)4Simulation1134.1Introduction (113)4.2 Pseudorandom-number generators: Introduction (114)4.2.1Uniform random-number generation (114)4.2.2Draws from normal 
(116)4.2.3Draws from t, chi-squared, F, gamma, and beta (117)4.2.4 Draws from binomial, Poisson, and negative binomial . . . (118)Independent (but not identically distributed) draws frombinomial (118)Independent (but not identically distributed) draws fromPoisson (119)Histograms and density plots (120)4.3 Distribution of the sample mean (121)4.3.1Stata program (122)4.3.2The simulate command (123)4.3.3Central limit theorem simulation (123)4.3.4The postfile command (124)4.3.5Alternative central limit theorem simulation (125)4.4 Pseudorandom-number generators: Further details (125)4.4.1Inverse-probability transformation (126)4.4.2Direct transformation (127)4.4.3Other methods (127)4.4.4Draws from truncated normal (128)4.4.5Draws from multivariate normal (129)Direct draws from multivariate normal (129)Transformation using Cholesky decomposition (130)4.4.6Draws using Markov chain Monte Carlo method (130)4.5 Computing integrals (132)4.5.1Quadrature (133)4.5.2Monte Carlo integration (133)4.5.3Monte Carlo integration using different S (134)4.6Simulation for regression: Introduction (135)4.6.1Simulation example: OLS with X2 errors (135)4.6.2Interpreting simulation output (138)Unbiasedness of estimator (138)Standard errors (138)t statistic (138)Test size (139)Number of simulations (140)4.6.3Variations (140)Different sample size and number of simulations (140)Test power (140)Different error distributions (141)4.6.4Estimator inconsistency (141)4.6.5Simulation with endogenous regressors (142)4.7Stata resources (144)4.8Exercises (144)5GLS regression1475.1Introduction (147)5.2 GLS and FGLS regression (147)5.2.1GLS for heteroskedastic errors (147)5.2.2GLS and FGLS (148)5.2.3Weighted least squares and robust standard errors (149)5.2.4Leading examples (149)5.3 Modeling heteroskedastic data (150)5.3.1Simulated dataset (150)5.3.2OLS estimation (151)5.3.3Detecting heteroskedasticity (152)5.3.4FGLS estimation (154)5.3.5WLS estimation (156)5.4System of linear regressions (156)5.4.1SUR 
model (156)5.4.2The sureg command (157)5.4.3Application to two categories of expenditures (158)5.4.4Robust standard errors (160)5.4.5Testing cross-equation constraints (161)5.4.6Imposing cross-equation constraints (162)5.5Survey data: Weighting, clustering, and stratification (163)5.5.1Survey design (164)5.5.2Survey mean estimation (167)5.5.3Survey linear regression (167)5.6Stata resources (169)5.7Exercises (169)6Linear instrumental-variables regression1716.1Introduction (171)6.2 IV estimation (171)6.2.1Basic IV theory (171)6.2.2Model setup (173)6.2.3IV estimators: IV, 2SLS, and GMM (174)6.2.4Instrument validity and relevance (175)6.2.5Robust standard-error estimates (176)6.3 IV example (177)6.3.1The ivregress command (177)6.3.2Medical expenditures with one endogenous regressor . . . (178)6.3.3Available instruments (179)6.3.4IV estimation of an exactly identified model (180)6.3.5IV estimation of an overidentified model (181)6.3.6Testing for regressor endogeneity (182)6.3.7Tests of overidentifying restrictions (185)6.3.8IV estimation with a binary endogenous regressor (186)6.4 Weak instruments (188)6.4.1Finite-sample properties of IV estimators (188)6.4.2Weak instruments (189)Diagnostics for weak instruments (189)Formal tests for weak instruments (190)6.4.3The estat firststage command (191)6.4.4Just-identified model (191)6.4.5Overidentified model (193)6.4.6More than one endogenous regressor (195)6.4.7Sensitivity to choice of instruments (195)6.5 Better inference with weak instruments (197)6.5.1Conditional tests and confidence intervals (197)6.5.2LIML estimator (199)6.5.3Jackknife IV estimator (199)6.5.4 Comparison of 2SLS, LIML, JIVE, and GMM (200)6.6 3SLS systems estimation (201)6.7Stata resources (203)6.8Exercises (203)7Quantile regression2057.1Introduction (205)7.2 QR (205)7.2.1Conditional quantiles (206)7.2.2Computation of QR estimates and standard errors (207)7.2.3The qreg, bsqreg, and sqreg commands (207)7.3 QR for medical expenditures data (208)7.3.1Data 
summary (208)7.3.2QR estimates (209)7.3.3Interpretation of conditional quantile coefficients (210)7.3.4Retransformation (211)7.3.5Comparison of estimates at different quantiles (212)7.3.6Heteroskedasticity test (213)7.3.7Hypothesis tests (214)7.3.8Graphical display of coefficients over quantiles (215)7.4 QR for generated heteroskedastic data (216)7.4.1Simulated dataset (216)7.4.2QR estimates (219)7.5 QR for count data (220)7.5.1Quantile count regression (221)7.5.2The qcount command (222)7.5.3Summary of doctor visits data (222)7.5.4Results from QCR (224)7.6Stata resources (226)7.7Exercises (226)8Linear panel-data models: Basics2298.1Introduction (229)8.2 Panel-data methods overview (229)8.2.1Some basic considerations (230)8.2.2Some basic panel models (231)Individual-effects model (231)Fixed-effects model (231)Random-effects model (232)Pooled model or population-averaged model (232)Two-way-effects model (232)Mixed linear models (233)8.2.3Cluster-robust inference (233)8.2.4The xtreg command (233)8.2.5Stata linear panel-data commands (234)8.3 Panel-data summary (234)8.3.1Data description and summary statistics (234)8.3.2Panel-data organization (236)8.3.3Panel-data description (237)8.3.4Within and between variation (238)8.3.5Time-series plots for each individual (241)8.3.6Overall scatterplot (242)8.3.7Within scatterplot (243)8.3.8Pooled OLS regression with cluster—robust standard errors ..2448.3.9Time-series autocorrelations for panel data (245)8.3.10 Error correlation in the RE model (247)8.4 Pooled or population-averaged estimators (248)8.4.1Pooled OLS estimator (248)8.4.2Pooled FGLS estimator or population-averaged estimator (248)8.4.3The xtreg, pa command (249)8.4.4Application of the xtreg, pa command (250)8.5 Within estimator (251)8.5.1Within estimator (251)8.5.2The xtreg, fe command (251)8.5.3Application of the xtreg, fe command (252)8.5.4Least-squares dummy-variables regression (253)8.6 Between estimator (254)8.6.1Between estimator (254)8.6.2Application of the 
xtreg, be command (255)8.7 RE estimator (255)8.7.1RE estimator (255)8.7.2The xtreg, re command (256)8.7.3Application of the xtreg, re command (256)8.8 Comparison of estimators (257)8.8.1Estimates of variance components (257)8.8.2Within and between R-squared (258)8.8.3Estimator comparison (258)8.8.4Fixed effects versus random effects (259)8.8.5Hausman test for fixed effects (260)The hausman command (260)Robust Hausman test (261)8.8.6Prediction (262)8.9 First-difference estimator (263)8.9.1First-difference estimator (263)8.9.2Strict and weak exogeneity (264)8.10 Long panels (265)8.10.1 Long-panel dataset (265)8.10.2 Pooled OLS and PFGLS (266)8.10.3 The xtpcse and xtgls commands (267)8.10.4 Application of the xtgls, xtpcse, and xtscc commands . . . (268)8.10.5 Separate regressions (270)8.10.6 FE and RE models (271)8.10.7 Unit roots and cointegration (272)8.11 Panel-data management (274)8.11.1 Wide-form data (274)8.11.2 Convert wide form to long form (274)8.11.3 Convert long form to wide form (275)8.11.4 An alternative wide-form data (276)8.12 Stata resources (278)8.13 Exercises (278)9Linear panel-data models: Extensions2819.1Introduction (281)9.2 Panel IV estimation (281)9.2.1Panel IV (281)9.2.2The xtivreg command (282)9.2.3Application of the xtivreg command (282)9.2.4Panel IV extensions (284)9.3 Hausman-Taylor estimator (284)9.3.1Hausman-Taylor estimator (284)9.3.2The xthtaylor command (285)9.3.3Application of the xthtaylor command (285)9.4 Arellano-Bond estimator (287)9.4.1Dynamic model (287)9.4.2IV estimation in the FD model (288)9.4.3 The xtabond command (289)9.4.4Arellano-Bond estimator: Pure time series (290)9.4.5Arellano-Bond estimator: Additional regressors (292)9.4.6Specification tests (294)9.4.7 The xtdpdsys command (295)9.4.8 The xtdpd command (297)9.5 Mixed linear models (298)9.5.1Mixed linear model (298)9.5.2 The xtmixed command (299)9.5.3Random-intercept model (300)9.5.4Cluster-robust standard errors (301)9.5.5Random-slopes model 
(302)9.5.6Random-coefficients model (303)9.5.7Two-way random-effects model (304)9.6 Clustered data (306)9.6.1Clustered dataset (306)9.6.2Clustered data using nonpanel commands (306)9.6.3Clustered data using panel commands (307)9.6.4Hierarchical linear models (310)9.7Stata resources (311)9.8Exercises (311)10 Nonlinear regression methods31310.1 Introduction (313)10.2 Nonlinear example: Doctor visits (314)10.2.1 Data description (314)10.2.2 Poisson model description (315)10.3 Nonlinear regression methods (316)10.3.1 MLE (316)10.3.2 The poisson command (317)10.3.3 Postestimation commands (318)10.3.4 NLS (319)10.3.5 The nl command (319)10.3.6 GLM (321)10.3.7 The glm command (321)10.3.8 Other estimators (322)10.4 Different estimates of the VCE (323)10.4.1 General framework (323)10.4.2 The vce() option (324)10.4.3 Application of the vce() option (324)10.4.4 Default estimate of the VCE (326)10.4.5 Robust estimate of the VCE (326)10.4.6 Cluster–robust estimate of the VCE (327)10.4.7 Heteroskedasticity- and autocorrelation-consistent estimateof the VCE (328)10.4.8 Bootstrap standard errors (328)10.4.9 Statistical inference (329)10.5 Prediction (329)10.5.1 The predict and predictnl commands (329)10.5.2 Application of predict and predictnl (330)10.5.3 Out-of-sample prediction (331)10.5.4 Prediction at a specified value of one of the regressors (321)10.5.5 Prediction at a specified value of all the regressors (332)10.5.6 Prediction of other quantities (333)10.6 Marginal effects (333)10.6.1 Calculus and finite-difference methods (334)10.6.2 MEs estimates AME, MEM, and MER (334)10.6.3 Elasticities and semielasticities (335)10.6.4 Simple interpretations of coefficients in single-index models (336)10.6.5 The mfx command (337)10.6.6 MEM: Marginal effect at mean (337)Comparison of calculus and finite-difference methods . . . 
(338)10.6.7 MER: Marginal effect at representative value (338)10.6.8 AME: Average marginal effect (339)10.6.9 Elasticities and semielasticities (340)10.6.10 AME computed manually (342)10.6.11 Polynomial regressors (343)10.6.12 Interacted regressors (344)10.6.13 Complex interactions and nonlinearities (344)10.7 Model diagnostics (345)10.7.1 Goodness-of-fit measures (345)10.7.2 Information criteria for model comparison (346)10.7.3 Residuals (347)10.7.4 Model-specification tests (348)10.8 Stata resources (349)10.9 Exercises (349)11 Nonlinear optimization methods35111.1 Introduction (351)11.2 Newton–Raphson method (351)11.2.1 NR method (351)11.2.2 NR method for Poisson (352)11.2.3 Poisson NR example using Mata (353)Core Mata code for Poisson NR iterations (353)Complete Stata and Mata code for Poisson NR iterations (353)11.3 Gradient methods (355)11.3.1 Maximization options (355)11.3.2 Gradient methods (356)11.3.3 Messages during iterations (357)11.3.4 Stopping criteria (357)11.3.5 Multiple maximums (357)11.3.6 Numerical derivatives (358)11.4 The ml command: if method (359)11.4.1 The ml command (360)11.4.2 The If method (360)11.4.3 Poisson example: Single-index model (361)11.4.4 Negative binomial example: Two-index model (362)11.4.5 NLS example: Nonlikelihood model (363)11.5 Checking the program (364)11.5.1 Program debugging using ml check and ml trace (365)11.5.2 Getting the program to run (366)11.5.3 Checking the data (366)11.5.4 Multicollinearity and near coilinearity (367)11.5.5 Multiple optimums (368)11.5.6 Checking parameter estimation (369)11.5.7 Checking standard-error estimation (370)11.6 The ml command: d0, dl, and d2 methods (371)11.6.1 Evaluator functions (371)11.6.2 The d0 method (373)11.6.3 The dl method (374)11.6.4 The dl method with the robust estimate of the VCE (374)11.6.5 The d2 method (375)11.7 The Mata optimize() function (376)11.7.1 Type d and v evaluators (376)11.7.2 Optimize functions (377)11.7.3 Poisson example (377)Evaluator program for Poisson 
MLE (377)The optimize() function for Poisson MLE (378)11.8 Generalized method of moments (379)11.8.1 Definition (380)11.8.2 Nonlinear IV example (380)11.8.3 GMM using the Mata optimize() function (381)11.9 Stata resources (383)11.10 Exercises (383)12 Testing methods38512.1 Introduction (385)12.2 Critical values and p-values (385)12.2.1 Standard normal compared with Student's t (386)12.2.2 Chi-squared compared with F (386)12.2.3 Plotting densities (386)12.2.4 Computing p-values and critical values (388)12.2.5 Which distributions does Stata use? (389)12.3 Wald tests and confidence intervals (389)12.3.1 Wald test of linear hypotheses (389)12.3.2 The test command (391)Test single coefficient (392)Test several hypotheses (392)Test of overall significance (393)Test calculated from retrieved coefficients and VCE (393)12.3.3 One-sided Wald tests (394)12.3.4 Wald test of nonlinear hypotheses (delta method) (395)12.3.5 The testnl command (395)12.3.6 Wald confidence intervals (396)12.3.7 The lincom command (396)12.3.8 The nlcom command (delta method) (397)12.3.9 Asymmetric confidence intervals (398)12.4 Likelihood-ratio tests (399)12.4.1 Likelihood-ratio tests (399)12.4.2 The lrtest command (401)12.4.3 Direct computation of LR tests (401)12.5 Lagrange multiplier test (or score test) (402)12.5.1 LM tests (402)12.5.2 The estat command (403)12.5.3 LM test by auxiliary regression (403)12.6 Test size and power (405)12.6.1 Simulation DGP: OLS with chi-squared errors (405)12.6.2 Test size (406)12.6.3 Test power (407)12.6.4 Asymptotic test power (410)12.7 Specification tests (411)12.7.1 Moment-based tests (411)12.7.2 Information matrix test (411)12.7.3 Chi-squared goodness-of-fit test (412)12.7.4 Overidentifying restrictions test (412)12.7.5 Hausman test (412)12.7.6 Other tests (413)12.8 Stata resources (413)12.9 Exercises (413)13 Bootstrap methods41513.1 Introduction (415)13.2 Bootstrap methods (415)13.2.1 Bootstrap estimate of standard error (415)13.2.2 Bootstrap methods 
(416)13.2.3 Asymptotic refinement (416)13.2.4 Use the bootstrap with caution (416)13.3 Bootstrap pairs using the vce(bootstrap) option (417)13.3.1 Bootstrap-pairs method to estimate VCE (417)13.3.2 The vce(bootstrap) option (418)13.3.3 Bootstrap standard-errors example (418)13.3.4 How many bootstraps? (419)13.3.5 Clustered bootstraps (420)13.3.6 Bootstrap confidence intervals (421)13.3.7 The postestimation estat bootstrap command (422)13.3.8 Bootstrap confidence-intervals example (423)13.3.9 Bootstrap estimate of bias (423)13.4 Bootstrap pairs using the bootstrap command (424)13.4.1 The bootstrap command (424)13.4.2 Bootstrap parameter estimate from a Stata estimationcommand (425)13.4.3 Bootstrap standard error from a Stata estimation command (426)13.4.4 Bootstrap standard error from a user-written estimationcommand (426)13.4.5 Bootstrap two-step estimator (427)13.4.6 Bootstrap Hausman test (429)13.4.7 Bootstrap standard error of the coefficient of variation . . (430)13.5 Bootstraps with asymptotic refinement (431)13.5.1 Percentile-t method (431)13.5.2 Percentile-t Wald test (432)13.5.3 Percentile-t Wald confidence interval (433)13.6 Bootstrap pairs using bsample and simulate (434)13.6.1 The bsample command (434)13.6.2 The bsample command with simulate (434)13.6.3 Bootstrap Monte Carlo exercise (436)13.7 Alternative resampling schemes (436)13.7.1 Bootstrap pairs (437)13.7.2 Parametric bootstrap (437)13.7.3 Residual bootstrap (439)13.7.4 Wild bootstrap (440)13.7.5 Subsampling (441)13.8 The jackknife (441)13.8.1 Jackknife method (441)13.8.2 The vice(jackknife) option and the jackknife command . . (442)13.9 Stata resources (442)13.10 Exercises (442)14 Binary outcome models44514.1 Introduction (445)14.2 Some parametric models (445)14.2.1 Basic model (445)14.2.2 Logit, probit, linear probability, and clog-log models . . . 
(446)14.3 Estimation (446)14.3.1 Latent-variable interpretation and identification (447)14.3.2 ML estimation (447)14.3.3 The logit and probit commands (448)14.3.4 Robust estimate of the VCE (448)14.3.5 OLS estimation of LPM (448)14.4 Example (449)14.4.1 Data description (449)14.4.2 Logit regression (450)14.4.3 Comparison of binary models and parameter estimates . (451)14.5 Hypothesis and specification tests (452)14.5.1 Wald tests (453)14.5.2 Likelihood-ratio tests (453)14.5.3 Additional model-specification tests (454)Lagrange multiplier test of generalized logit (454)Heteroskedastic probit regression (455)14.5.4 Model comparison (456)14.6 Goodness of fit and prediction (457)14.6.1 Pseudo-R2 measure (457)14.6.2 Comparing predicted probabilities with sample frequencies (457)14.6.3 Comparing predicted outcomes with actual outcomes . . . (459)14.6.4 The predict command for fitted probabilities (460)14.6.5 The prvalue command for fitted probabilities (461)14.7 Marginal effects (462)14.7.1 Marginal effect at a representative value (MER) (462)14.7.2 Marginal effect at the mean (MEM) (463)14.7.3 Average marginal effect (AME) (464)14.7.4 The prchange command (464)14.8 Endogenous regressors (465)14.8.1 Example (465)14.8.2 Model assumptions (466)14.8.3 Structural-model approach (467)The ivprobit command (467)Maximum likelihood estimates (468)Two-step sequential estimates (469)14.8.4 IVs approach (471)14.9 Grouped data (472)14.9.1 Estimation with aggregate data (473)14.9.2 Grouped-data application (473)14.10 Stata resources (475)14.11 Exercises (475)15 Multinomial models47715.1 Introduction (477)15.2 Multinomial models overview (477)15.2.1 Probabilities and MEs (477)15.2.2 Maximum likelihood estimation (478)15.2.3 Case-specific and alternative-specific regressors (479)15.2.4 Additive random-utility model (479)15.2.5 Stata multinomial model commands (480)15.3 Multinomial example: Choice of fishing mode (480)15.3.1 Data description (480)15.3.2 Case-specific regressors 
    15.3.3 Alternative-specific regressors
  15.4 Multinomial logit model
    15.4.1 The mlogit command
    15.4.2 Application of the mlogit command
    15.4.3 Coefficient interpretation
    15.4.4 Predicted probabilities
    15.4.5 MEs
  15.5 Conditional logit model
    15.5.1 Creating long-form data from wide-form data
    15.5.2 The asclogit command
    15.5.3 The clogit command
    15.5.4 Application of the asclogit command
    15.5.5 Relationship to multinomial logit model
    15.5.6 Coefficient interpretation
    15.5.7 Predicted probabilities
    15.5.8 MEs
  15.6 Nested logit model
    15.6.1 Relaxing the independence of irrelevant alternatives assumption
    15.6.2 NL model
    15.6.3 The nlogit command
    15.6.4 Model estimates
    15.6.5 Predicted probabilities
    15.6.6 MEs
    15.6.7 Comparison of logit models
  15.7 Multinomial probit model
    15.7.1 MNP
    15.7.2 The mprobit command
    15.7.3 Maximum simulated likelihood
    15.7.4 The asmprobit command
    15.7.5 Application of the asmprobit command
    15.7.6 Predicted probabilities and MEs
  15.8 Random-parameters logit
    15.8.1 Random-parameters logit
    15.8.2 The mixlogit command
    15.8.3 Data preparation for mixlogit
    15.8.4 Application of the mixlogit command
  15.9 Ordered outcome models
    15.9.1 Data summary
    15.9.2 Ordered outcomes
    15.9.3 Application of the ologit command
    15.9.4 Predicted probabilities
    15.9.5 MEs
    15.9.6 Other ordered models
  15.10 Multivariate outcomes
    15.10.1 Bivariate probit
    15.10.2 Nonlinear SUR
  15.11 Stata resources
  15.12 Exercises
16 Tobit and selection models
  16.1 Introduction
  16.2 Tobit model
    16.2.1 Regression with censored data
    16.2.2 Tobit model setup
    16.2.3 Unknown censoring point
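Implicit Enumeration in MATLAB: Overview and Explanation

1. Introduction

1.1 Overview
Implicit enumeration is a common optimization algorithm that finds the optimal solution by searching the space of candidate solutions systematically. Unlike brute-force enumeration, it examines most candidates only implicitly: whole groups of candidates are discarded as soon as it is clear that they cannot be feasible or cannot improve on the best solution found so far.
It is widely used in many fields, including engineering, economics, and science.
MATLAB, a powerful and easy-to-use numerical computing environment, plays an important role in implementing implicit enumeration.

1.2 Article structure
This article gives an overview and explanation of implicit enumeration in MATLAB from the following angles.
First, we introduce the basic concepts and principles of implicit enumeration, including its definition, background, and its use in MATLAB.
Next, we describe in detail the practical implementation steps and techniques for implicit enumeration in MATLAB, with concrete application examples and an analysis of the results.
We then discuss the prospects for promoting and applying implicit enumeration with MATLAB in engineering, and analyze the challenges that may arise and how to address them.
Finally, we summarize the article and look ahead to future research directions for implicit enumeration in MATLAB.

1.3 Purpose
This article aims to explain in depth how implicit enumeration is applied in MATLAB and to give readers a comprehensive overview of the method.
After reading it, readers should understand the basic principles of implicit enumeration and know how to implement and apply it in MATLAB to solve problems.
We also hope to explore the potential applications of implicit enumeration with MATLAB in different fields, and to identify likely challenges and countermeasures, as a reference for related research and engineering practice.

2. Basic concepts and principles of implicit enumeration in MATLAB

2.1 Definition and background of implicit enumeration
The implicit enumeration method is a method for solving optimization problems, most commonly 0-1 integer programs. It looks for the optimal solution by organizing the candidate solutions so that most of them never have to be evaluated explicitly.
In implicit enumeration, the feasible region of the variables is divided into a number of smaller subregions, and the optimal solution is sought by traversing these subregions.
The underlying idea is exhaustive search in which the search range is narrowed step by step until the optimal solution is obtained.
The method is usually suited to discrete problems or to small continuous problems, and it is widely used in practical engineering and scientific work.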
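The narrowing-down idea described in section 2.1 can be illustrated with a small sketch. The example below is written in Python rather than MATLAB and is only a minimal illustration under stated assumptions: the problem is a 0-1 program "minimize c·x subject to A x <= b" with all cost coefficients non-negative, and the function name `implicit_enumeration` and the sample data are hypothetical, not taken from the article.

```python
def implicit_enumeration(c, A, b):
    """Minimise c.x subject to A x <= b with x in {0,1}^n.

    Assumes every c[j] >= 0, so the cost of a partial assignment is a
    lower bound on the cost of any completion (hypothetical helper).
    """
    n = len(c)
    m = len(A)
    best = {"cost": float("inf"), "x": None}

    def search(j, x, cost, lhs):
        # Bound: a partial cost that already matches the incumbent cannot improve it.
        if cost >= best["cost"]:
            return
        # Feasibility test: even the most favourable completion of the free
        # variables (take every negative coefficient) must satisfy each row.
        for i in range(m):
            lowest = lhs[i] + sum(min(0, A[i][k]) for k in range(j, n))
            if lowest > b[i]:
                return
        if j == n:
            # All variables fixed, feasible, and cheaper than the incumbent.
            best["cost"], best["x"] = cost, x[:]
            return
        for value in (0, 1):
            x.append(value)
            new_lhs = [lhs[i] + A[i][j] * value for i in range(m)]
            search(j + 1, x, cost + c[j] * value, new_lhs)
            x.pop()

    search(0, [], 0, [0] * m)
    return best["x"], best["cost"]


# Sample problem: minimise 2*x1 + 3*x2 + 4*x3
# subject to x1 + x2 + x3 >= 2 (written as -x1 - x2 - x3 <= -2) and x1 + x3 <= 1.
solution, cost = implicit_enumeration(
    c=[2, 3, 4],
    A=[[-1, -1, -1], [1, 0, 1]],
    b=[-2, 1],
)
print(solution, cost)
```

Of the eight possible assignments, several are never visited as complete assignments: whole branches are cut by the bound test or by the feasibility test, which is what makes the enumeration implicit.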
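2.2 MATLAB in implicit enumeration
As a powerful numerical computing platform, MATLAB provides many tools and functions that help in implementing and applying implicit enumeration.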
Distributed Coordination of Multi-Agent Systems With Quantized-Observer Based Encoding-Decoding
Tao Li, Member, IEEE, and Lihua Xie, Fellow, IEEE

Abstract—Integrative design of communication mechanism and coordinated control law is an interesting and important problem for multi-agent networks. In this paper, we consider distributed coordination of discrete-time second-order multi-agent systems with partially measurable state and a limited communication data rate. A quantized-observer based encoding-decoding scheme is designed, which integrates the state observation with encoding/decoding. A distributed coordinated control law is proposed for each agent which is given in terms of the states of its encoder and decoders. It is shown that for a connected network, 2-bit quantizers suffice for the exponential asymptotic synchronization of the states of the agents. The selection of controller parameters and the performance limit are discussed. It is shown that the algebraic connectivity and the spectral radius of the Laplacian matrix of the communication graph play key roles in the closed-loop performance. The spectral radius of the Laplacian matrix is related to the selection of control gains, while the algebraic connectivity is related to the spectral radius of the closed-loop state matrix. Furthermore, it is shown that as the number of agents increases, the asymptotic convergence rate can be approximated as a function of the number of agents, the number of quantization levels (communication data rate) and the ratio of the algebraic connectivity to the spectral radius of the Laplacian matrix of the communication graph.

Index Terms—Data rate, digital communication, distributed coordination, encoding and decoding, multi-agent systems, quantized observer.

I. INTRODUCTION

In recent years, distributed cooperative control of multi-agent systems has attracted unprecedented attention of the control community ([1]–[14]) in view of its wide applications in many emerging fields such as smart grids, intelligent
transportation, formation flight, etc. In particular, the problem of multi-agent consensus has been the focus of much research; see, e.g., [5] and the references therein.

Manuscript received March 01, 2011; revised September 03, 2011; accepted April 05, 2012. Date of publication May 14, 2012; date of current version November 21, 2012. Recommended by Associate Editor L. Schenato. This work was supported by the National Natural Science Foundation of China (NSFC) under grants 61004029, 60934006 and 61120106011. This paper was presented in part at the 30th Chinese Control Conference, July 22-24, 2011, Yantai, China.
T. Li is with the Key Laboratory of Systems and Control, Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China (e-mail: litao@).
L. Xie is with EXQUISITUS, Centre for E-City, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798 (e-mail: elhxie@.sg).
Color versions of one or more of the figures in this paper are available online at .
Digital Object Identifier 10.1109/TAC.2012.2199152

Quantized consensus is an important problem because digital communications are widely adopted, and it has attracted recurring interest ([15]–[24]). Kashyap et al. ([15]) developed an average-consensus algorithm with integer-valued states, which can ensure the asymptotic convergence of agents' states to an integer approximation of the average of the initial states. They gave an upper bound for the expected convergence time for fully connected networks and linear networks. Frasca et al.
([19]), Carli et al. ([20]), and Li et al. ([24]) considered the average-consensus problem with real-valued states and quantized communications. In [19] and [20], static uniform quantizers and dynamic logarithmic quantizers with an infinite number of quantization levels were considered, respectively. In [20] and [24], average-consensus algorithms with dynamic finite-level uniform quantizers were proposed. Especially, in [24], it is shown that if the network is connected, then the control parameters can be properly chosen such that the average-consensus can be achieved with an exponential convergence rate by using a single-bit quantizer. The work of [24] was extended to the cases with link failures in [25] and time-delay in [26], respectively.

The aforementioned works are concerned with the first-order integrator systems with measurable states. In many applications, however, we encounter higher order systems with partially measurable states. Dynamic output feedback control of multi-agent systems of general higher order dynamics was first studied by Fax and Murray ([3]). Tuna proposed a controller design algorithm for synchronization of discrete-time linear systems based on static relative output feedback ([27]). Qu et al. ([28]) dealt with static output feedback of multi-agent systems via feedback linearization, where the control input of an agent is given in terms of its own output and the relative output errors with respect to its neighbors. Li et al. ([29]) and You and Xie ([30]) considered distributed coordination based on dynamic relative output feedback. Hong et al. ([31]) developed a distributed observer for leader-following systems where the leader and the followers are described by second-order integrators and each follower constructs a state observer based on the leader's position, neighbors' positions and leader's control input to estimate the leader's velocity. More literature on distributed observers can be found in [32] and [33].

In this paper, we consider distributed coordination of multi-agent
networks based on digital communications. The communications among agents are described by an undirected graph. Each agent is described by a discrete-time second-order integrator, with measurable position but unmeasurable velocity, unlike [20] and [24]. Since the states of the agents are only partially measurable, the encoding-decoding scheme in [24] cannot be easily extended to this case. Further, unlike [20] where infinite-level logarithmic quantizers are considered, we aim to design an efficient encoding-decoding scheme under a limited data rate for information exchange between agents. Our first challenge is to jointly design state-observation and encoding-decoding for communication and computation efficiency while achieving consensus. Note that one natural idea is to design a state-observer for each agent and then encode and transmit the state-estimate to neighbors, which, however, requires a distributed control with a complex encoding-decoding scheme in order to eliminate the effect of quantization and estimation errors on the final closed-loop system. Further, even if such a control scheme can be developed to guarantee convergence, the computation and communication loads are generally higher and the performance (i.e., the convergence rate under the same bit rate) is not definitely better.

From the perspective of minimizing communication bit rate and reducing computation load, we propose an integrative approach for observer and encoder-decoder design in this paper.
At each time instant, the quantized innovation of each agent's position is sent to its neighbors, while, at each receiver, an observer-based decoder is activated to obtain an estimate of the sender's position and velocity. Our design can result in a much lower communication requirement because: 1) the encoder inputs, i.e., agents' positions, contain fewer variables than the full states; 2) the encoder outputs are in fact a kind of quantized innovations of agents' positions, and it is known that innovations generally can be quantized with much lower numbers of bits than the positions themselves. It is worth pointing out that even if the quantization is ignored, our encoders and decoders are different from the dynamic feedback control law in [3]. Here, we do not design a state observer for each agent separately, but send the quantized innovation of each agent's output directly and integrate the state observation and communication process together. Our observer-based encoding-decoding scheme is also different from the distributed observer given in [31]; in particular, we do not require the knowledge of the other agents' control inputs.

We develop a distributed coordinated control law by using the states of the decoders and encoders, provide sufficient conditions on the control gains and network topology for the existence of finite-level quantizers to ensure the closed-loop convergence, and show that these conditions are also necessary in some sense. We prove that, by selecting the number of quantization levels (data rate) properly, the asymptotic synchronization of the positions and velocities can be achieved. Furthermore, for a connected network, we can always select the control gains such that 2-bit quantizers can guarantee the exponential convergence of the closed-loop system and the convergence rate can be predesigned.

It should be noted that compared with classical non-quantized and centralized state observers, due to the nonlinearity of the quantization and the coupling of all agents' states, the
convergence of a given observer-based encoding-decoding scheme depends on the control inputs of all agents and the closed-loop dynamics of the whole network. Different from [24], the relationship between the estimation error and the quantization error does not have a simple form if the observer type is not properly selected, and it is very difficult to get an explicit expression for the relationship between the spectral radius of the closed-loop state matrix and the eigenvalues of the graph Laplacian. All these significantly complicate the closed-loop analysis and the control parameter selection. Also, different from [24], there is no explicit relationship between the stability margin and the control gain, which makes the performance limit analysis difficult. By using differential calculus and limit analysis, we give a linear approximation of the spectral radius of the closed-loop state matrix with respect to the control gain ratio and algebraic connectivity of the communication graph, based on which a relationship between the performance limit and the parameters of the network and system is revealed. We show that as the number of agents increases to infinity, the asymptotic highest convergence rate is when using a -level quantizer, where is the ratio of the algebraic connectivity to the spectral radius of the Laplacian matrix of the communication graph.

The remainder of this paper is organized as follows. In Section II, we present the model of the network and agents, and give the structures of observer-based encoders, observer-based decoders and distributed coordinated control laws. In Section III, we analyze the closed-loop system and give conditions on the network topology, the control gains and the number of quantization levels to ensure convergence. In Section IV, we discuss the selection of the control gain ratio and show that 2-bit quantizers can guarantee the convergence of the closed-loop system by selecting the control gains properly. We also give an explicit form of the
asymptotic convergence rate. In Section V, we draw some concluding remarks and propose future research topics.

The following notation will be used throughout this paper: $\mathbf{1}$ denotes a column vector with all ones. $I$ denotes the identity matrix with an appropriate size. For a given set $S$, the number of its elements is denoted by $|S|$. For a given vector or matrix $X$, we denote its transpose by $X^T$, its $\infty$-norm by $\|X\|_\infty$, its Euclidean norm by $\|X\|$, its spectral radius by $\rho(X)$, and its trace by $\mathrm{tr}(X)$. For a given positive number $x$, the natural logarithm, the logarithm of $x$ with base 2, the maximum integer less than or equal to $x$, and the minimum integer greater than or equal to $x$ are respectively denoted by $\ln(x)$, $\log_2(x)$, $\lfloor x \rfloor$ and $\lceil x \rceil$.

II. PROBLEM FORMULATION

A. Agent and Network Models

We consider distributed coordination of a network of $N$ agents with the second-order dynamics:

$$x_i(t+1) = x_i(t) + v_i(t), \quad v_i(t+1) = v_i(t) + u_i(t), \quad y_i(t) = x_i(t), \quad t = 0, 1, \ldots, \quad i = 1, \ldots, N \tag{1}$$

where $x_i(t)$, $v_i(t)$, and $u_i(t)$ are the position, velocity, and control of the $i$th agent, respectively. Here, $y_i(t)$ is the output of agent $i$, that is, for agent $i$, only its position is measurable. The agents communicate with each other through a network whose topology is modeled as an undirected graph $G = \{V, E, A\}$, where the agents and the communication channels between agents are represented by the node set $V = \{1, \ldots, N\}$ and the edge set $E \subset V \times V$, respectively. The weighted adjacency matrix of $G$ is denoted by $A = [a_{ij}] \in \mathbb{R}^{N \times N}$. Note that $A$ is a symmetric matrix. An edge denoted by the pair $(j, i)$ represents a communication channel from $j$ to $i$, and $(j, i) \in E$ if and only if $(i, j) \in E$. The neighborhood of the $i$th agent is denoted by $N_i = \{j \in V \mid (j, i) \in E\}$. For any $i, j \in V$, $a_{ij} \geq 0$, and $a_{ij} > 0$ if and only if $j \in N_i$. Also, $\deg_i = \sum_{j=1}^{N} a_{ij}$ is called the degree of $i$, and $d^* = \max_i \deg_i$ is called the degree of $G$. The Laplacian matrix of $G$ is defined as $L = D - A$, where $D = \mathrm{diag}(\deg_1, \ldots, \deg_N)$. The Laplacian matrix is a symmetric positive semi-definite matrix and its eigenvalues in an ascending order are denoted by $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_N$, where $\lambda_N$ is the spectral radius of $L$ and $\lambda_2$ is called the algebraic connectivity of $G$ ([34], [35]). A sequence of edges $(i_1, i_2), (i_2, i_3), \ldots, (i_{k-1}, i_k)$ is called a path from node $i_1$ to node $i_k$. The graph $G$ is called a connected graph if for any $i, j \in V$, there is a path from $i$ to $j$.

B. Observer-Based
Encoding-DecodingWe consider digital communication channels with limited channel capacity.At each time step,what each agent can send to its neighbors is only a coded version of its current and past measurements.Generally speaking,the encoder of the th agent may take the following form:(2) where and are the output and input of the encoder, respectively,is a Borel measurable function and is a quan-tizer.Note that both the structure and parameters of and may be time-varying and the encoder may have infinite memory. In this paper,we propose afinite memory encoder of agent as(3) where is an exponentially decaying scaling function to be defined later.In the above,and are the internal states of the encoder and is afiquantizer given by(4) where is the number of quantization levels of.After is received by one of the th agent’s neighbors,say agent,a decoder will be activated:(5) where and are the outputs of the decoder.Remark1:In the above,is a quantized innovation with scaling.From the dynamic(1)of the th agent,we know that to get estimates for and,following the standard observer design,the decoder can be in the form(6) where and are the observer gains.It can be easily verified that if and the quantizer is the identity function,then(6)degenerates to the classical deadbeat posterior state observer based on output.However,since is not available for the neighbors of the th agent,we adopt decoder(5)instead.Remark2:From(3)and(5),we have(7) We will show that and can be viewed as the estimates for and,respectively.Denoteas the quantization error in encoder,as the estimation error for andas the estimation error for.By(3)and some direct calculation,we get(8) and(9) It can be seen that if the quantization error is bounded, then due to the vanishing of,the estimation errorsand will both to zero asymptotically as.Note that here,for the velocity estimation,there is one step delay.Remark3:The relationship among the estimation errors ,and the quantization error is not asin thefirst-order 
case It will be seen later that(8)and(9)will play an important role in the closed-loop analysis.Observe that the estimation errors for velocities depend on two steps of quantization errors,which, as we can see later,leads to an additional bit required for the quantizers as compared to thefirst-order case([24]). Remark4:From the above,we can see that both the en-coder(3)and the decoder(5)can be viewed as the state ob-servers based on the output and the quantized innovation. We call the encoder(3)an observer-based encoder and the de-coder(5)an observer-based decoder.Though the velocityis not measurable,the th agent and its neighbors can make an estimate for the overall state by using an ob-server-based encoder and an observer-based decoder.At each time step,each agent only needs to send the quantized innova-tion of its output to its neighbors,then the neighbors can use observer-based decoders to get estimates for the state of the3026IEEE TRANSACTIONS ON AUTOMATIC CONTROL,VOL.57,NO.12,DECEMBER2012agent.However,generally speaking,there is no separation prin-ciple for the encoder-decoder design and the control design. 
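
The idea of sending a scaled, quantized innovation and mirroring the encoder state in the decoder can be illustrated with a deliberately simplified sketch. The Python code below is not the encoder (3) or decoder (5) of the paper: it assumes a scalar signal, a first-order state with no velocity estimate, a symmetric uniform quantizer with 2K+1 levels, and a geometric scaling g(t) = g0·gamma^t. The names `quantize` and `run_channel` and all parameter values are hypothetical.

```python
import math


def quantize(y, K):
    """Symmetric uniform quantizer with levels {-K, ..., K}; saturates for |y| > K + 1/2."""
    level = min(K, math.floor(abs(y) + 0.5))
    return level if y >= 0 else -level


def run_channel(signal, K=1, g0=1.0, gamma=0.9):
    """Send a scalar signal through a quantized-innovation encoder/decoder pair.

    Encoder and decoder apply the same update to the same received symbol,
    so both hold the same estimate without any extra communication.
    Returns the absolute estimation error at the receiver after each step.
    """
    enc_state = 0.0  # sender's copy of the receiver's estimate
    dec_state = 0.0  # receiver's estimate
    g = g0           # current scaling
    errors = []
    for x in signal:
        s = quantize((x - enc_state) / g, K)  # symbol actually transmitted
        enc_state += g * s
        dec_state += g * s
        errors.append(abs(x - dec_state))
        g *= gamma   # shrink the scaling so the resolution improves over time
    return errors


errors = run_channel([1.2] * 100)
print(errors[0], errors[-1])
```

As long as the quantizer input stays inside the unsaturated range, the error after each step is at most half the current scaling, so it decays geometrically along with g(t). This mirrors, in a first-order setting, the reasoning behind the bounded-quantization-error argument in the remarks above.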
Compared with classical non-quantized and centralized state observers,due to the nonlinearity of the quantization and the coupling of all agents’states,the convergence of a given ob-server-based encoding-decoding scheme depends on the control inputs of all agents and the closed-loop dynamics of the whole network,which significantly complicates the analysis as seen below.C.Distributed Control LawIn this paper,we aim at designing a distributed coordinated control law based on quantized communications such that(10) We propose a distributed coordinated control law of the form(11) where and are the control gains.From(3),(5)and(11),we can see that the control input of each agent only depends on the state of its own encoder and the states of the decoders associated with the channels from its neighbors.Remark5:Since the states of agents are only partially mea-surable,the encoding-decoding scheme in[24]where agents of single integrator dynamics are considered cannot be easily extended to this case.The challenge is to design state observers and encoders-decoders jointly so that they can achieve con-sensus with efficient communications and computation.One natural idea is to design a state-observer for each agent and then encode and transmit the state estimate to neighbors.For example,we may adopt the following state-observer for the th agent:(12)is then encoded and transmitted to the neigh-bors of the th agent.However,since the control inputand estimation error are not available for its neighbors,to eliminate the effect of quantization and estima-tion errors on thefinal closed-loop system,we may need a more complex encoding-decoding scheme and a control law than(3), (5)and(11).Further,even if we canfind such a scheme to guar-antee convergence,the computation and communication loads are higher and the performance(i.e.,the convergence rate under the same bit rate)is not definitely better.From the perspective of bit rate constraint and reducing computation load,we propose an 
integrative approach for the state-observer and encoder-de-coder design.III.C ONVERGENCE A NALYSISThis section is devoted to the convergence analysis of the proposed distributed control law in the last section.To this end, we introduce the following notation:where.We also define the unitary matrix(13) where is the unit eigenvector of associated with,that is,,,.Under the protocol(3),(5)and(11),due to the quantization, the closed-loop system is a nonlinear discontinuous system. Generally speaking,the convergence analysis is difficult, however,by using the estimation error expressions(8)and (9),the closed-loop equation can be converted into a linear equation with time-varying disturbances,whose homogeneous part is just the closed-loop equation without quantization.Then by properly selecting the number of quantization levels,the quantizers can be kept unsaturated and the convergence of the closed-loop system can be achieved.We make the following assumptions.A1)There are known positive constants,,,, such that,,,.A2)The communication graph is connected.A3).A4).The following lemma,whose proof can be found in Ap-pendix,will be used in the analysis of the homogeneous part of the closed-loop system.Lemma3.1:Let(14) Then,i),if and only if As-sumptions(A2)–(A4)hold.ii)Let(15)LI AND XIE:DISTRIBUTED COORDINATION OF MULTI-AGENT SYSTEMS WITH QUANTIZED-OBSERVER BASED ENCODING-DECODING3027If Assumptions(A3)–(A4)hold,then the eigenvalues of are0,,and ,where(16) In the above,the arguments,of and were omitted,and,where.From Lemma 3.1,we know that if Assumptions (A2)–(A4)hold,then is diagonalizable.Let ,,be nonsingular matrices,such that whereDenote,. 
In the following,the dependence of,and on and will be omitted when there is no confusion.The following theorem gives sufficient conditions on the con-trol gains and network topology for the existence offinite-level quantizers to ensure the closed-loop convergence.Theorem3.1:Suppose Assumptions(A1)–(A4)hold.Let the scaling function,where(17) and.If the numbers of quantization levels of the quantizer,satisfy(18) and(19)where, then under the protocol(3),(5)and(11),the closed-loop system satisfies(20) Furthermore,the convergence rate is given by(21)Proof:The proof can be divided into three steps.First, we convert the closed-loop system into non-coupled linear equations with nonlinear disturbances.The disturbances are combinations of the estimation errors which are related to the quantization errors as observed from by(8)and(9).Second, we estimate the bound of the synchronization errors in terms of the quantization errors and system and control parameters. Finally,we prove the boundness of the quantization error by properly choosing the control parameters and the number of quantization levels,which will lead to the convergence of the closed-loop system.Step1)From(7)and(11),it follows that(22) Substitute the control law above into the system(1),we haveLet,,where is defined in(13).Denote the th components of and by and,respectively.Then we have, and(23) Denote,then the(23)can be rewritten as(24) where with.It is clear that to get(20),we only need to prove,.3028IEEE TRANSACTIONS ON AUTOMATIC CONTROL,VOL.57,NO.12,DECEMBER2012Step2)By(24),we have(25) Further,by(8)and(9),noting that,we haveThen it follows from(25)that(26) By the definition of,and,we get(27) Step3)By Lemma A.2,we get.This together with(26)gives,, which further implies(20).Then from, (26)and(27),we get(21).Observe that the distributed control law in Theorem3.1re-lies on,which requires each agent to know the graph and may not be practical.This restriction is relaxed by the following 
corollary.Corollary3.1:Suppose Assumptions(A1)–(A4)hold.Let the scaling function,where(28) and.If the numbers of quantization levels of the quantizer,satisfy(29) and(30) where then under the protocol(3),(5)and(11),the closed-loop system satisfies(31) and the convergence rate is given by(32)Proof:Noting that and,by Theorem3.1,we get the conclusion of this corollary.Remark6:From Theorem3.1and Corollary3.1,we can see that the convergence factor can be properly chosen to tune the convergence rate of the closed-loop system.By Corollary3.1, we may select the control parameters by the following steps.i)Choosing,such that Assumptions(A3)–(A4)hold.ii) Choosing and then according to(28).iii)Choosing the number of quantization levels according to(29)and(30). Remark7:Corollary3.1tells us that to select proper and the number of quantization levels,we do not need to know, that is,the exact Laplacian matrix.Furthermore,Assumption A4)holds if,so the selection of the con-trol gains may not need the knowledge of.However,from the definition of,we can see that the selection of needs the knowledge of the eigenvalues of the Laplacian ma-trix.Hence,we still need some global knowledge of the net-work topology to select the control parameters.In the case when the network topology can be predesigned,this is not a problem. 
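
When the topology is known in advance, the two spectral quantities that the remarks refer to, the algebraic connectivity and the spectral radius of the Laplacian, can be computed offline. The Python sketch below is only an illustration under stated assumptions: it uses plain power iteration with a fixed iteration count instead of a numerical library, it assumes a connected undirected graph given as weighted edge triples, and the function names are hypothetical. It is a centralized calculation, not the distributed estimation algorithm of [36].

```python
import random


def laplacian(n, weighted_edges):
    """Build L = D - A for an undirected graph given as (i, j, weight) triples."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        L[i][j] -= w
        L[j][i] -= w
        L[i][i] += w
        L[j][j] += w
    return L


def dominant_eigenvalue(M, deflate_ones=False, iters=500, seed=0):
    """Largest eigenvalue of a symmetric positive semi-definite matrix by power iteration.

    With deflate_ones=True the all-ones direction is projected out at every
    step, which restricts the search to the subspace orthogonal to it.
    """
    rng = random.Random(seed)
    n = len(M)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    lam = 0.0
    for _ in range(iters):
        if deflate_ones:
            mean = sum(x) / n
            x = [xi - mean for xi in x]
        norm = sum(xi * xi for xi in x) ** 0.5
        if norm == 0.0:
            return lam
        x = [xi / norm for xi in x]
        y = [sum(m * xk for m, xk in zip(row, x)) for row in M]
        lam = sum(xi * yi for xi, yi in zip(x, y))  # Rayleigh quotient
        x = y
    return lam


def spectrum_extremes(n, weighted_edges):
    """Return (spectral radius, algebraic connectivity) of the graph Laplacian."""
    L = laplacian(n, weighted_edges)
    lam_max = dominant_eigenvalue(L)
    # Shift so that the smallest nonzero eigenvalue of L becomes the largest
    # eigenvalue of (lam_max * I - L) on the subspace orthogonal to the ones vector.
    shifted = [[(lam_max if i == j else 0.0) - L[i][j] for j in range(n)]
               for i in range(n)]
    lam_2 = lam_max - dominant_eigenvalue(shifted, deflate_ones=True)
    return lam_max, lam_2


# Path graph on three nodes with unit weights.
lam_max, lam_2 = spectrum_extremes(3, [(0, 1, 1.0), (1, 2, 1.0)])
print(lam_max, lam_2, lam_2 / lam_max)
```

The last printed value is the ratio of the algebraic connectivity to the spectral radius, the quantity that the abstract identifies as governing the asymptotic convergence rate.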
However,in some applications,the network topology may notLI AND XIE:DISTRIBUTED COORDINATION OF MULTI-AGENT SYSTEMS WITH QUANTIZED-OBSERVER BASED ENCODING-DECODING3029be known to each agent,for example,under switching topolo-gies due to changing environment.In this situation,the problem of estimating the eigenvalues of the Laplacian matrix in a dis-tributed manner becomes relevant.Franceschelli et al.([36]) gave an algorithm to estimate the eigenvalues of a Laplacian matrix by each agent using the fast Fourier transform.The com-bination of the eigenvalue estimation algorithm with our pro-posed distributed coordinate control algorithm is an interesting future research topic.Remark8:From Lemma3.1and the proof of Theorem3.1, we can see that A2-A4)are necessary and sufficient for the sta-bility of the homogeneous part of the closed-loop systems(24). Since,we can see that a smaller degree, which implies lower local connectivity,will instead give more flexibility for selecting the control gains.In the main theorem of[15](Theorem1of[15]),the authors proved that under their algorithm,as time goes on the states of agents converge to a ball centered at the average of the initial states with radius less than or equal to the quantization interval, with probability1.They also proved that there always exists a finite time such that the states of the agents enter and stay in the ball with a positive probability when.An upper bound for the mathematical expectation of the convergence time for fully connected networks and linear networks was also pro-vided.In this paper,we focus on the case with real-valued states and the asymptotic convergence to exact synchronization.The algorithm given here can guarantee convergence to synchro-nization with an arbitrary precision as time goes on.In the fol-lowing,we will give an analysis on the convergence time for a given precision for connected networks.For any given, denote and,which are respectively the time for the positions and veloci-ties 
of all the agents with precision.Theorem3.2:Suppose the conditions of Theorem3.1hold,and.Then under the protocol(3),(5)and(11),for sufficiently small, the convergence time for the position and velocity respectively satisfies(33) whereProof:The proof can be found in the Appendix. Remark9:Similar to Corollary 3.1,the constantin Theorem 3.2can be replacedby Fig.1.Curves of of Example1.,which gives us a relationship between the upper bound of the convergence time and the number of agents.IV.P ARAMETER D ESIGN AND P ERFORMANCE L IMIT A NALYSIS In this section,we shall investigate controller parameter se-lection and analyze the asymptotic consensus convergence rate.A.Selecting the Control Gain RatioSelecting the control gains and is equivalent to selecting a control gain ratio and the position control gain. It is easily seen that Assumptions A3)-A4)hold if and only if and.Further will max-imize,which implies the largest stability margin of the homogeneous part of the closed-loop system(24).1)Example1:We consider a10-node network withand.The curves of with respect to with different control gain ratios are shown in Fig.1.It can be seen that will go to1as or,andfirst decreases and then increases with respect to.The of the inflection point of reaches its maximum when.Further,it can be proved theoretically that when is sufficiently small, is almost a linear,monotone decreasing function of .We have the following result.Lemma4.1:If Assumptions A2)-A4)hold,then for any given ,we have(34)Proof:The proof can be found in Appendix.For Example1,the curves of andwith different are shown in Fig.2.B.Selecting the Control Parameters Under a Given Communication Data RateIn Theorem3.1,we give a criterion for selecting the number of quantization levels(communication data rate)under given control gains and a convergence rate.In the following theorem,3030IEEE TRANSACTIONS ON AUTOMATIC CONTROL,VOL.57,NO.12,DECEMBER2012Fig.2.Curves of and of Example1with different,where dot are for and the 
solid lines are for.we will consider how to select the control parameters under a given communication data rate.Theorem4.1:Suppose Assumptions A1)and A2)hold.For any given,,denote(35) Then,i)is nonempty.ii)If,,and the numbers of the quantization levels of satisfy(36)then under the protocol given by(3),(5)and(11)with,the closed-loop system satisfieswhere is a constant satisfying(37)Proof:From Lemma4.1,we have(38) which impliesFrom the aforementioned,noting that the ex-ists,and(35),we have(i).For any given integer and constant,if ,,(36)and(37)hold,then it is easily verified that,Assumptions A3)-A4)and(18)hold. Then noting that and,we know that(17)and(19)also hold.By Theorem3.1,we get ii).Remark10:In[24],it is shown that for a connected network withfirst-order agents,average-consensus can be achieved with an exponential convergence rate based on merely1-bit informa-tion exchange between agents.Here,we prove that for the case with second-order agents,2-bit quantizers suffice for the expo-nential asymptotic synchronization of agents’pared with[24],from(A.2),we can see that the additional bit is used to overcome the uncertainty in estimating the velocity of the agent.Remark11:Compared with[24],the performance limit analysis for the second order agents with partial measur-able states is much more challenging.In[24],the spec-tral radius of the closed-loop matrix has the simple form:,where is the control gain.In this paper,it is very difficult to get an explicit expression for the relationship between the closed-loop spectral radiusand the eigenvalues of the Laplacian matrix.By differential mean theorem and limit analysis,we develop Lemma4.1to give a linear approximation of with respect to the control gains and the algebraic connectivity.From(38),we can see that Lemma4.1plays a vital role in establishing Theorem 4.1.Different from[24],there is also no explicit relationship between the stability margin and the control gain ,which also poses a significant challenge in the 
asymptotic convergence rate analysis as seen later in Section IV-C.1)Example2:We consider a network with10nodes andweights,which means that,if,other-wise,.The edges of the graph are randomly generated according to,for any unordered pair. Here,,.The initial states are chosen as and,.The con-trol gain and,which give.The scaling factor is taken as 0.9998.According to Theorem3.1,the2-bit quantizer can be used.The evolution of the states is shown in Fig.3.It can be seen that both the positions and the velocities of the agents are asymptotically synchronized.Next,we set.In this。
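Introduction to Artificial Intelligence: Test Question Bank with Answers

1. In association rule analysis, the main reason for converting the original data set into transactional form is:
A. To improve data processing speed B. To save storage space C. To make the algorithm's computation easier D. To form a commodity transaction matrix
Answer: C

2. Computer vision can be applied in which of the following fields?
A. Security and surveillance B. Face-recognition identity verification in finance C. Intelligent image-based diagnosis in medicine D. Visual input systems on robots and driverless vehicles E. All of the above
Answer: E

3. In 1943, the founding work on neural networks, "A Logical Calculus of the Ideas Immanent in Nervous Activity", was written by ( ) and Walter Pitts.
A. Warren McCulloch B. Minsky C. Donald Hebb D. Russell
Answer: A

4. For natural language processing problems, which neural network architecture is more suitable?
A. Multilayer perceptron B. Convolutional neural network C. Recurrent neural network D. Perceptron
Answer: C

5. The spatial discretization of an image is called:
A. Grayscale conversion B. Binarization C. Sampling D. Quantization
Answer: C

6. The more ( ) there are, the richer the tonal gradation of the resulting image, the higher its gray-level resolution, and the better its quality.
A. Resolution B. Number of pixels C. Quantization levels D. Amount of stored data
Answer: C

7. A complete face recognition system mainly consists of four parts: face image acquisition and detection, ( ), face image feature extraction, and face recognition.
A. Face classifier B. Face image preprocessing C. Face data acquisition D. Face model training
Answer: B

8. Which of the following is not a school of artificial intelligence?
A. Symbolism B. Connectionism C. Behaviorism D. Opportunism
Answer: D

9. Which statement about positive and negative samples is correct?
A. The class with more samples is the positive class B. The class with fewer samples is the negative class C. Positive and negative samples have no clear definition D. The class one wants to identify correctly is the positive class
Answer: D

10. Which of the following is not a perfect-information game?
A. Tic-tac-toe B. Reversi C. Go D. Bridge
Answer: D

11. Which of the following statements about artificial intelligence is incorrect?
A. Artificial intelligence is the discipline of making machines do things that require intelligence when done by humans B. Artificial intelligence mainly studies the representation, acquisition, and use of knowledge C. Artificial intelligence studies how machines can think rationally and act rationally as humans do D. Artificial intelligence is the discipline that studies how machines think
Answer: D

12. The view that intelligence requires no knowledge, no representation, and no reasoning; that artificial intelligence can evolve gradually as human intelligence does; and that intelligent behavior can only manifest itself through interaction with the surrounding environment in the real world.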
An Overview of Recent Progress in the Study of Distributed Multi-agent Coordination
Yongcan Cao, Member, IEEE, Wenwu Yu, Member, IEEE, Wei Ren, Member, IEEE, and Guanrong Chen, Fellow, IEEE

Abstract: This article reviews some main results and progress in distributed multi-agent coordination, focusing on papers published in major control systems and robotics journals since 2006. Distributed coordination of multiple vehicles, including unmanned aerial vehicles, unmanned ground vehicles and unmanned underwater vehicles, has been a very active research subject studied extensively by the systems and control community. The recent results in this area are categorized into several directions, such as consensus, formation control, optimization, and estimation. After the review, a short discussion section is included to summarize the existing research and to propose several promising research directions along with some open problems that are deemed important for further investigations.

Index Terms: Distributed coordination, formation control, sensor networks, multi-agent system

This work was supported by the National Science Foundation under CAREER Award ECCS-1213291, the National Natural Science Foundation of China under Grant No. 61104145 and 61120106010, the Natural Science Foundation of Jiangsu Province of China under Grant No. BK2011581, the Research Fund for the Doctoral Program of Higher Education of China under Grant No. 20110092120024, the Fundamental Research Funds for the Central Universities of China, and the Hong Kong RGC under GRF Grant CityU1114/11E. The work of Yongcan Cao was supported by a National Research Council Research Associateship Award at AFRL.

Y. Cao is with the Control Science Center of Excellence, Air Force Research Laboratory, Wright-Patterson AFB, OH 45433, USA. W. Yu is with the Department of Mathematics, Southeast University, Nanjing 210096, China and also with the School of Electrical and Computer Engineering, RMIT University, Melbourne VIC 3001, Australia. W. Ren is with the Department of Electrical Engineering, University of California, Riverside, CA 92521, USA. G. Chen is with the Department of Electronic Engineering, City University of Hong Kong, Hong Kong SAR, China.

Copyright (c) 2009 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@.

I. INTRODUCTION

Control theory and practice may date back to the beginning of the last century when the Wright Brothers attempted their first test flight in 1903. Since then, control theory has gradually gained popularity, receiving more and wider attention especially during World War II, when it was developed and applied to fire-control systems, missile navigation and guidance, as well as various electronic automation devices. In the past several decades, modern control theory was further advanced due to the booming of aerospace technology based on large-scale engineering systems.

During the rapid and sustained development of modern control theory, technology for controlling a single vehicle, albeit higher-dimensional and complex, has become relatively mature and has produced many effective tools such as PID control, adaptive control, nonlinear control, intelligent control, and robust control methodologies. In the past two decades in particular, control of multiple vehicles has received increasing demands, spurred by the fact that many benefits can be obtained when a single complicated vehicle is equivalently replaced by multiple yet simpler vehicles. In this endeavor, two approaches are commonly adopted for controlling multiple vehicles: a centralized approach and a distributed approach.
The centralized approach is based on the assumption that a central station is available and powerful enough to control a whole group of vehicles. Essentially, the centralized approach is a direct extension of the traditional single-vehicle-based control philosophy and strategy. In contrast, the distributed approach does not require a central station for control, at the cost of becoming far more complex in structure and organization. Although both approaches are considered practical depending on the situations and conditions of the real applications, the distributed approach is believed to be more promising due to many inevitable physical constraints such as limited resources and energy, short wireless communication ranges, narrow bandwidths, and large sizes of vehicles to manage and control. Therefore, the focus of this overview is placed on the distributed approach.

In distributed control of a group of autonomous vehicles, the main objective typically is to have the whole group of vehicles working in a cooperative fashion through a distributed protocol. Here, cooperative refers to a close relationship among all vehicles in the group where information sharing plays a central role. The distributed approach has many advantages in achieving cooperative group performances, especially low operational costs, fewer system requirements, high robustness, strong adaptivity, and flexible scalability, and has therefore been widely recognized and appreciated.

The study of distributed control of multiple vehicles was perhaps first motivated by the work in distributed computing [1], management science [2], and statistical physics [3].
In the control systems society, some pioneering works are generally traced to [4], [5], where an asynchronous agreement problem was studied for distributed decision-making problems. Thereafter, some consensus algorithms were studied under various information-flow constraints [6]–[10]. There are several journal special issues on the related topics published after 2006, including the IEEE Transactions on Control Systems Technology (vol. 15, no. 4, 2007), Proceedings of the IEEE (vol. 94, no. 4, 2007), ASME Journal of Dynamic Systems, Measurement, and Control (vol. 129, no. 5, 2007), SIAM Journal of Control and Optimization (vol. 48, no. 1, 2009), and International Journal of Robust and Nonlinear Control (vol. 21, no. 12, 2011). In addition, there are some recent reviews and progress reports given in the surveys [11]–[15] and the books [16]–[23], among others.

This article reviews some main results and recent progress in distributed multi-agent coordination, published in major control systems and robotics journals since 2006. Due to space limitations, we refer the readers to [24] for a more complete version of the same overview. For results before 2006, the readers are referred to [11]–[14].

Specifically, this article reviews the recent research results in the following directions, which are not independent and may overlap to some extent:

1. Consensus and the like (synchronization, rendezvous). Consensus refers to the group behavior that all the agents asymptotically reach a certain common agreement through a local distributed protocol, with or without predefined common speed and orientation.
2. Distributed formation and the like (flocking). Distributed formation refers to the group behavior that all the agents form a pre-designed geometrical configuration through local interactions with or without a common reference.
3. Distributed optimization. This refers to algorithmic developments for the analysis and optimization of large-scale distributed systems.
4. Distributed estimation and control. This refers to distributed control design based on local estimation about the needed global information.

The rest of this article is organized as follows. In Section II, basic notations of graph theory and stochastic matrices are introduced. Sections III, IV, V, and VI describe the recent research results and progress in consensus, formation control, optimization, and estimation. Finally, the article is concluded by a short section of discussions with future perspectives.

II. PRELIMINARIES

A. Graph Theory

For a system of $n$ connected agents, its network topology can be modeled as a directed graph denoted by $G=(V,W)$, where $V=\{v_1,v_2,\cdots,v_n\}$ and $W\subseteq V\times V$ are, respectively, the set of agents and the set of edges which directionally connect the agents together. Specifically, the directed edge denoted by an ordered pair $(v_i,v_j)$ means that agent $j$ can access the state information of agent $i$. Accordingly, agent $i$ is a neighbor of agent $j$. A directed path is a sequence of directed edges in the form of $(v_1,v_2),(v_2,v_3),\cdots$, with all $v_i\in V$. A directed graph has a directed spanning tree if there exists at least one agent that has a directed path to every other agent. The union of a set of directed graphs with the same set of agents, $\{G_{i_1},\cdots,G_{i_m}\}$, is a directed graph with the same set of agents and its set of edges is given by the union of the edge sets of all the directed graphs $G_{i_j}$, $j=1,\cdots,m$. A complete directed graph is a directed graph in which each pair of distinct agents is bidirectionally connected by an edge, thus there is a directed path from any agent to any other agent in the network.

Two matrices are used to represent the network topology: the adjacency matrix $A=[a_{ij}]\in\mathbb{R}^{n\times n}$ with $a_{ij}>0$ if $(v_j,v_i)\in W$ and $a_{ij}=0$ otherwise, and the Laplacian matrix $L=[\ell_{ij}]\in\mathbb{R}^{n\times n}$ with $\ell_{ii}=\sum_{j=1}^{n}a_{ij}$ and $\ell_{ij}=-a_{ij}$, $i\neq j$, which is generally asymmetric for directed graphs.

B. Stochastic Matrices

A nonnegative square matrix is called a (row) stochastic matrix if each of its rows sums to one. The product of two stochastic matrices is still a stochastic matrix. A row stochastic matrix $P\in\mathbb{R}^{n\times n}$ is called indecomposable and aperiodic if $\lim_{k\to\infty}P^k=\mathbf{1}y^T$ for some $y\in\mathbb{R}^n$ [25], where $\mathbf{1}$ is a vector with all elements being 1.

III. CONSENSUS

Consider a group of $n$ agents, each with single-integrator kinematics described by

$$\dot{x}_i(t)=u_i(t),\quad i=1,\cdots,n, \tag{1}$$

where $x_i(t)$ and $u_i(t)$ are, respectively, the state and the control input of the $i$th agent. A typical consensus control algorithm is designed as

$$u_i(t)=\sum_{j=1}^{n}a_{ij}(t)\,[x_j(t)-x_i(t)], \tag{2}$$

where $a_{ij}(t)$ is the $(i,j)$th entry of the corresponding adjacency matrix at time $t$. The main idea behind (2) is that each agent moves towards the weighted average of the states of its neighbors. Given the switching network pattern due to the continuous motions of the dynamic agents, the coupling coefficients $a_{ij}(t)$ in (2), and hence the graph topologies, are generally time-varying. It is shown in [9], [10] that consensus is achieved if the underlying directed graph has a directed spanning tree in some joint fashion in terms of a union of its time-varying graph topologies.

The idea behind consensus serves as a fundamental principle for the design of distributed multi-agent coordination algorithms. Therefore, investigating consensus has been a main research direction in the study of distributed multi-agent coordination. To bridge the gap between the study of consensus algorithms and many physical properties inherent in practical systems, it is necessary and meaningful to study consensus by considering many practical factors, such as actuation, control, communication, computation, and vehicle dynamics, which characterize some important features of practical systems. This is the main motivation to study consensus. In the following part of the section, an overview of the research progress in the study of consensus is given, regarding stochastic network topologies and dynamics, complex dynamical systems, delay effects, and quantization, mainly after 2006. Several milestone results prior to 2006 can be found in [2], [4]–[6], [8]–[10], [26].

A. Stochastic Network Topologies and Dynamics

In multi-agent systems, the network topology among all vehicles plays a crucial role in determining consensus. The objective here is to explicitly identify necessary and/or sufficient conditions on the network topology such that consensus can be achieved under properly designed algorithms.

It is often reasonable to consider the case when the network topology is deterministic under ideal communication channels. Accordingly, main research on the consensus problem was conducted under a deterministic fixed/switching network topology. That is, the adjacency matrix $A(t)$ is deterministic. At other times, when considering random communication failures, random packet drops, and communication channel instabilities inherent in physical communication channels, it is necessary and important to study the consensus problem in the stochastic setting, where a network topology evolves according to some random distributions. That is, the adjacency matrix $A(t)$ is stochastically evolving.

In the deterministic setting, consensus is said to be achieved if all agents eventually reach agreement on a common state. In the stochastic setting, consensus is said to be achieved almost surely (respectively, in mean-square or in probability) if all agents reach agreement on a common state almost surely (respectively, in mean-square or in probability). Note that the problem studied in the stochastic setting is slightly different from that studied in the deterministic setting due to the different assumptions in terms of the network topology.
Consensus over a stochastic network topology was perhaps first studied in [27], where some sufficient conditions on the network topology were given to guarantee consensus with probability one for systems with single-integrator kinematics (1), and where the rate of convergence was also studied. Further results for consensus under a stochastic network topology were reported in [28]–[30], for systems with single-integrator kinematics [28], [29] or double-integrator dynamics [30]. Consensus for single-integrator kinematics under a stochastic network topology has been studied particularly extensively, and some general conditions for almost-sure consensus were derived [29]. Loosely speaking, almost-sure consensus for single-integrator kinematics can be achieved, i.e., $x_i(t)-x_j(t)\to 0$ almost surely, if and only if the expectation of the network topology, namely, the network topology associated with the expectation $E[A(t)]$, has a directed spanning tree. It is worth noting that the conditions are analogous to those in [9], [10], but in the stochastic setting. In view of the special structure of the closed-loop systems concerning consensus for single-integrator kinematics, basic properties of stochastic matrices play a crucial role in the convergence analysis of the associated control algorithms.
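The spanning-tree condition on the expected topology can be explored numerically. The following is a minimal sketch under stated assumptions, not the construction used in [27]–[30]: each link is assumed to switch on independently at every step with a fixed probability, so that `prob[i][j]` plays the role of the $(i,j)$th entry of $E[A(t)]$, and (1) with (2) is integrated by a forward-Euler step. The function names, the step size `dt`, and the Bernoulli link model are illustrative choices.

```python
import random


def has_directed_spanning_tree(adj):
    """Return True if some agent has a directed path to every other agent.

    adj[i][j] > 0 means agent i receives information from agent j,
    matching the convention a_ij > 0 iff (v_j, v_i) is an edge.
    """
    n = len(adj)
    for root in range(n):
        reached = {root}
        frontier = [root]
        while frontier:
            j = frontier.pop()
            for i in range(n):
                if adj[i][j] > 0 and i not in reached:
                    reached.add(i)
                    frontier.append(i)
        if len(reached) == n:
            return True
    return False


def simulate(prob, x0, steps, dt=0.1, seed=0):
    """Forward-Euler simulation of x_i' = sum_j a_ij(t) (x_j - x_i).

    At every step, link j -> i is active (a_ij = 1) with probability
    prob[i][j] and inactive (a_ij = 0) otherwise (assumed link model).
    """
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for _ in range(steps):
        a = [[1.0 if rng.random() < prob[i][j] else 0.0 for j in range(n)]
             for i in range(n)]
        x = [x[i] + dt * sum(a[i][j] * (x[j] - x[i]) for j in range(n))
             for i in range(n)]
    return x


if __name__ == "__main__":
    n = 5
    # Directed ring in expectation: agent i hears agent i-1 with probability 0.3.
    ring = [[0.3 if j == (i - 1) % n else 0.0 for j in range(n)] for i in range(n)]
    print(has_directed_spanning_tree(ring))
    final = simulate(ring, [0.0, 1.0, 2.0, 3.0, 4.0], steps=3000)
    print(max(final) - min(final))
```

In this sketch the instantaneous graph is almost never connected, yet the expected graph is a directed ring, which contains a directed spanning tree, and the simulated disagreement shrinks towards zero.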
Consensus for double-integrator dynamics was studied in [30], where the switching network topology is assumed to be driven by a Bernoulli process, and it was shown that consensus can be achieved if the union of all the graphs has a directed spanning tree. Apparently, the requirement on the network topology for double-integrator dynamics is a special case of that for single-integrator kinematics, due to the different nature of the final states (constant final states for single-integrator kinematics and possibly dynamic final states for double-integrator dynamics) caused by the substantial dynamical difference. It is still an open question whether some general conditions (corresponding to some specific algorithms) can be found for consensus with double-integrator dynamics.

In addition to analyzing the conditions on the network topology such that consensus can be achieved, a special type of consensus algorithm, the so-called gossip algorithm [31], [32], has been used to achieve consensus in the stochastic setting.
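As a rough illustration of the gossip idea, the sketch below assumes the common pairwise-averaging form: at each round one communication link is drawn uniformly at random and its two endpoints replace their states by their average. The variants analyzed in [31], [32] may differ; the function name `gossip` and the uniform link selection are assumptions made for illustration.

```python
import random


def gossip(x0, edges, rounds, seed=0):
    """Randomized pairwise-averaging gossip (assumed variant).

    x0     : initial scalar states, one per agent
    edges  : list of undirected links (i, j) that may be activated
    rounds : number of pairwise exchanges to perform
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(rounds):
        i, j = rng.choice(edges)          # pick one link uniformly at random
        x[i] = x[j] = 0.5 * (x[i] + x[j])  # both endpoints move to their average
    return x


if __name__ == "__main__":
    # Path graph on six agents: 0 - 1 - 2 - 3 - 4 - 5.
    edges = [(i, i + 1) for i in range(5)]
    x0 = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
    x = gossip(x0, edges, rounds=5000)
    print(x)
```

Each exchange preserves the sum of the two states involved, so when the set of available links forms a connected graph the states in this sketch approach the average of the initial values.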
The gossip algorithm can always guarantee consensus almost surely if the available pairwise communication channels satisfy certain conditions (such as a connected graph). The way the network topology switches does not play any role in the consideration of consensus.

The current study on consensus over stochastic network topologies has shown some interesting results regarding: (1) consensus algorithm design for various multi-agent systems, (2) conditions of the network topologies on consensus, and (3) effects of the stochastic network topologies on the convergence rate. Future research on this topic includes, but is not limited to, the following two directions: (1) when the network topology itself is stochastic, how to determine the probability of reaching consensus almost surely? (2) compared with the deterministic network topology, what are the advantages and disadvantages of the stochastic network topology, regarding aspects such as robustness and convergence rate?

As is well known, disturbances and uncertainties often exist in networked systems, for example, channel noise, communication noise, uncertainties in network parameters, etc. In addition to the stochastic network topologies discussed above, the effect of stochastic disturbances [33], [34] and uncertainties [35] on the consensus problem also needs investigation.
Study has been mainly devoted to analyzing the performance of consensus algorithms subject to disturbances and to presenting conditions on the uncertainties such that consensus can be achieved. In addition, another interesting direction in dealing with disturbances and uncertainties is to design distributed local filtering algorithms so as to save energy and improve computational efficiency. Distributed local filtering algorithms play an important role and are more effective than traditional centralized filtering algorithms for multi-agent systems. For example, in [36]–[38] some distributed Kalman filters are designed to implement data fusion. In [39], by analyzing consensus and pinning control in synchronization of complex networks, distributed consensus filtering in sensor networks is addressed. Recently, Kalman filtering over a packet-dropping network is designed through a probabilistic approach [40]. Today, it remains a challenging problem to incorporate both dynamics of consensus and probabilistic (Kalman) filtering into a unified framework.

B. Complex Dynamical Systems

Since consensus is concerned with the behavior of a group of vehicles, it is natural to consider the system dynamics for practical vehicles in the study of the consensus problem.
Although the study of consensus under various system dynamics is due to the existence of complex dynamics in practical systems, it is also interesting to observe that system dynamics play an important role in determining the final consensus state. For instance, the well-studied consensus of multi-agent systems with single-integrator kinematics often converges to a constant final value. However, consensus for double-integrator dynamics might admit a dynamic final value (i.e., a time function). These important issues motivate the study of consensus under various system dynamics.

As a direct extension of the study of the consensus problem for systems with simple dynamics, for example, with single-integrator kinematics or double-integrator dynamics, consensus with general linear dynamics was also studied recently [41]–[43], where research is mainly devoted to finding feedback control laws such that consensus (in terms of the output states) can be achieved for general linear systems

$$\dot{x}_i=Ax_i+Bu_i,\quad y_i=Cx_i, \tag{3}$$

where $A$, $B$, and $C$ are constant matrices with compatible sizes. Apparently, the well-studied single-integrator kinematics and double-integrator dynamics are special cases of (3) for properly chosen $A$, $B$, and $C$.

As a further extension, consensus for complex systems has also been extensively studied. Here, the term consensus for complex systems is used for the study of the consensus problem when the system dynamics are nonlinear [44]–[48] or when the consensus algorithms are nonlinear [49], [50]. Examples of the nonlinear system dynamics include:

• Nonlinear oscillators [45]. The dynamics are often assumed to be governed by the Kuramoto equation $\dot{\theta}_i=\omega_i+K$ [...]

[...] stability. A well-studied consensus algorithm for (1) is given in (2), where it is now assumed that time delay exists. Two types of time delays, communication delay and input delay, have been considered in the literature. Communication delay accounts for the time for transmitting information from origin to destination. More precisely, if it takes time $T_{ij}$ for agent $i$ to receive information from agent $j$, the closed-loop system of (1) using (2) under a fixed network topology becomes

$$\dot{x}_i(t)=\sum_{j=1}^{n}a_{ij}(t)\,[x_j(t-T_{ij})-x_i(t)]. \tag{7}$$

An interpretation of (7) is that at time $t$, agent $i$ receives information from agent $j$ and uses data $x_j(t-T_{ij})$ instead of $x_j(t)$ due to the time delay. Agent $i$ can access its own information instantly, so input delay can be regarded as the sum of computation time and execution time. More precisely, if the input delay for agent $i$ is given by $T_i^p$, then the closed-loop system of (1) using (2) becomes

$$\dot{x}_i(t)=\sum_{j=1}^{n}a_{ij}(t)\,[x_j(t-T_i^p)-x_i(t-T_i^p)]. \tag{8}$$

Clearly, (7) refers to the case when only communication delay is considered while (8) refers to the case when only input delay is considered. It should be emphasized that both communication delay and input delay might be time-varying and they might co-exist at the same time.

In addition to time delay, it is also important to consider packet drops in exchanging state information. Fortunately, consensus with packet drops can be considered as a special case of consensus with time delay, because dropped packets can simply be re-sent, which amounts to a time delay in the data transmission channels. Thus, the main problem involved in consensus with time delay is to study the effects of time delay on the convergence and performance of consensus, referred to as consensusability [52].

Because time delay might affect the system stability, it is important to study under what conditions consensus can still be guaranteed even if time delay exists. In other words, can one find conditions on the time delay such that consensus can be achieved? For this purpose, the effect of time delay on the consensusability of (1) using (2) was investigated. When there exists only (constant) input delay, a sufficient condition on the time delay to guarantee consensus under a fixed undirected interaction graph is presented in [8]. Specifically, an upper bound for the time delay is derived under which consensus can be achieved. This is a well-expected result because time delay normally degrades the system performance gradually but will not destroy the system stability unless the time delay is above a certain threshold. Further studies can be found in, e.g., [53], [54], which demonstrate that for (1) using (2), the communication delay does not affect the consensusability but the input delay does. In a similar manner, consensus with time delay was studied for systems with different dynamics, where the dynamics (1) are replaced by other more complex ones, such as double-integrator dynamics [55], [56], complex networks [57], [58], rigid bodies [59], [60], and general nonlinear dynamics [61].

In summary, the existing study of consensus with time delay mainly focuses on analyzing the stability of consensus algorithms with time delay for various types of system dynamics, including linear and nonlinear dynamics. Generally speaking, consensus with time delay for systems with nonlinear dynamics is more challenging. For most consensus algorithms with time delays, the main research question is to determine an upper bound of the time delay under which time delay does not affect the consensusability. For communication delay, it is possible to achieve consensus under a relatively large time delay threshold. A notable phenomenon in this case is that the final consensus state is constant. Considering both linear and nonlinear system dynamics in consensus, the main tools for stability analysis of the closed-loop systems include matrix theory [53], Lyapunov functions [57], the frequency-domain approach [54], passivity [58], and the contraction principle [62].
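The threshold behavior of a constant input delay can be checked with a small simulation of (8). This is only a sketch under stated assumptions: it takes the delay bound for a fixed undirected graph in the commonly quoted form $\tau<\pi/(2\lambda_{\max}(L))$, an explicit expression that is not given in the text above, uses unit edge weights, holds the initial state constant as the pre-history, and integrates with a forward-Euler step. All function names are hypothetical.

```python
import math


def laplacian(n, edges):
    """Laplacian of an undirected graph with unit edge weights."""
    L = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1.0
        L[j][j] += 1.0
        L[i][j] -= 1.0
        L[j][i] -= 1.0
    return L


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lambda_max(L, iters=500):
    """Largest Laplacian eigenvalue by power iteration (L is symmetric PSD)."""
    v = [(-1.0) ** i * (i + 1) for i in range(len(L))]  # arbitrary non-constant start
    for _ in range(iters):
        w = matvec(L, v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return sum(a * b for a, b in zip(v, matvec(L, v)))  # Rayleigh quotient


def disagreement_after(L, x0, tau, t_end=40.0, dt=0.01):
    """Euler simulation of x'(t) = -L x(t - tau); returns max(x) - min(x) at t_end."""
    d = int(round(tau / dt))               # delay expressed in time steps
    hist = [list(x0)]
    for k in range(int(round(t_end / dt))):
        delayed = hist[max(k - d, 0)]      # constant pre-history x(t) = x0 for t <= 0
        Lx = matvec(L, delayed)
        hist.append([xi - dt * g for xi, g in zip(hist[k], Lx)])
    return max(hist[-1]) - min(hist[-1])


if __name__ == "__main__":
    L = laplacian(4, [(0, 1), (1, 2), (2, 3)])   # path graph on four agents
    tau_star = math.pi / (2.0 * lambda_max(L))    # assumed form of the bound
    x0 = [1.0, 2.0, 3.0, 4.0]
    print(tau_star)
    print(disagreement_after(L, x0, 0.5 * tau_star))  # below the bound
    print(disagreement_after(L, x0, 2.0 * tau_star))  # above the bound
```

With half the assumed bound the disagreement in this sketch decays, while with twice the bound it grows without limit, which matches the qualitative statement that stability is lost only once the delay exceeds a threshold.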
Although consensus with time delay has been studied extensively, it is often assumed that time delay is either constant or random. However, time delay itself might obey its own dynamics, which possibly depend on the communication distance, total computation load and computation capability, etc. Therefore, it is more suitable to represent the time delay as another system variable to be considered in the study of the consensus problem. In addition, it is also important to consider time delay and other physical constraints simultaneously in the study of the consensus problem.

D. Quantization

Quantized consensus has been studied recently with motivation from digital signal processing. Here, quantized consensus refers to consensus when the measurements are digital rather than analog, so that the information received by each agent is not continuous and might have been truncated due to digital finite precision constraints. Roughly speaking, for an analog signal $s$, a typical quantizer with an accuracy parameter $\delta$, also referred to as quantization step size, is described by $Q(s)=q(s,\delta)$, where $Q(s)$ is the quantized signal and $q(\cdot,\cdot)$ is the associated quantization function. For instance [63], a quantizer rounding a signal $s$ to its nearest integer can be expressed as $Q(s)=n$, if $s\in[(n-1/2)\delta,(n+1/2)\delta]$, $n\in\mathbb{Z}$, where $\mathbb{Z}$ denotes the integer set. Note that the types of quantizers might be different for different systems, hence $Q(s)$ may differ for different systems. Due to the truncation of the signals received, consensus is now considered achieved if the maximal state difference is not larger than the accuracy level associated with the whole system. A notable feature of consensus with quantization is that the time to reach consensus is usually finite. That is, it often takes a finite period of time for all agents' states to converge to an accuracy interval. Accordingly, the main research is to investigate the convergence time associated with the proposed consensus algorithm.

Quantized consensus was probably first studied in [63], where a quantized gossip algorithm was proposed and its convergence was analyzed. In particular, the bound of the convergence time for a complete graph was shown to be polynomial in the network size. In [64], coding/decoding strategies were introduced to the quantized consensus algorithms, where it was shown that the convergence rate depends on the accuracy of the quantization but not the coding/decoding schemes. In [65], quantized consensus was studied via the gossip algorithm, with both lower and upper bounds of the expected convergence time in the worst case derived in terms of the principal submatrices of the Laplacian matrix. Further results regarding quantized consensus were reported in [66]–[68], where the main research was also on the convergence time for various proposed quantized consensus algorithms as well as the quantization effects on the convergence time. It is intuitively reasonable that the convergence time depends on both the quantization level and the network topology. It is then natural to ask if and how the quantization methods affect the convergence time. This is an important measure of the robustness of a quantized consensus algorithm (with respect to the quantization method).

Note that it is interesting but also more challenging to study consensus for general linear/nonlinear systems with quantization. Because the difference between the truncated signal and the original signal is bounded, consensus with quantization can be considered as a special case of one without quantization when there exist bounded disturbances. Therefore, if consensus can be achieved for a group of vehicles in the absence of quantization, it might be intuitively correct to say that the differences among the states of all vehicles will be bounded if the quantization precision is small enough. However, it is still an open question to rigorously describe the quantization effects on consensus with general linear/nonlinear systems.

E. Remarks

In summary, the existing research on the consensus problem has covered a number of physical properties for practical systems and control performance analysis. However, the study of the consensus problem covering multiple physical properties and/or control performance analysis has been largely ignored. In other words, two or more problems discussed in the above subsections might need to be taken into consideration simultaneously when studying the consensus problem. In addition, consensus algorithms normally guarantee the agreement of a team of agents on some common states without taking group formation into consideration. To reflect many practical applications where a group of agents are normally required to form some preferred geometric structure, it is desirable to consider a task-oriented formation control problem for a group of mobile agents, which motivates the study of formation control presented in the next section.

IV. FORMATION CONTROL

Compared with the consensus problem where the final states of all agents typically reach a singleton, the final states of all agents can be more diversified under the formation control scenario. Indeed, formation control is more desirable in many practical applications such as formation flying, cooperative transportation, sensor networks, as well as combat intelligence, surveillance, and reconnaissance. In addition, the performance of a team of agents working cooperatively often exceeds the simple sum of the performances of all individual agents. For its broad applications and advantages, formation control has been a very active research subject in the control systems community, where the aim is to form a certain geometric pattern with or without a group reference. More precisely, the main objective of formation control is to coordinate a group of agents such that they can achieve some desired formation so that some tasks can be finished by the collaboration of the agents. Generally speaking, formation control can be categorized according to the group reference. Formation control without a group reference, called formation producing, refers to the algorithm design for a group of agents to reach some pre-desired geometric pattern in the absence of a group reference, which can also be considered as the control objective. Formation control with a group reference, called formation tracking, refers to the same task but following the predesignated group reference. Due to the existence of the group reference, formation tracking is usually much more challenging than formation producing, and control algorithms for the latter might not be useful for the former. As of today, there are still many open questions in solving the formation tracking problem.

The following part of the section reviews and discusses recent research results and progress in formation control, including formation producing and formation tracking, mainly accomplished after 2006. Several milestone results prior to 2006 can be found in [69]–[71].

A. Formation Producing

The existing work in formation control aims at analyzing the formation behavior under certain control laws, along with stability analysis.

1) Matrix Theory Approach: Due to the nature of multi-agent systems, matrix theory has been frequently used in the stability analysis of their distributed coordination.

Note that the consensus input to each agent (see, e.g., (2)) is essentially a weighted average of the differences between the states of the agent's neighbors and its own. As an extension of the consensus algorithms, some coupling matrices were introduced here to offset the corresponding control inputs by some angles [72], [73]. For example, given (1), the control input (2) is revised as $u_i(t)=\sum_{j=1}^{n}a_{ij}(t)\,C\,[x_j(t)-x_i(t)]$, where $C$ is a coupling matrix with compatible size. If $x_i\in\mathbb{R}^3$, then $C$ can be viewed as a 3-D rotation matrix. The main idea behind the revised algorithm is that the original control input for reaching consensus is now rotated by some angles.
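The effect of rotating the consensus input can be seen in a planar toy case. The sketch below is an illustration under stated assumptions rather than the setting of [72], [73]: agents live in $\mathbb{R}^2$, $C$ is a 2-D rotation by an angle `theta`, and the graph is undirected. In that case the Laplacian eigenvalues $\lambda_i$ are real, the closed-loop modes are $-\lambda_i e^{\pm\mathrm{i}\theta}$ with real part $-\lambda_i\cos\theta$, and the agents contract for $|\theta|<\pi/2$ and spread apart for $|\theta|>\pi/2$. Directed graphs, which are the focus of the cited works, are not covered by this simple argument. The function names and the Euler step are illustrative choices.

```python
import math


def rotate(theta, v):
    """Apply the 2-D rotation matrix C(theta) to the vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def simulate_rotated_consensus(points, adj, theta, t_end=20.0, dt=0.01):
    """Forward-Euler simulation of x_i' = sum_j a_ij C (x_j - x_i) in the plane."""
    pts = [tuple(p) for p in points]
    n = len(pts)
    for _ in range(int(round(t_end / dt))):
        new_pts = []
        for i in range(n):
            # Plain consensus input: weighted sum of relative positions.
            ux = sum(adj[i][j] * (pts[j][0] - pts[i][0]) for j in range(n))
            uy = sum(adj[i][j] * (pts[j][1] - pts[i][1]) for j in range(n))
            # C is linear, so rotating the summed input equals summing rotated terms.
            rx, ry = rotate(theta, (ux, uy))
            new_pts.append((pts[i][0] + dt * rx, pts[i][1] + dt * ry))
        pts = new_pts
    return pts


def spread(pts):
    """Largest pairwise distance between agents."""
    return max(math.hypot(p[0] - q[0], p[1] - q[1]) for p in pts for q in pts)


if __name__ == "__main__":
    n = 4
    complete = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    start = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
    print(spread(simulate_rotated_consensus(start, complete, math.pi / 3)))
    print(spread(simulate_rotated_consensus(start, complete, 2 * math.pi / 3)))
```

With `theta = 0` the sketch reduces to the plain consensus algorithm (2); a nonzero angle below $\pi/2$ makes the agents spiral in towards a common point instead of moving straight towards it.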
The closed-loop system can be expressed in a vector form, whose stability can be determined by studying the distribution of the eigenvalues of a certain transfer matrix.Main research work was conducted in[72],[73]to analyze the collective motions for systems with single-integrator kinematics and double-integrator dynamics,where the network topology,the damping gain,and C were shown to affect the collective motions.Analogously,the collective motions for a team of nonlinear self-propelling agents were shown to be affected by。
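The rotated consensus input described above can be illustrated with a small simulation. The sketch below is not taken from [72] or [73]; it assumes single-integrator kinematics (the state derivative equals the input), uses a 2-D rotation in place of the 3-D rotational matrix to keep the code short, and integrates with a plain forward-Euler step. The function names, the step size `dt`, and the step count are hypothetical choices made for illustration.

```python
import math


def rotate(v, theta):
    """Apply the 2-D rotation matrix C(theta) to the vector v."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def rotated_consensus_step(positions, weights, theta, dt):
    """One forward-Euler step of x_i' = sum_j a_ij * C * (x_j - x_i)."""
    n = len(positions)
    new_positions = []
    for i in range(n):
        ux = uy = 0.0
        for j in range(n):
            diff = (positions[j][0] - positions[i][0],
                    positions[j][1] - positions[i][1])
            rx, ry = rotate(diff, theta)
            ux += weights[i][j] * rx
            uy += weights[i][j] * ry
        new_positions.append((positions[i][0] + dt * ux,
                              positions[i][1] + dt * uy))
    return new_positions


def simulate(positions, weights, theta, dt=0.01, steps=2000):
    """Run the rotated consensus law for a fixed number of Euler steps."""
    for _ in range(steps):
        positions = rotated_consensus_step(positions, weights, theta, dt)
    return positions


def spread(positions):
    """Largest distance of any agent from the group centroid."""
    cx = sum(p[0] for p in positions) / len(positions)
    cy = sum(p[1] for p in positions) / len(positions)
    return max(math.hypot(p[0] - cx, p[1] - cy) for p in positions)


# Three agents on a complete graph with unit weights (hypothetical data).
start = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
adjacency = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
final = simulate(start, adjacency, math.pi / 4)
```

With symmetric weights the centroid of the group does not move, and for this small example a rotation angle of π/4 makes the agents spiral in toward it. Changing `theta` changes the shape of the trajectories, which is the kind of dependence on $C$ that the passage refers to.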
The United States of America is a constitutional federal republic
The United States of America is a constitutional federal republic, situated primarily in North America. It comprises 50 states and one federal district, and has several territories with differing degrees of affiliation. It is also referred to, with varying formality, as the United States, the U.S., the U.S.A., the U.S. of A., America, the States, or (poetically) Columbia.

Since the mid-20th century, following World War II, the United States has emerged as a dominant global influence in economic, political, military, scientific, technological, and cultural affairs. Because of its influence, the U.S. is considered a superpower and, particularly after the Cold War, a hyperpower by some.

The country celebrates its founding date as July 4, 1776, when the Second Continental Congress, representing thirteen British colonies, adopted the Declaration of Independence, which rejected British authority in favor of self-determination.

The structure of the government was profoundly changed in 1789, when the states replaced the Articles of Confederation with the United States Constitution. The date on which each of the fifty states adopted the Constitution is typically regarded as the date that state "entered the Union" to become part of the United States.
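Basic Principles of the TextRank Algorithm: Overview and Explanation

1. Introduction

1.1 Overview

In the age of information explosion, people encounter large volumes of text every day, such as news reports, social media comments, and academic papers. Extracting key information from this mass of text is becoming increasingly important. Keyword extraction and text summarization are two basic natural language processing tasks that aim to help users understand and browse text quickly. TextRank is an unsupervised, graph-based algorithm that computes the importance of words or sentences by analyzing the relationships among the words in a text, and then ranks them by that importance. The algorithm was first proposed by Mihalcea et al. in 2004 and is widely used in natural language processing.

1.2 Article structure

This article introduces the basic principles of the TextRank algorithm and explains in detail how it is applied to two tasks: keyword extraction and text summarization. We then explain the implementation in three main steps: data preprocessing, construction of the word graph, and computation of node importance scores. In the fourth part, we analyze the strengths and weaknesses of TextRank and discuss possible improvements. Finally, in the conclusion and outlook, we summarize the main findings and contributions of TextRank and consider its potential in future research directions and application scenarios.

1.3 Purpose

The purpose of this article is to examine in depth how TextRank is applied in natural language processing. By explaining the principles and the implementation in detail, we hope readers will gain a full understanding of TextRank and a clearer view of its effectiveness in tasks such as keyword extraction and text summarization. By analyzing its strengths and weaknesses and discussing possible improvements, we also hope to give researchers in this field ideas for further study. Ultimately, we hope this article encourages reflection on natural language processing techniques and promotes development and innovation in related fields.

2. Basic principles of the TextRank algorithm

2.1 Keyword extraction

Keyword extraction is an important application of TextRank: it automatically extracts keywords from a text. TextRank uses the co-occurrence relationships between words or phrases in the text to compute the importance of each keyword. First, the text is tokenized to obtain a set of words or phrases. Then an undirected weighted graph is built to represent the co-occurrence relationships among these words or phrases.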
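The two steps just described (tokenize, then build an undirected weighted co-occurrence graph) and the subsequent scoring can be sketched as follows. This is a minimal illustration, not a reference implementation: the window size, the damping factor of 0.85, the fixed iteration count, and all function names are assumptions made here, and tokenization is assumed to have been done already.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens, window=2):
    """Undirected weighted graph; an edge weight counts how often two
    distinct words appear within `window` positions of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Weighted PageRank-style iteration over the word graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            rank = 0.0
            for neighbor, weight in graph[node].items():
                # Each neighbor shares its score in proportion to edge weight.
                out_weight = sum(graph[neighbor].values())
                rank += weight / out_weight * scores[neighbor]
            new_scores[node] = (1 - damping) + damping * rank
        scores = new_scores
    return scores


def extract_keywords(tokens, top_k=3, window=2):
    """Return the top_k highest-scoring words."""
    graph = build_cooccurrence_graph(tokens, window)
    scores = textrank(graph)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy token list (hypothetical): "a" co-occurs with every other word.
tokens = ["a", "b", "a", "c", "a", "d"]
top = extract_keywords(tokens, top_k=1, window=1)
```

In practice the token list would first be filtered (for example by removing stop words or keeping only certain parts of speech), which corresponds to the data preprocessing step mentioned in section 1.2.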
Firms, contracts and trade structure
Firms,Contracts,and Trade Structure∗Pol AntràsHarvard University and NBERThis Version:May2003AbstractRoughly one-third of world trade is intrafirm trade.This paper starts by unveiling two systematic patterns in the volume of intrafirm trade.In a panelof industries,the share of intrafirm imports in total U.S.imports is signifi-cantly higher,the higher the capital intensity of the exporting industry.Ina cross-section of countries,the share of intrafirm imports in total U.S.im-ports is significantly higher,the higher the capital-labor ratio of the exportingcountry.I then show that these patterns can be rationalized in a theoreti-cal framework that combines a Grossman-Hart-Moore view of thefirm witha Helpman-Krugman view of international trade.In particular,I develop anincomplete-contracting,property-rights model of the boundaries of thefirm,which I then incorporate into a standard trade model with imperfect compe-tition and product differentiation.The model pins down the boundaries ofmultinationalfirms as well as the international location of production,and itis shown to predict the patterns of intrafirm trade identified above.Econo-metric evidence reveals that the model is consistent with other qualitative andquantitative features of the data.Keywords Property-rights theory,Multinational Firms,International Trade, Intrafirm Trade.JEL Classification Numbers D23,F12,F14,F21,F23,L22,L33∗The shorter,final version was published in the Quarterly Journal of Economics in November 2003.I am grateful to Daron Acemoglu,George-Marios Angeletos,Gene Grossman,and Jaume Ven-tura for invaluable guidance,and to Manuel Amador,Lucia Breierova,Francesco Caselli,Fritz Foley, Gino Gancia,Andrew Hertzberg,Elhanan Helpman,Bengt Holmström,Ben Jones,Oscar Lander-retche,Alexis León,Gerard Padró-i-Miquel,Thomas Philippon,Diego Puga,Jeremy Stein,Joachim Voth,two anonymous referees,and the editor(Edward Glaeser)for very helpful comments.I have also benefited from suggestions by seminar 
participants at UC Berkeley,Chicago GSB,Columbia, Harvard,LSE MIT,NBER,Northwestern,NYU,Princeton,UC San Diego,Stanford,and Yale. Financial support from the Bank of Spain is gratefully acknowledged.All remaining errors are my own.Correspondence:pantras@.Correspondence:Department of Economics, Harvard University,Littauer230,Cambridge,MA02138.Email:pantras@.1IntroductionRoughly one-third of world trade is intrafirm trade.In1994,42.7percent of the total volume of U.S.imports of goods took place within the boundaries of multinational firms,with the share being36.3percent for U.S.exports of goods(Zeile,1997).In spite of the clear significance of these internationalflows of goods between affiliated units of multinationalfirms,the available empirical studies on intrafirm trade provide little guidance to international trade theorists.In this paper I unveil some novel patterns exhibited by the volume of U.S.intrafirm imports and I argue that these patterns can be rationalized combining a Grossman-Hart-Moore view of thefirm, together with a Helpman-Krugman view of international trade.In a hypothetical world in whichfirm boundaries had no bearing on the pattern of international trade,one would expect only random differences between the behavior of the volume of intrafirm trade and that of the total volume of trade.In particular,the share of intrafirm trade in total trade would not be expected to correlate significantly with any of the classical determinants of international trade.Figure1provides afirst illustration of how different the real world is from this hypothetical world.In a panel consisting of23manufacturing industries and four years of data(1987,1989,1992,and1994),the share of intrafirm imports in total U.S.imports is significantly higher,the higher the capital intensity in production of the exporting industry.Figure1indicates thatfirms in the U.S.tend to import capital-intensive goods,such as chemical products,within the boundaries of their firms,while they tend to 
import labor-intensive goods,such as textile products,from unaffiliated parties.1Figure2unveils a second strong pattern in the share of intrafirm imports.In a cross-section of28countries,the share of intrafirm imports in total U.S.imports is significantly higher,the higher the capital-labor ratio of the exporting country. U.S.imports from capital-abundant countries,such as Switzerland,tend to take place 1The pattern in Figure1is consistent with Gereffi’s(1999)distinction between‘producer-driven’and‘buyer-driven’international economic networks.Thefirst,he writes,is“characteristic of capi-tal-and technology-intensive industries[...]in which large,usually transnational,manufacturers play the central roles in coordinating production networks”(p.41).Conversely,‘buyer-driven’networks are common in“labor-intensive,consumer goods industries”and are characterized by “highly competitive,locally owned,and globally dispersed production systems”(pp.42-43).The emphasis is my own.Figure1:Share of Intrafirm U.S.Imports and Relative Factor IntensitiesNotes:The Y-axis corresponds to the logarithm of the share of intrafirm imports in total U.S. imports for 23 manufacturingindustries and 4 years: 1987, 1989, 1992, 1994. The X-axis measures the log of that industry’s ratio of capital stock to totalemployment in the corresponding year, using U.S. data. See Table A.1. for industry codes and Appendix A.4. for data sources. Figure2:Share of Intrafirm U.S.Imports and Relative Factor EndowmentsNotes:The Y-axis corresponds to the logarithm of the share of intrafirm imports in total U.S. imports for 28 exportingcountries in 1992. The X-axis measures the log of the exporting country’s physical capital stock divided by its total number ofworkers. See Table A.2. for country codes and Appendix A.4. 
for details on data sources.between affiliated units of multinationalfirms.Conversely,U.S.imports from capital-scarce countries,such as Egypt,occur mostly at arm’s length.This second fact suggests that the well-known predominance of North-North trade in total world trade is even more pronounced within the intrafirm component of trade.2 Why are capital-intensive goods transacted within the boundaries of multinational firms,while labor-intensive goods are traded at arm’s length?3To answer this ques-tion,I build on the theory of thefirm initially exposited in Coase(1937)and later de-veloped by Williamson(1985)and Grossman and Hart(1986),by which activities take place wherever transaction costs are minimized.In particular,I develop a property-rights model of the boundaries of thefirm in which,in equilibrium,transaction costs of using the market are increasing in the capital intensity of the imported good. To explain the cross-country pattern in Figure2,I embed this partial-equilibrium framework in a general-equilibrium,factor-proportions model of international trade, with imperfect competition and product differentiation,along the lines of Helpman and Krugman(1985).The model pins down the boundaries of multinationalfirms as well as the international location of production.Bilateral tradeflows between any two countries are uniquely determined and the implied relationship between in-trafirm trade and relative factor endowments is shown to correspond to that in Figure 2.The result naturally follows from the interaction of comparative advantage and transaction-cost minimization.In drawingfirm boundaries,I build on the seminal work of Grossman and Hart (1986).I consider a world of incomplete contracts in which ownership corresponds to the entitlement of some residual rights of control.When parties undertake non-contractible,relationship-specific investments,the allocation of residual rights has a critical effect on each party’s ex-post outside option,which in turn 
determines each party’s ex-ante incentives to invest.Ex-ante efficiency(i.e.,transaction-cost mini-2This is consistent with comparisons based on foreign direct investment(FDI)data.In the year 2000,more than85%of FDIflows occured between developed countries(UNCTAD,2001),while the share of North-North trade in total world trade was only roughly70%(World Trade Organization, 2001).3At this point,a natural question is whether capital intensity and capital abundance are truly the crucial factors driving the correlations in Figures1and2.In particular,these patterns could in principle be driven by other omitted factors.Section4will present formal econometric evidence in favor of the emphasis placed on capital intensity and capital abundance in this paper.mization)then dictates that residual rights should be controlled by the party whose investment contributes most to the value of the relationship.To explain the higher propensity to integrate in capital-intensive industries,I extend the framework of Grossman and Hart(1986)by allowing the transferability of certain investment decisions.In situations in which the default option for one of the parties(a supplier in the model)is too unfavorable,the allocation of residual rights may not suffice to induce adequate levels of investment.In such situations,I show that the hold-up problem faced by the party with the weaker bargaining position may be alleviated by having another party(afinal-good producer in the model)contribute to the former’s relationship-specific investments.Investment-sharing alleviates the hold-up problem faced by suppliers,but naturally increases the exposure offinal-good producers to opportunistic behavior,with the exposure being an increasing function of the contribution to investment costs.If cost sharing is large enough,ex-ante efficiency is shown to command that residual rights of control,and thus ownership,be assigned to thefinal-good producer,thus giving rise to vertical integration.Conversely,when the 
contribution of the final-good producer is relatively minor, the model predicts outsourcing.

What determines then the extent of cost sharing? Business practices suggest that, in many situations, investments in physical capital are easier to share than investments in labor input. Dunning (1993, p. 455-456) describes several cost-sharing practices of multinational firms in their relations with independent subcontractors. Among others, these include provision of used machinery and specialized tools and equipment, prefinancing of machinery and tools, and procurement assistance in obtaining capital equipment and raw materials. There is no reference to cost sharing in labor costs, other than in labor training. Milgrom and Roberts (1993) discuss the particular example of General Motors, which pays for firm- and product-specific capital equipment needed by their suppliers, even when this equipment is located in the suppliers’ facilities. Similarly, in his review article on Japanese firms, Aoki (1990, p. 25) describes the close connections between final-good manufacturers and their suppliers but writes that “suppliers have considerable autonomy in other respects, for example in personnel administration”. Even within firm boundaries, cost sharing seems to mostly take place when capital investments are involved. In particular, Table 1 indicates that British affiliates of U.S.-based multinationals tend to have much more independence in their employment decisions (e.g., in hiring of workers) than in their financial decisions (e.g., in their choice of capital investment projects).

Table 1. Decision-Making in U.S.-based multinationals
(% of British affiliates in which parent influence on decision is strong or decisive)

Decision                                    %     Decision                          %
Setting of financial targets                51    Union recognition                 4
Preparation of yearly budget                20    Collective bargaining             1
Acquisition of funds for working capital    44    Wage increases                    8
Choice of capital investment projects       33    Numbers employed                  13
Financing of investment projects            46    Lay-offs/redundancies             10
Target rate of return on investment         68    Hiring of workers                 10
Sale of fixed assets                        30    Recruitment of executives         16
Dividend policy                             82    Recruitment of senior managers    13

In this paper, I do not intend to explain why cost sharing is more significant in physical capital investments than in labor input investments. This may be the result of suppliers having superior local knowledge in hiring workers, or it may be explained by the fact that managing workers requires a physical presence in the production plant. Regardless of the source of this asymmetry, the model developed in section 2 shows that if cost sharing is indeed more significant in capital-intensive industries, the propensity to integrate will also be higher in these industries.

In order to explain the trade patterns shown in Figures 1 and 2, I then embed the partial-equilibrium relationship between final-good producers and suppliers into a general-equilibrium framework with a continuum of goods in each of two industries. In section 3, I open this economy to international trade, allowing final-good producers to obtain intermediate inputs from foreign suppliers. In doing so, I embrace a Helpman-Krugman view of international trade with imperfect competition and product differentiation, by which countries specialize in producing certain varieties of intermediate inputs and export them worldwide. Trade in capital-intensive intermediate inputs will be transacted within firm boundaries. Trade in labor-intensive goods will instead take place at arm’s length. The model solves for bilateral trade flows between any two countries, and predicts the share of intrafirm imports in total imports to be increasing in the capital-labor ratio of the exporting country.[4] This is the correlation implied by Figure 2. Moreover, some of the quantitative implications of the model are successfully tested in section 4.

This paper is related to several branches in the literature. On the one hand, it is related to previous theoretical studies that have rationalized the existence of multinational firms in general-equilibrium
models of international trade.5Helpman’s (1984)model introduced a distinction betweenfirm-level and plant-level economies of scale that has proven crucial in later work.In his model,multinationals arise only outside the factor price equalization set,when afirm has an incentive to geograph-ically separate the capital-intensive production of an intangible asset(headquarter services)from the more labor-intensive production of goods.Following the work of Markusen(1984)and Brainard(1997),an alternative branch of the literature has developed models rationalizing the emergence of multinationalfirms in the absence of factor endowment differences.In these models,multinationals will exist in equilib-rium whenever transport costs are high and wheneverfirm-specific economies of scale are high relative to plant-specific economies of scale.6,7These two approaches to the multinationalfirm share a common failure to properly model the crucial issue of internalization.These models can explain why a domestic firm might have an incentive to undertake part of its production process abroad,but they fail to explain why this foreign production will occur withinfirm boundaries(i.e., within multinationals),rather than through arm’s length subcontracting or licensing.4This second part of the argument is based on the premise that capital-abundant countries tend to produce mostly capital-intensive commodities.Romalis(2002)has recently shown that the empirical evidence is indeed consistent with factor proportions being a key determinant of the structure of international trade.5The literature builds on the seminal work of Helpman(1984)and Markusen(1984).For extensive reviews see Caves(1996)and Markusen and Maskus(2001).6The intuition for this result is straightforward:whenfirm-specific economies of scale are im-portant,costs are minimized by undertaking all production within a singlefirm.If transport costs are high and plant-specific economies of scale are small,then it will be profitable to set 
up multiple production plants to service the different local markets.Multinationals are thus of the“horizontal type”.7Recently,the literature seems to have converged to a“unified”view of the multinationalfirm, merging the factor-proportions(or“vertical”)approach of Helpman(1984),together with the “proximity-concentration”trade-offimplicit in Brainard(1997)and others.Markusen and Maskus (2001)refer to this approach as the“Knowledge-Capital Model”and claim that its predictions are widely supported by the evidence.In the same way that a theory of thefirm based purely on technological considerations does not constitute a satisfactory theory of thefirm(c.f.,Tirole,1988,Hart,1995), a theory of the multinationalfirm based solely on economies of scale and transport costs cannot be satisfactory either.As described above,I will instead set forth a purely organizational,property-rights model of the multinationalfirm.My model will make no distinction betweenfirm-specific and plant-specific economies of scale. Furthermore,trade will be costless and factor prices will not differ across countries. 
Yet multinationals will emerge in equilibrium,and their implied intrafirm tradeflows will match the strong patterns identified above.This paper is also related to previous attempts to model the internalization deci-sion of multinationalsfirms.Following the insights from the seminal work of Casson (1979),Rugman(1981)and others,this literature has constructed models studying the role of informational asymmetries and knowledge non-excludability in determin-ing the choice between direct investment and licensing(e.g.,Ethier,1986,Ethier and Markusen,1996).Among other things,this paper differs from this literature in stressing the importance of capital intensity and the allocation of residual rights in the internalization decision,and perhaps more importantly,in describing and testing the implications of such a decision for the pattern of intrafirm trade.Finally,this paper is also related to an emerging literature on general-equilibrium models of industry structure(e.g.,McLaren,2000,Grossman and Helpman,2002a). 
My theoretical framework shares some features with the recent contribution by Gross-man and Helpman.In their model,however,the costs of transacting inside thefirm are introduced by having integrated suppliers incur exogenously higher variable costs (as in Williamson,1985).More importantly,theirs is a closed-economy model and therefore does not consider international trade in goods,which of course is central in my contribution.8The rest of the paper is organized as follows.Section2describes the closed-economy version of the model and studies the role of factor intensity in determining 8Although in this paper I show that a Grossman-Hart-Moore view of thefirm is consistent with the facts in Figures1and2,neither my theoretical model nor the available empirical evidence is rich enough to test this view of thefirm against alternative ones.This would be a major undertaking on its own.See Baker and Hubbard(2002)and Whinston(2002)for more formal treatments of these issues.the equilibrium mode of organization in a given industry.Section3describes the multi-country version of the model and discusses the international location of produc-tion as well as the implied patterns of intrafirm trade.Section4presents econometric evidence supporting the view that both capital intensity and capital abundance are significant factors in explaining the pattern of intrafirm U.S.imports.Section5con-cludes.The proofs of the main results are relegated to the Appendix.2The Closed-Economy Model:Ownership and Cap-ital IntensityThis section describes the closed-economy version of the model.In section3below,I will reinterpret the equilibrium of this closed economy as that of an integrated world economy.The features of this equilibrium will then be used to analyze the patterns of specialization and trade in a world in which the endowments of the integrated economy are divided up among countries.2.1Set-upEnvironment Consider a closed economy that employs two factors of production, capital and labor,to 
produce a continuum of varieties in two sectors, Y and Z. Capital and labor are inelastically supplied and freely mobile across sectors. The economy is inhabited by a unit measure of identical consumers that view the varieties in each industry as differentiated. In particular, letting $y(i)$ and $z(i)$ be consumption of variety $i$ in sectors Y and Z, preferences of the representative consumer are of the form

$$U = \left( \int_0^{n_Y} y(i)^{\alpha}\, di \right)^{\frac{\mu}{\alpha}} \left( \int_0^{n_Z} z(i)^{\alpha}\, di \right)^{\frac{1-\mu}{\alpha}}, \tag{1}$$

where $n_Y$ ($n_Z$) is the endogenously determined measure of varieties in industry Y (Z). Consumers allocate a constant share $\mu \in (0,1)$ of their spending in sector Y and a share $1-\mu$ in sector Z. The elasticity of substitution between any two varieties in a given sector, $1/(1-\alpha)$, is assumed to be greater than one.

Technology. Goods are also differentiated in the eyes of producers. In particular, each variety $y(i)$ requires a special and distinct intermediate input which I denote by $x_Y(i)$. Similarly, in sector Z, each variety $z(i)$ requires a distinct component $x_Z(i)$. The specialized intermediate input must be of high quality, otherwise the output of the final good is zero. If the input is of high quality, production of the final good requires no further costs and $y(i) = x_Y(i)$ (or $z(i) = x_Z(i)$ in sector Z).

Production of a high-quality intermediate input requires capital and labor. For simplicity, technology is assumed to be Cobb-Douglas:

$$x_k(i) = \left( \frac{K_{x,k}(i)}{\beta_k} \right)^{\beta_k} \left( \frac{L_{x,k}(i)}{1-\beta_k} \right)^{1-\beta_k}, \quad k \in \{Y, Z\}, \tag{2}$$

where $K_{x,k}(i)$ and $L_{x,k}(i)$ denote the amount of capital and labor employed in production of variety $i$ in industry $k \in \{Y, Z\}$. I assume that industry Y is more capital-intensive than industry Z, i.e. $1 \ge \beta_Y > \beta_Z \ge 0$.

Low-quality intermediate inputs can be produced at a negligible cost in both sectors. There are also fixed costs associated with the production of an intermediate input. For simplicity, it is assumed that fixed costs in each industry have the same factor intensity as variable costs, so that the total cost functions are homothetic. In particular, fixed costs for each variety in industry $k \in \{Y, Z\}$ are
frβk w1−βk,where r is the rental rate of capital and w the wage rate.Firm structure There are two types of producers:final-good producers and sup-pliers of intermediate inputs.Before any investment is made,afinal-good producer decides whether it wants to enter a given market,and if so,whether to obtain the component from a vertically integrated supplier or from a stand-alone supplier.An integrated supplier is just a division of thefinal-good producer and thus has no con-trol rights over the amount of input produced.Figuratively,at any point in time the parentfirm could selectivelyfire the manager of the supplying division and seize production.Conversely,a stand-alone supplier does indeed have these residual rights of control.In Hart and Moore’s(1990)words,in such a case thefinal-good producer could only“fire”the entire supplyingfirm,including its production.Integrated and non-integrated suppliers differ only in the residual rights they are entitled to,and in particular both have access to the same technology as specified in(2).9 9This is in contrast with the transaction-cost literature that usually assumes that integration leads to an exogenous increase in variable costs(e.g.Williamson,1985,Grossman and Helpman,As discussed in the introduction,a premise of this paper is that investments in physical capital are easier to share than investments in labor input.To capture this idea,I assume that while the labor input is necessarily provided by the supplier,capital expenditures rK x,k (i )are instead transferable,in the sense that the final-good producer can decide whether to let the supplier incur this factor cost too,or rather rent the capital itself and hand it to the supplier at no charge.10Irrespective of who bears their cost,the investments in capital and labor are chosen simultaneously and non-cooperatively.11Once a final-good producer and its supplier enter the market,they are locked into the relationship:the investments rK x,k (i )and wL x,k (i )are incurred 
upon entry and are useless outside the relationship.In Williamson’s (1985)words,the initially competitive environment is fundamentally transformed into one of bilateral monopoly.Regardless of firm structure and the choice of cost sharing,fixed costs associated with production of the component are divided as follows:f F r βk w 1−βk for the final-good producer and f S r βk w 1−βk for the supplier,with f F +f S =f .12Free entry into each sector ensures zero expected pro fits for a potential entrant.To simplify the description of the industry equilibrium,I assume that upon entry the supplier makes a lump-sum transfer T k (i )to the final-good producer,which can vary by industry and variety.Ex-ante,there is a large number of identical,potential suppliers for each variety in each industry,so that competition among these suppliers 2002a).10Alternatively,one could assume that labor costs are also transferrable,but that their transfer leads to a signi ficant fall in productivity.This fall in productivity could be explained,in an interna-tional context,by the inability of multinational firms to cope with idiosyncratic labor markets (c.f.,Caves,1996,p.123).11The assumption that the final-good producer decides between bearing all or none of the capital expenditures can be relaxed to a case of partial transferability.For instance,imagine that x k (i )was produced according to:x k (i )=ÃK F x,k (i )βk !βk ÃK S x,k (i )η(βk )(1−βk )!η(βk )(1−βk )µL x,k (i )(1−η(βk ))(1−βk )¶(1−η(βk ))(1−βk )where K F x,k (i )represents the part of the capital input that is transferable,and where K S x,k (i )is inalieanable to the supplier.As long as the elasticity of output with respect to transferable capital is higher,the higher the capital intensity in production,the same qualitative results would go through.In particular,as long as βk +η(βk )(1−βk )increases with βk ,the model would still predict more integration in capital-intensive industries (see footnote 24).I follow the simpler speci 
fication in (2)because it greatly simpli fies the algebra of the general equilibrium.12Henceforth,I associate a subscript F with the final-good producer and a subscript S with the supplier.will make T k(i)adjust so as to make them break even.Thefinal-good producer chooses the mode of organization so as to maximize its ex-ante profits,which include the transfer.Contract Incompleteness The setting is one of incomplete contracts.In partic-ular,it is assumed that an outside party cannot distinguish between a high-quality and a low-quality intermediate input.Hence,input suppliers andfinal-good pro-ducers cannot sign enforceable contracts specifying the purchase of a certain type of intermediate input for a certain price.If they did,input suppliers would have an incentive to produce a low-quality input at the lower cost and still cash the same revenues.I take the existence of contract incompleteness as a fact of life,and will not complicate the model to relax the informational assumptions needed for this in-completeness to exist.13It is equally assumed that no outside party can verify the amount of ex-ante investments rK x,k(i)and wL x,k(i).If these were verifiable,then final-good producers and suppliers could contract on them,and the cost-reducing benefit of producing a low-quality input would disappear.For the same reason,it is assumed that the parties cannot write contracts contingent on the volume of sale revenues obtained when thefinal good is sold.Following Grossman and Hart(1986), the only contractibles ex-ante are the allocation of residual rights and the transfer T k(i)between the parties.14If the supplier incurs all variable costs,the contract incompleteness gives rise to a standard hold-up problem.Thefinal-good producer will want to renegotiate the price after x k(i)has been produced,since at this point the intermediate input is useless outside the relationship.Foreseeing this renegotiation,the input supplier will 13>From the work of Aghion,Dewatripont and 
Rey(1994),Nöldeke and Schmidt(1995)and others,it is well-known that allowing for specific-performance contracts can lead,under certain circumstances,to efficient ex-ante relationship-specific investments.Che and Hausch(1997)have shown,however,that when ex-ante investments are cooperative(in the sense,that one party’s invest-ment benefits the other party),specific-performance contracts may not lead tofirst-best investment levels and may actually have no value.14The assumption of non-contractibility of ex-ante investments could be relaxed to a case of partial contractibility.I have investigated an extension of the model in which production requires both contractible and non-contractible investments.If the marginal cost of non-contractible investments is increasing in the amount of contractible investments,the ability to set the contractible investments in the ex-ante contract is not sufficient to solve the underinvestment problem discussed below,and the model delivers results analogous to the ones discussed in the main text.。
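Stereo Matching Literature Notes

Reading notes on the stereo matching survey "Classification and evaluation of cost aggregation methods for stereo correspondence" (2008). These study notes cover the classification of algorithms based on cost aggregation, focusing on how cost aggregation strategies are classified.

1. Introduction
Classical global algorithms include:
The main content of the paper: the algorithms are compared in terms of accuracy, mainly using the evaluation method given in reference [23]; they are also compared in terms of computational complexity; finally, the two aspects are combined into a trade-off comparison.

2. Classification of cost aggregation strategies
There are two main types:
1) The former generalizes the concept of variable support by allowing the support to have any shape instead of being built upon rectangular windows only.
2) The latter assigns adaptive, rather than fixed, weights to the points belonging to the support.
Most cost aggregation methods use a symmetric scheme, that is, they combine the information from both images.
(In fact, as later blog posts show, the symmetric form is not required; an asymmetric + TAC form can be used instead and gives even better results.)
The matching (or error) functions used are the Lp distance between two vectors, including SAD, Truncated SAD [30, 25], SSD, the M-estimator [12], and a similarity function based on point distinctiveness [32]. Finally, note that the paper is based on fronto-parallel supports.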
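The pixel-wise matching functions listed above can be illustrated with a short sketch. It is only an illustration: the function names are hypothetical, and the truncation is assumed here to be applied to each absolute difference before summing, which may differ from the exact variants used in the cited papers.

```python
from typing import Sequence


def lp_distance(a: Sequence[float], b: Sequence[float], p: float) -> float:
    """Lp distance between two equally long vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def sad(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences (the L1 case)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ssd(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared differences (the squared L2 case)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def truncated_sad(a: Sequence[float], b: Sequence[float], limit: float) -> float:
    """SAD where each absolute difference is capped at `limit` (assumed form)."""
    return sum(min(abs(x - y), limit) for x, y in zip(a, b))
```

Capping each term limits the influence of a single outlier pixel, which is the usual motivation for preferring the truncated form near occlusions.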
English Question Types Essay Template

English answer:

What are the different types of English question types?

There are many different types of English questions, but they can be broadly classified into three main categories:

1. Factual questions ask for information that can be found in a text or passage. These questions are typically answered with a single word or phrase. For example:
What is the capital of France?
How many states are there in the United States?

2. Inferential questions ask for information that is not explicitly stated in a text or passage. These questions require the reader to make inferences based on the information they have read. For example:
Why did the author write this passage?
What is the main idea of this paragraph?

3. Evaluative questions ask for the reader's opinion or judgment about a text or passage. These questions cannot be answered with a single word or phrase, and they require the reader to provide a more developed response. For example:
Do you agree with the author's argument?
What do you think of the author's writing style?

How to answer different types of English questions
Enumerating Global States of a Distributed Computation

Vijay K. Garg
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, TX 78712-1084, USA
garg@

ABSTRACT
Global predicate detection is a fundamental problem in distributed computing in the areas of distributed debugging and software fault-tolerance. It requires searching the global state lattice of a computation to determine if any consistent global state satisfies the given predicate. We give an efficient algorithm that performs the lex traversal of the lattice. We also give a space-efficient algorithm for the breadth-first-search (BFS) traversal.

KEY WORDS
Global Predicate Detection, Combinatorial Enumeration, Lattices, Ideals

1 Introduction
Global predicate detection is a fundamental problem in distributed debugging [1, 2]. For debugging a distributed program, it is useful to monitor and stop the execution when the user-specified condition, a global predicate, becomes true. For example, the user may specify that the execution should be stopped when a condition over variables located on different processes holds. Such a condition is a global predicate, and the debugger needs to detect it and stop the program in a consistent global state that satisfies it.

Given a distributed computation, the global predicate detection problem asks whether there exists a consistent global state (CGS) [3] in which the predicate is true. A global state is consistent if, for any message whose receive event is included in the global state, its send event is also included. For example, consider the computation in Figure 1(a). Its CGS lattice is shown in Figure 1(b). A global state is labeled by two digits: the first gives the number of events the first process has executed and the second gives the number of events the second process has executed in that state. Thus, the global state 21 signifies that the first process has executed $e_1$ and $e_2$ and the second process has executed $f_1$. It can alternatively be viewed as the subset $\{e_1, e_2, f_1\}$. A global state such as 02, which does not appear among the states listed in Figure 1(c), is not consistent because it includes a receive event on the second process but not the send event on the first process which happened before it.

[...] size of a level of a lattice. It is based on efficient
enumeration of a level set by enumerating all integer compositions.

Figure 1. (a) A computation (b) Its lattice of consistent global states (c) Traversals of the lattice:
BFS: 00, 01, 10, 11, 20, 12, 21, 13, 22, 23, 33
DFS: 00, 10, 20, 21, 22, 23, 33, 11, 12, 13, 01
Lexical: 00, 01, 10, 11, 12, 13, 20, 21, 22, 23, 33

We note that all the traversals discussed in the paper are straightforward if one explicitly generates the graph of the CGS lattice. Since this graph is exponential in size, the challenge is to traverse the graph without storing either the complete graph or a major part of it.

Enumerating CGS in the lex and the BFS order is also useful in combinatorial applications. In [6] we have shown that many families of combinatorial objects can be mapped either to the CGS lattices or to the level sets of the CGS lattices of appropriate computations. Thus, the algorithms for lex and BFS traversal discussed in the paper can also be used to efficiently enumerate all subsets of a finite set, all subsets of a given size, all permutations, all integer partitions less than a given partition, all integer partitions of a given number, and all tuples of a product space. Note that [7] gives different algorithms for these enumerations. Our algorithm is generic, and by instantiating it with different posets all of the above combinatorial lex enumerations can be achieved.

2 Model and Background
The execution of a single process in a computation results in a sequence of events totally ordered by the relation occurred before. We use $e \prec f$ to denote that $e$ occurred before $f$ on some process. To impose an order relation on events across processes, we use Lamport's happened-before relation [8], written $\rightarrow$. We define a distributed computation as the partially ordered set (poset) consisting of the set of events $E$ together with the happened-before relation, and denote it by $(E, \rightarrow)$. Two events $e$ and $f$ are concurrent in $(E, \rightarrow)$ (denoted by $e \| f$) if $\neg(e \rightarrow f)$ and $\neg(f \rightarrow e)$. A global state (or a cut) is a subset $G \subseteq E$ such that $\forall e, f \in E: (f \in G) \wedge (e \prec f) \Rightarrow (e \in G)$. A consistent global state (CGS) of a computation $(E, \rightarrow)$ is a subset
$G \subseteq E$ such that $\forall e, f \in E: (f \in G) \wedge (e \rightarrow f) \Rightarrow (e \in G)$. For a global state $G$, $G[i]$ denotes the maximal event of process $P_i$ in $G$ (i.e., there is no event $f$ of $P_i$ in $G$ such that $G[i]$ occurred before $f$). Although we have defined global states as subsets, they can equivalently be defined using vectors of local states as shown in Figure 1. In this case $G[i]$ equals the number of events executed by $P_i$ in $G$.

A global predicate (or simply a predicate) is a boolean-valued function defined on the set of consistent global states. We say that $B(G)$ ($B$ holds in the CGS $G$) if the function $B$ evaluates to true in $G$.

A lattice is a poset $L$ such that for all $x, y \in L$, the least upper bound of $x$ and $y$ exists, called the join of $x$ and $y$ (denoted by $x \sqcup y$), and the greatest lower bound of $x$ and $y$ exists, called the meet of $x$ and $y$ (denoted by $x \sqcap y$). A lattice is distributive if for all $x, y, z \in L$: $x \sqcap (y \sqcup z) = (x \sqcap y) \sqcup (x \sqcap z)$.

Given a computation $(E, \rightarrow)$, we impose an order on the set of global states as follows. Given two consistent global states $G$ and $H$, we say that $G$ is less than $H$ iff $G \subseteq H$. It is well known in lattice theory that the set of all CGS forms a distributive lattice under the $\subseteq$ relation.

3 An Algorithm for Enumeration of Ideals in Lex Order
It is useful to impose on the set of global states the lex, or dictionary, order. We define the lex order $<_l$ as follows:

$G <_l H$ iff $\exists k: (\forall i: 1 \le i < k: G[i] = H[i]) \wedge (G[k] < H[k])$

This imposes a total order on all global states by assigning higher priority to small-numbered processes. For example, in Figure 1, $01 <_l 10$ because the first process has executed more events in the global state 10 than in 01. We use $\le_l$ for the reflexive closure of the $<_l$ relation. Recall that we have earlier used the order $\subseteq$ on the set of global states, which is a partial order. The order $\subseteq$ shown in Figure 1(b) is equivalent to the componentwise order: $G \subseteq H$ iff $\forall i: G[i] \le H[i]$. Note that $01 <_l 10$ although $01 \not\subseteq 10$.

Note that we have two orders on the set of global states: the partial order based on containment ($\subseteq$), and the total order based on lex ordering ($\le_l$). The relationship between the two orders is given by the following lemma.

Lemma 1. $G \subseteq H \Rightarrow G \le_l H$.

Proof: $G \subseteq H$ implies that $\forall i: G[i] \le H[i]$. The lemma follows from the definition of the lex order.

We now show how the lexical successor of a CGS $G$, that is, the least CGS lexically greater than $G$, can be computed.
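The two orders, and the successor computation that the following theorem addresses, can be tried out on a small example. The sketch below is an illustration under stated assumptions rather than the algorithm of Figure 2: the vector clocks in `CLOCKS` are hypothetical values chosen so that the consistent global states match the labels listed in Figure 1(c), all function names are made up for this sketch, and `next_lex` assumes that the successor is obtained by advancing the lowest-priority process that has an enabled event, discarding the events of lower-priority processes, and taking the least CGS containing the result.

```python
from itertools import product

# Hypothetical vector clocks: CLOCKS[i][j] is the clock of event j+1 on process i.
CLOCKS = [
    [(1, 0), (2, 0), (3, 3)],  # first process
    [(0, 1), (1, 2), (1, 3)],  # second process
]
N = len(CLOCKS)


def is_consistent(state):
    """state[i] is the number of events executed by process i."""
    for i in range(N):
        if state[i] > 0:
            clock = CLOCKS[i][state[i] - 1]  # last event executed by process i
            if any(clock[j] > state[j] for j in range(N)):
                return False  # that event depends on an event outside the state
    return True


def contained_in(g, h):
    """Containment order: componentwise comparison of the two vectors."""
    return all(a <= b for a, b in zip(g, h))


def brute_force_lex():
    """All CGS in lex order, by filtering every vector (exponential work)."""
    ranges = [range(len(events) + 1) for events in CLOCKS]
    return [s for s in product(*ranges) if is_consistent(s)]


def enabled(state, k):
    """True if the next event of process k depends only on events in state."""
    if state[k] == len(CLOCKS[k]):
        return False
    clock = CLOCKS[k][state[k]]
    return all(clock[j] <= state[j] for j in range(N) if j != k)


def next_lex(state):
    """Assumed successor rule; returns None for the greatest CGS."""
    for k in reversed(range(N)):  # lowest priority = largest index first
        if enabled(state, k):
            break
    else:
        return None
    # Advance process k and drop the events of lower-priority processes.
    succ = list(state[:k]) + [state[k] + 1] + [0] * (N - k - 1)
    # Least CGS containing succ: add everything its last events depend on.
    least = list(succ)
    for i in range(N):
        if succ[i] > 0:
            clock = CLOCKS[i][succ[i] - 1]
            for j in range(N):
                least[j] = max(least[j], clock[j])
    return tuple(least)


def lex_traverse():
    """Visit every CGS in lex order while storing only the current state."""
    state = tuple([0] * N)
    while state is not None:
        yield state
        state = next_lex(state)
```

On this example the successor-based traversal visits the same states in the same order as the brute-force filter, while keeping only one state in memory at a time.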
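Theorem 1. Assume that $G$ is a CGS such that it is not the greatest CGS. Let $k$ be the index of the process with the smallest priority which has an event enabled in $G$, and let $H$ be the global state obtained from $G$ by executing that event on $P_k$ and removing all events of the processes with lower priority than $P_k$. Then the lexical successor of $G$ is the least CGS containing $H$.

Proof: We define the following global states for convenience: $K$ is the lexical successor of $G$, and $K'$ is the least CGS containing $H$. Our goal is to prove that $K = K'$.

$K$ contains at least one event $f$ that is not in $G$; otherwise $K \subseteq G$ and therefore $K$ cannot be lexically bigger than $G$. Choose $f \in K - G$ from the highest priority process possible. Let $e$ be the event in the smallest priority process enabled in $G$, i.e., $e$ is on process $P_k$. Let $\mathit{proc}(e)$ and $\mathit{proc}(f)$ denote the process indices of $e$ and $f$. We now do a case analysis.

Case 1: $\mathit{proc}(f) < \mathit{proc}(e)$.
In this case, $e$ is from a lower priority process than $f$. We show that this case implies that $K'$ is lexically smaller than $K$. We first claim that $K' \subseteq G \cup \{e\}$. This is because $G \cup \{e\}$ is a CGS containing $H$ and $K'$ is the smallest CGS containing $H$. Now, since $K' \subseteq G \cup \{e\}$ and $K$ contains an event from a higher priority process than $e$, it follows that $K'$ is lexically smaller than $K$, a contradiction.

Case 2: $\mathit{proc}(f) > \mathit{proc}(e)$.
Recall that event $e$ is on the process with the smallest priority that had any event enabled at $G$. Therefore, the existence of $f$ in the CGS $K$ implies the existence of at least one other event in $K - G$ at a process with higher priority than $f$. This contradicts the choice of event $f$ because, by definition, $f$ is from the highest priority process in $K - G$.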
Case 3: $\mathit{proc}(f) = \mathit{proc}(e) = k$.
Then $H \subseteq K$, because both $K$ and $H$ have identical events on the processes with priority $k$ or higher and $H$ has no events in lower priority processes. Since $K$ is a CGS and $H \subseteq K$, we get that $K' \subseteq K$ by the definition of $K'$. From Lemma 1, it follows that $K' \le_l K$. But $K$ is the next lexical state after $G$. Therefore, $K' = K$.

Incorporating these observations, we get the algorithm in Figure 2. The outer while loop at line (1) iterates until all consistent global states are visited. If the current CGS $G$ satisfies the given predicate, then we are done and $G$ is returned as the lexicographically minimum CGS. Lines (4)-(22) generate the lexical successor of $G$. Lines (4)-(14) determine the lowest priority process $P_k$ which has an event enabled in $G$. The for loop on line (4) is exited when an enabled event is found at line (12). We are guaranteed to get an enabled event because $G$ is not the final CGS. Lines (8)-(12) check if the next event on $P_k$ is enabled. This is done using the vector clock. An event $e$ is enabled in a CGS $G$ iff all the events that $e$ depends on have been executed in $G$; or equivalently, all the components of the vector clock of $e$ for the other processes are less than or equal to the corresponding components of the vector clock of $G$. This test is performed in lines (8)-(11). Lines (15) to (17) compute $H$. Finally, lines (18)-(22) compute the least CGS containing $H$.

Let us now analyze the time and space complexity of the above algorithm. The while loop iterates once per CGS of the computation. The cost of each iteration is dominated by the nested for loops over the processes, so the total time is that per-iteration cost multiplied by the number of consistent global states. Apart from the computation itself, the algorithm stores only the global states it is currently working on, each of which is a vector with one entry per process. We also assume that the events are represented using their vector clocks.

4 Algorithms for BFS Generation of Ideals
For many applications we may be interested in generating consistent global states in the BFS order, for example, when we want to generate elements in a single level of the CGS lattice. The lex algorithm is not useful for that purpose. Cooper and Marzullo [2] have given an algorithm to detect a global predicate based on level set enumeration. They keep two lists of consistent global states: last and current. To generate the next level of consistent
global states, they set last to current and current to the set of global states that can be reached from last in one transition. Since a CGS can be reached from multiple global states, an implementation of their algorithm will result in either current holding multiple copies of a CGS or added complexity in the algorithm to ensure that a CGS is inserted in current only when it is not already present. This problem occurs because their algorithm does not exploit the fact that the set of global states forms a distributive lattice.

We now show an extension of their algorithm which ensures that a CGS is enumerated exactly once and that there is no overhead of checking whether the CGS has already been enumerated (or inserted in the list). Our extension exploits the following observation.

Lemma 4. If $H$ is reachable from $G$ by executing an event $e$ and there exists an event $f \in G$ such that $f$ is maximal in $G$ and concurrent with $e$, then there exists a CGS $G'$ at the same level as $G$ such that $G' = (G - \{f\}) \cup \{e\}$ and $H$ is reachable from $G'$.

Proof: Since $G$ is consistent and $f$ is a maximal event in $G$, it follows that $G - \{f\}$ is a CGS. If $e$ is enabled at $G$ and $f$ is concurrent with $e$, then $e$ is also enabled at $G - \{f\}$.
Therefore, $G' = (G - \{f\}) \cup \{e\}$ is a CGS. $H$ is reachable from $G'$ on executing $f$.

Figure 3. An Extension of the Cooper and Marzullo Algorithm for BFS Enumeration of CGS

The main disadvantage of Cooper and Marzullo's algorithm, even with the proposed extension, is that it requires space at least as large as the number of consistent global states in the largest level set. Note that the largest level set is exponential in the number of processes $n$, and therefore when using this algorithm for a large system we may run out of memory.

We now give two algorithms that use polynomial space to list the global states in the BFS order. The first algorithm is based on integer compositions and consistency checks, and the second algorithm is based on using the DFS (or the lex) traversal multiple times to enumerate consistent global states in the BFS order.

The first algorithm for BFS traversal uses a consistency check to avoid storing the consistent global states. The main idea is to generate all the global states in a level rather than storing them. Assume that we are interested in enumerating level set $r$. Any global state in level set $r$ corresponds to the total number of events executed by the $n$ processes being $r$. A composition of $r$ into $n$ parts corresponds to a representation of the form $a_1 + a_2 + \cdots + a_n = r$, where each $a_i$ is a natural number and the order of the summands is important. In our application, this corresponds to a global state $G$ such that $G[i] = a_i$. There are many algorithms that enumerate all the compositions of an integer $r$ into $n$ parts (for example, the algorithm due to Nijenhuis and Wilf [11] (pp. 40-46) runs through the compositions in lexicographic order reading from right to left).
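The composition-based idea can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Nijenhuis and Wilf: `compositions` is a simple recursive generator that allows zero parts (a process may have executed no events), the vector clocks in `CLOCKS` are hypothetical values chosen so that the consistent global states match the labels listed in Figure 1(c), and all function names are made up for this sketch.

```python
# Hypothetical vector clocks: CLOCKS[i][j] is the clock of event j+1 on process i.
CLOCKS = [
    [(1, 0), (2, 0), (3, 3)],  # first process
    [(0, 1), (1, 2), (1, 3)],  # second process
]
N = len(CLOCKS)


def is_consistent(state):
    """state[i] is the number of events executed by process i."""
    for i in range(N):
        if state[i] > 0:
            clock = CLOCKS[i][state[i] - 1]  # last event executed by process i
            if any(clock[j] > state[j] for j in range(N)):
                return False
    return True


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def level_set(level):
    """Yield the CGS in which exactly `level` events have been executed."""
    limits = [len(events) for events in CLOCKS]
    for state in compositions(level, N):
        # Discard vectors that exceed a process's event count, then test consistency.
        if all(c <= m for c, m in zip(state, limits)) and is_consistent(state):
            yield state


def bfs_order():
    """Enumerate all CGS level by level without storing any level."""
    total_events = sum(len(events) for events in CLOCKS)
    for level in range(total_events + 1):
        yield from level_set(level)
```

Only the current composition is held in memory, so the space used does not grow with the size of a level set; the price is that inconsistent vectors are generated and then rejected.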
For every composition, the corresponding global state can be checked for consistency.

The second algorithm exploits the fact that the DFS and the lex traversal can be done in polynomial space. We perform the DFS traversal for each level number $r$. During the DFS traversal we explore a global state only if its level number is less than or equal to $r$, and we visit it (evaluate the predicate or print it, depending upon the application) only if its level number is exactly equal to $r$. The algorithm shown in Figure 4 generates one level at a time. In line (2) it reduces the computation to include only those events whose sum of vector clock values is less than or equal to $r$. All other events can never be in a global state at a level less than or equal to $r$. In line (3) it performs the space-efficient lex traversal of the CGS lattice using the algorithm in Figure 2. The computation used is the reduced one, and the global predicate that is evaluated is modified to include a clause that the CGS should be at a level equal to $r$. If no CGS is found, then we try the next level.

Since the number of levels is bounded by the number of events in the computation, we can enumerate the consistent global states in the BFS order by repeating the lex traversal once per level, using polynomial space. Note that our algorithm enumerates each level itself in the lex order.

(1) for each level r, from the lowest to the highest, do
(2)     reduce the computation to the events whose vector clock values sum to at most r;
(3)     run the lex traversal of Figure 2 on the reduced computation, with the predicate strengthened to require level r;
(4)     if a CGS is found then
(5)         return that CGS;
(6) endfor;
(7) return, since no CGS satisfies the predicate;

Figure 4. A Space-Efficient Algorithm for BFS Enumeration

5 Conclusions
We have presented an algorithm that performs the lex traversal of the CGS lattice while storing only a constant number of global states, in time per CGS that is polynomial in the number of processes. We have also shown that, at the expense of more time, BFS traversal can be accomplished in polynomial space. The previous algorithm for BFS traversal uses exponential space.

We note here that there are other approaches in the mathematics and operations research literature for enumeration of ideals of a poset. See, for example, papers by Steiner [12], Bordat [13], Squire [14], Jegou, Medina, and Nourine [15], and Habib, Medina, Nourine and Steiner [16].
The algorithm in [16] is the most efficient known for generating all ideals. None of these algorithms enumerate consistent global states (or ideals) in the lex or the BFS order.

The most interesting question left open is: Is there a traversal algorithm for the CGS lattice with a better time bound that still uses polynomial space?

Acknowledgments
I am thankful to James Roller Jr., Alper Sen and Stephan Lips for discussions on the topic.

References
[1] V. K. Garg and B. Waldecker. Detection of unstable predicates. In Proc. of the Workshop on Parallel and Distributed Debugging, Santa Cruz, CA, May 1991. ACM/ONR.
[2] R. Cooper and K. Marzullo. Consistent detection of global predicates. In Proc. of the Workshop on Parallel and Distributed Debugging, pages 163–173, Santa Cruz, CA, May 1991. ACM/ONR.
[3] K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1):63–75, February 1985.
[4] N. Mittal and V. K. Garg. On detecting global predicates in distributed computations. In 21st International Conference on Distributed Computing Systems (ICDCS '01), pages 3–10, Washington-Brussels-Tokyo, April 2001. IEEE.
[5] S. Alagar and S. Venkatesan. Techniques to tackle state explosion in global predicate detection. IEEE Transactions on Software Engineering, 27(8):704–714, August 2001.
[6] V. K. Garg. Algorithmic combinatorics based on slicing posets. In Proc. of 22nd Conference on the Foundations of Software Technology & Theoretical Computer Science, pages 169–182. Springer Verlag, December 2002. Lecture Notes in Computer Science.
[7] D. Stanton and D. White. Constructive Combinatorics. Springer-Verlag, 1986.
[8] L. Lamport. Time, clocks, and the ordering of events in a distributed system. Communications of the ACM, 21(7):558–565, July 1978.
[9] C. J. Fidge. Partial orders for parallel debugging. Proceedings of the ACM SIGPLAN/SIGOPS Workshop on Parallel and Distributed Debugging, published in ACM SIGPLAN Notices, 24(1):183–194, January 1989.
[10] F. Mattern. Virtual time and global states of distributed systems. In Parallel and Distributed Algorithms: Proc. of the International Workshop on Parallel and Distributed Algorithms, pages 215–226. Elsevier Science Publishers B.V. (North-Holland), 1989.
[11] A. Nijenhuis and H. S. Wilf. Combinatorial Algorithms for Computers and Calculators. Academic Press, London, 2 edition, 1978.
[12] G. Steiner. An algorithm to generate the ideals of a partial order. Operations Research Letters, 5(6):317–320, 1986.
[13] J. P. Bordat. Calcul des ideaux d'un ordonne fini. Operation Research, 25(4):265–275, 1991.
[14] M. Squire. Gray Codes and Efficient Generation of Combinatorial Structures. PhD Dissertation, Department of Computer Science, North Carolina State University, 1995.
[15] Roland Jégou, Raoul Medina, and Lhouari Nourine. Linear space algorithm for on-line detection of global predicates. In Jörg Desel, editor, Proceedings of the International Workshop on Structures in Concurrency Theory (STRICT), Workshops in Computing, pages 175–189. Springer-Verlag, 1995.
[16] M. Habib, R. Medina, L. Nourine, and G. Steiner. Efficient algorithms on distributive lattices. DAMATH: Discrete Applied Mathematics and Combinatorial Operations Research and Computer Science, 110:169–187, 2001.