Impact of Relevance Measures on the Robustness and Accuracy of Collaborative Filtering
J. J. Sandvig, Bamshad Mobasher, and Robin Burke
Center for Web Intelligence
School of Computer Science, Telecommunications and Information Systems
DePaul University, Chicago, Illinois, USA
{jsandvig,mobasher,rburke}@

Abstract. The open nature of collaborative recommender systems presents a security problem. Attackers that cannot be readily distinguished from ordinary users may inject biased profiles, degrading the objectivity and accuracy of the system over time. The standard user-based collaborative filtering algorithm has been shown to be quite vulnerable to such attacks. In this paper, we examine relevance measures that complement neighbor similarity and their influence on algorithm robustness. In particular, we consider two techniques, significance weighting and trust weighting, that attempt to calculate the utility of a neighbor with respect to rating prediction. Such techniques have been used to improve prediction accuracy in collaborative filtering. We show that significance weighting, in particular, also results in improved robustness under profile injection attacks.

1 Introduction

An adaptive system dependent on anonymous, unauthenticated user profiles is subject to manipulation. The standard collaborative filtering algorithm builds a recommendation for a target user by combining the stored preferences of peers with similar interests. If a malicious user injects the profile database with a number of fictitious identities, they may be considered peers to a genuine user and bias the recommendation. We call such attacks profile injection attacks (also known as shilling [1]). Recent research has shown that surprisingly modest attacks are sufficient to manipulate the most common CF algorithms [2,1,3]. Such attacks degrade the objectivity and accuracy of a recommender system, causing frustration for its users.

In this paper we explore the robustness of certain variants of user-based recommendation. In particular, we examine variants that combine similarity metrics with other measures to determine neighbor utility. Such relevance weighting techniques apply a weight to each neighbor's similarity score, based on some value reflecting the expected relevance of that neighbor to the prediction task. We focus on two types of relevance measures: significance weighting and trust-based weighting. Significance weighting [4] takes the size of profile overlap between neighbors into account. This prevents neighbors with only a few commonly rated items from dominating prediction. Trust-based weighting [5] estimates the utility of a neighbor as a rating predictor based on the historical accuracy of recommendations given by the neighbor.

Traditional user-based collaborative filtering algorithms focus exclusively on the degree of similarity between the target user and its neighbors in order to generate predicted ratings. However, the "reliability" of the neighbor profiles is generally not considered. For example, due to the sparsity of the data, the similarities may have been obtained based on very few co-rated items between the neighbor and the target user, resulting in sub-optimal predictions. Similarly, unreliable neighbors that have made poor predictions in the past may have a negative impact on prediction accuracy for the current item. Both of the approaches to relevance weighting mentioned above were therefore initially introduced in order to improve prediction accuracy in user-based collaborative filtering.

In the trust-based model [5] an explicit trust value is computed for each user, reflecting the "reputation" of that user for making accurate recommendations. Trust is not limited to the macro profile level, and can be calculated as the reputation a user has for the recommendation of a particular item. The trust values, in turn, can be used as relevance weights when generating predictions. In [5], O'Donovan and Smyth further studied the impact of the trust weighting approach on the robustness of collaborative recommendation and showed that trust-based models are still vulnerable to attacks. On the other hand, the significance weighting approach, introduced initially in [4], does not focus on trust, but rather on the number of co-rated items between the target user and the neighbors as a measure of the degree of reliability of the neighbor profiles. This approach has been shown to have a significant impact on the accuracy of predictions, particularly in sparse data sets.

Although these and other similar approaches have been used to improve the prediction accuracy of recommender systems, the impact of neighbor significance weighting on algorithm robustness in the face of malicious attacks has been largely ignored. The primary contribution of this paper is to demonstrate that relevance weighting is an important factor in determining the robustness of a collaborative filtering algorithm. Choosing an optimal relevance measure can yield a large improvement in recommender stability. Our results show that significance weighting, in particular, is not only more accurate; it also improves algorithm robustness under profile injection attacks that have compact profile signatures.

2 Attacks in Collaborative Recommenders

We assume that an attacker intends to bias a recommender system for some economic advantage. This may be in the form of an increased number of recommendations for the attacker's product, or fewer recommendations for a competitor's product.

A collaborative recommender database consists of many user profiles, each with assigned ratings to a number of products that represent the user's preferences. User-based collaborative filtering algorithms attempt to discover a neighborhood of user profiles that are similar to a target user. A rating value is predicted for all missing items in the target user's profile, based on ratings given to the item within the neighborhood. A ranked list is produced, and typically the top 20 or 50 predictions are returned as recommendations.

The standard k-nearest neighbor algorithm is widely used and reasonably accurate [4]. Similarity is computed using Pearson's correlation coefficient, and the k most similar users that have rated the target item are selected as the neighborhood. This implies that a target user may have a different neighborhood for each target item. It is also common to filter neighbors with similarity below a specified threshold. This prevents predictions from being based on very distant or negative correlations. After identifying a neighborhood, we use Resnick's algorithm to compute the prediction for a target item i and target user u:

    p_{u,i} = r̄_u + ( Σ_{v∈V} sim_{u,v} · (r_{v,i} − r̄_v) ) / ( Σ_{v∈V} |sim_{u,v}| )    (1)

where V is the neighborhood, r_{v,i} is the rating of neighbor v for item i, and r̄_u and r̄_v are the average ratings of users u and v. In this way, more similar neighbors have a larger impact on the final prediction. However, this type of similarity weighting alone may not be sufficient to guarantee accurate predictions. It is also necessary to ensure the reliability of the neighbor profiles. A common reason for the lack of reliability of predictions may be that similarities between the target user and the neighbors are based on a very small number of co-rated items. In the following section we consider two approaches that have been used to address the "reliability" problem mentioned above. These approaches have been used primarily to increase prediction accuracy. Our focus, however, will be on their impact on system robustness in the face of attacks.
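As a concrete illustration, the k-nearest-neighbor prediction step just described (Pearson correlation to select neighbors, Resnick's formula to combine their ratings) can be sketched as follows. The toy profiles, the k default, and the positive-correlation filter are illustrative choices, not the paper's exact configuration.

```python
# Minimal sketch of user-based CF: Pearson similarity plus Resnick's
# prediction formula. Profiles are dicts mapping item -> rating; the data
# below is an illustrative toy example, not from the paper.
from math import sqrt

def pearson(u, v):
    """Pearson correlation over the items co-rated by profiles u and v."""
    common = set(u) & set(v)
    if len(common) < 2:
        return 0.0
    mu_u = sum(u[i] for i in common) / len(common)
    mu_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mu_u) * (v[i] - mu_v) for i in common)
    den = sqrt(sum((u[i] - mu_u) ** 2 for i in common)) * \
          sqrt(sum((v[i] - mu_v) ** 2 for i in common))
    return num / den if den else 0.0

def predict(target, neighbors, item, k=30):
    """Resnick's formula over the k most similar neighbors that rated `item`."""
    rated = [v for v in neighbors if item in v]
    rated.sort(key=lambda v: pearson(target, v), reverse=True)
    r_bar_u = sum(target.values()) / len(target)
    num = den = 0.0
    for v in rated[:k]:
        s = pearson(target, v)
        if s <= 0:          # filter distant/negative correlations
            continue
        r_bar_v = sum(v.values()) / len(v)
        num += s * (v[item] - r_bar_v)
        den += abs(s)
    return r_bar_u + (num / den if den else 0.0)

target = {"a": 4, "b": 5, "c": 2}
neighbors = [
    {"a": 4, "b": 5, "c": 1, "d": 5},   # strongly correlated with target
    {"a": 2, "b": 1, "c": 5, "d": 1},   # negatively correlated (filtered out)
]
print(round(predict(target, neighbors, "d"), 2))  # → 4.92
```

Only the positively correlated neighbor contributes, so the prediction is the target's mean rating shifted upward by that neighbor's deviation on item "d".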
3 Relevance Weighting

We conjecture that an optimal relevance weight may provide an algorithmic approach to securing recommender systems against attacks. The basic goal of a relevance measure is to estimate the utility of a neighbor as a rating predictor for the target user. The standard technique is to calculate similarity as the degree of "closeness" in Euclidean space. This is often accomplished via Pearson's correlation coefficient or the vector cosine coefficient. Additional extensions to similarity are well known, including significance weighting [4], variance weighting [4], case amplification [7], inverse user frequency [7], default voting [7], and profile trust [5]. In this paper, we focus on the effects of significance weighting and profile trust because they are widely accepted techniques with very different properties.

3.1 Significance Weighting

The significance weighting approach proposed by Herlocker et al. [4] adjusts similarity weights by devaluing relationships with a small number of commonly rated items. It uses a linear drop-off for neighbors with fewer than N co-rated items. Neighbors with more than N co-rated items are not devalued at all. The local variant of the significance weight of a target user u for a neighbor v is computed as:

    w_{u,v} = sim_{u,v} · n/m

where n is the number of co-rated items, and m is the total number of ratings in the target user's profile. Using a local measure prevents unduly penalizing the closest neighbors when the target user has only a minimal number of ratings.

Significance weighting prefers neighbors having many commonly rated items with the target user. Neighbors with fewer commonly rated items may be pushed out of the neighborhood, even if there is a higher degree of similarity to the target user. It follows that users who have rated a large number of items will belong to more neighborhoods than those users who have rated few items. This is a potential security risk in the context of profile injection attacks. An attack profile with a very large number of filler items will necessarily be included in more neighborhoods, regardless of the rating value. As we will show, the risk is minimized precisely because a large filler size is required to make the attack successful. In most cases, genuine users rate only a small portion of all recommendable items; therefore, an attack profile with a very large filler size is easier to detect [8].

3.2 Trust Weighting

The vulnerabilities of collaborative recommender systems to attacks have led to a number of recent studies focusing on the notion of "trust" in recommendation. O'Donovan and Smyth [5,9] propose trust models as a means to improve accuracy in collaborative filtering. The basic assumption is that users with a history of being good predictors will provide accurate predictions in the future. By explicitly calculating a trust value, the reputation of a user can be used as insight into the user's relevance to recommendation. Trust is not limited to the macro profile level, and can be calculated as the reputation a user has for the recommendation of a particular item.

The trust building process generates a trust value for every user in the training set by examining the predictive accuracy of the corresponding profile. By cross-validation, each user in turn is designated as the sole neighbor v for all remaining users. The system then computes the prediction set P_v as all possible predictions p_{u,i} that can be made for user u ∈ U and item i ∈ I using the neighborhood V = {v}. For each prediction p_{u,i}, recommend_{v,u,i} = 1 if p_{u,i} ∈ P_v, and correct_{v,u,i} = 1 if |p_{u,i} − r_{u,i}| < ε, where ε is a constant threshold and r_{u,i} is the rating of user u for item i. Item-trust values are then computed as:

    trust_{v,i} = Σ_{u∈U} correct_{v,u,i} / Σ_{u∈U} recommend_{v,u,i}    (4)

The trust value is combined with similarity to produce a relevance weight:

    w_{u,v,i} = (sim_{u,v} + trust_{v,i}) / 2

where sim_{u,v} is Pearson's correlation coefficient. A prediction for the target user is computed using (1), replacing sim_{u,v} with w_{u,v,i}.

Trust-based collaborative filtering algorithms can be very susceptible to profile injection attacks, because mutual opinions are reinforced during the trust building process [9]. Attack profiles that contain biased ratings for a target item result in mutual reinforcement of the item's preference. The larger the attack, the more reinforcement of the target item. Furthermore, if the target item is always given the maximum value, an attack profile could have higher trust scores than a genuine profile, because correct_{v,u,i} will always be 1 if v and u are both attacks on item i.

In a recent study, O'Donovan and Smyth [9] propose several solutions to the reinforcement problem that utilize pseudo-random subsets of the training data during the trust building phase. Sampling the population of profiles used in trust calculation effectively smoothes the noise inherent in the entire dataset. The strategy raises an interesting research question with respect to robustness: how does a non-deterministic neighborhood formation task affect the impact of a profile injection attack? Although promising, we did not evaluate sampling of the training set. For this set of experiments, we are interested only in the effect of relevance weighting.

4 Experimental Evaluation

Dataset. In our experiments, we have used the publicly available MovieLens 100K dataset¹. This dataset consists of 100,000 ratings on 1682 movies by 943 users. All ratings are integer values between one and five, where one is the lowest (disliked) and five is the highest (liked). Our data includes all users who have rated at least 20 movies.

To conduct attack experiments, the full dataset is split into training and test sets. Generally, the test set contains a sample of 50 user profiles that mirror the overall distribution of users in terms of number of movies seen and ratings provided. The remaining user profiles are designated as the training set. All attack profiles are built from the training set, in isolation from the test set. The set of attacked items consists of 50 movies whose ratings distribution matches the overall ratings distribution of all movies. Each movie is attacked as a separate test, and the results are aggregated. In each case, a number of attack profiles are generated and inserted into the training set, and any existing rating for the attacked movie in the test set is temporarily removed.

For every profile injection attack, we track attack size and filler size. Attack size is the number of injected attack profiles, and is measured as a percentage of the pre-attack training set. There are approximately 1000 users in the database, so an attack size of 1% corresponds to about 10 attack profiles added to the system. Filler size is the number of filler ratings given to a specific attack profile, and is measured as a percentage of the total number of movies. There are approximately 1700 movies in the database, so a filler size of 10% corresponds to about 170 filler ratings in each attack profile. The results reported below represent averages over all combinations of test users and attacked movies.

Fig. 1. Comparison of MAE

Evaluation Metrics. There has been considerable research in the area of recommender system evaluation focused on accuracy and performance [10]. We use the mean absolute error (MAE) accuracy metric, a statistical measure for comparing predicted values to actual user ratings [4]. However, our overall goal is to measure the effectiveness of an attack; the "win" for the attacker. In the experiments reported below, we follow the lead of [2] in measuring stability via prediction shift. Prediction shift measures the change in an item's predicted rating after being attacked. Let U and I be the sets of test users and attacked items, respectively.
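The attack-size and filler-size bookkeeping just described can be sketched as follows. The rating strategy (maximum rating for the pushed item, a fixed rating for fillers) is an illustrative simplification, not the paper's attack model.

```python
# Sketch of push-attack profile generation, sized by attack size and filler
# size as percentages of the training set and item catalog. The rating
# choices below (max rating for the target, a constant for fillers) are
# illustrative assumptions, not the paper's attack model.
import random

def build_attack(n_users, n_items, target_item, attack_size, filler_size,
                 r_max=5, filler_rating=3, seed=0):
    """Return a list of attack profiles (dicts mapping item -> rating)."""
    rng = random.Random(seed)
    n_profiles = round(n_users * attack_size)   # e.g. 1% of 1000 users -> 10
    n_filler = round(n_items * filler_size)     # e.g. 10% of 1700 movies -> 170
    candidates = [i for i in range(n_items) if i != target_item]
    profiles = []
    for _ in range(n_profiles):
        fillers = rng.sample(candidates, n_filler)
        profile = {i: filler_rating for i in fillers}
        profile[target_item] = r_max            # push the target item
        profiles.append(profile)
    return profiles

attack = build_attack(n_users=1000, n_items=1700, target_item=42,
                      attack_size=0.01, filler_size=0.10)
print(len(attack), len(attack[0]))  # → 10 171  (170 fillers + the target)
```

Injecting these profiles into the training set and re-running prediction for the target item is what the prediction-shift experiments below measure.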
For each user-item pair (u,i), the prediction shift, denoted Δ_{u,i}, can be measured as Δ_{u,i} = p′_{u,i} − p_{u,i}, where p and p′ represent the prediction before and after the attack, respectively. A positive value means that the attack has succeeded in raising the predicted rating for the item. The average prediction shift for an item i over all users in the test set can be computed as Δ_i = Σ_{u∈U} Δ_{u,i} / |U|. The average prediction shift is then computed by averaging over the individual prediction shifts for all attacked items. Note that a strong prediction shift does not guarantee an item will be recommended; it is possible that other items' scores are also affected by an attack, or that the item score is so low that even a prodigious shift does not promote it to "recommended" status.

Accuracy Analysis. We first compare the accuracy of k-nearest neighbor using different relevance metrics. In our experiments we examined the standard Pearson's correlation, standard significance weighting, local significance weighting, and item-trust weighting. For significance weighting, we have followed the lead of [4] in using N = 50. For trust weighting, we have followed the lead of [5] in using ε = 1.8. In all cases, 10-fold cross-validation is performed on the entire dataset and no attack profiles are injected.

As shown in Figure 1, we achieved good results using a neighborhood size of k = 30 users for all relevance metrics; therefore, we applied k = 30 to all neighborhood formation tasks in the attack results discussed below. Overall, it is clear that some form of relevance weighting, in addition to similarity, can improve prediction accuracy. Standard and local significance weighting are particularly beneficial, although trust is also helpful when considering small neighborhoods.

There are several interesting observations about the MAE results. At k = 5, item-trust is more accurate than the other relevance measures. At k = 15 and greater, item-trust is the least accurate of the measures. It appears that the trust building process overfits the data, because trust is built on the assumption that the user for whom a trust value is computed is the only neighbor in any given neighborhood. The trust model does not take into account that a large neighborhood depends on reinforcement. For example, the closest neighbor to a target user may predict a negative rating for item i. But when the closest three neighbors are taken into account, the second and third neighbors may predict a positive rating for item i. This effectively cancels out the prediction of the closest neighbor. In fact, a positive rating prediction may be more accurate for item i, because the trend of the closest neighbors is a positive rating.

Robustness Analysis. To evaluate the robustness of relevance weighting, we compare the results of push attacks using the four relevance weighting schemes described in the previous section. Figure 2(A) depicts prediction shift results at different attack sizes, using a 5% filler. Clearly, significance weighting is much more robust than the standard Pearson's correlation. For all attack sizes, the prediction shift of significance weighting is about half that of standard correlation. Although not completely immune to attack, it is certainly a large improvement.
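A minimal sketch of the two significance-weighting variants compared here, assuming the standard variant scales similarity by min(n, N)/N (the linear drop-off below N described in Section 3.1) and the local variant by n/m; the function names are ours and the exact formulas in the original may differ:

```python
# Sketch of significance weighting applied to a similarity score.
# `n` = number of co-rated items, `m` = number of ratings in the target
# profile, `N` = overlap threshold (the paper uses N = 50). The min(n, N)/N
# form for the standard variant is our reading of the linear drop-off.

def standard_significance(sim, n, N=50):
    """Linear drop-off for neighbors with fewer than N co-rated items."""
    return sim * min(n, N) / N

def local_significance(sim, n, m):
    """Local variant: scale by overlap relative to the target profile size."""
    return sim * n / m

# A highly similar neighbor with only 5 co-rated items is devalued...
print(standard_significance(0.9, n=5))        # → 0.09
# ...while one with 60 co-rated items keeps its full similarity.
print(standard_significance(0.9, n=60))       # → 0.9
# Local variant: target user has rated 20 items, 5 of them co-rated.
print(local_significance(0.9, n=5, m=20))     # → 0.225
```

Note how the local variant leaves a neighbor undevalued whenever the overlap covers the target's whole (small) profile, which is exactly the "unduly penalizing" case the local measure is meant to avoid.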
Even at a 15% attack, significance weighting may be the difference between recommending an attacked item or not.

Local significance weighting also performs well against profile injection attacks, although not to the same degree of robustness as standard significance weighting. This can be explained by the fact that target users with fewer than 50 ratings do not scale their neighbors linearly. An attack profile in the neighborhood that is highly correlated with the target user is not devalued enough. As a result, a genuine user with less correlation to the target user, but more overlap in rated items, may be removed from the neighborhood.

Item-trust weighting appears slightly more robust than standard correlation. The mutual-reinforcement effect is not as pronounced for attack profiles at smaller filler sizes, because the attacks do not have enough similarity to the target user; the trust value is outweighed. In addition, the reinforcement from genuine users is enough to gain insight into the true relevance for making predictions. The combination of trust and similarity of genuine users to a target user is sufficient to remove some attack profiles from the neighborhood.

To evaluate sensitivity to filler size, we have tested a full range of filler items. The 100% filler is included as a benchmark for the potential influence of an attack. However, it is not likely to be practical from an attacker's point of view. Collaborative filtering rating databases are often extremely sparse, so attack profiles that have rated every product are quite conspicuous. Of particular interest are smaller filler sizes. An attack that performs well with few filler items is less likely to be detected. Thus, an attacker will have a better chance of actually impacting a system's recommendation, even if the performance of the attack is not optimal.

Fig. 2. (A) Average attack prediction shift at 5% filler; (B) Average attack filler size comparison

Figure 2(B) depicts prediction shift at different filler sizes with a 2% attack size.
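The prediction-shift metric behind Figure 2 can be computed as in the following sketch; the before/after ratings are toy values, not the paper's data:

```python
# Sketch of the prediction-shift metric: delta_{u,i} = p'_{u,i} - p_{u,i},
# averaged over test users per item, then averaged over attacked items.
# The before/after predictions below are toy values.

def prediction_shift(before, after):
    """Return per-item average shifts and the overall average shift."""
    per_item = {}
    for item in before:
        shifts = [after[item][u] - before[item][u] for u in before[item]]
        per_item[item] = sum(shifts) / len(shifts)
    overall = sum(per_item.values()) / len(per_item)
    return per_item, overall

before = {"movie1": {"u1": 2.0, "u2": 3.0}, "movie2": {"u1": 4.0, "u2": 2.0}}
after  = {"movie1": {"u1": 3.5, "u2": 4.0}, "movie2": {"u1": 4.5, "u2": 3.5}}
per_item, overall = prediction_shift(before, after)
print(per_item)   # → {'movie1': 1.25, 'movie2': 1.0}
print(overall)    # → 1.125
```

A positive overall shift means the push attack succeeded in raising predicted ratings on average across test users and attacked items.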
Surprisingly, as filler size is increased, prediction shift for standard correlation goes down. This is because an attack profile with many filler items has a greater probability of being dissimilar to the active user. On the contrary, prediction shift for significance weighting goes up. As stated previously, an attack profile with a very large number of filler items will have a better chance of being included in more neighborhoods, because it is not devalued by significance weighting.

The counter-intuitive observation is that standard correlation is actually more robust than any of the other relevance measures at very large filler sizes. To account for this, recall that the size of profile overlap is not addressed with standard correlation. A genuine user that is very similar to the target user, but does not have many co-rated items, is not penalized. However, with significance weighting the same user would be devalued, potentially removing the user from the neighborhood in favor of an attack profile.

As shown, a 25% filler size is the point where prediction shift for standard correlation surpasses the other relevance measures. Overall, this does not affect the general improvement in robustness of relevance weighting. Using the modest MovieLens 100K dataset, a user would have to rate 420 movies to have a profile with 25% filler. It is simply not feasible for a genuine user to rate 25% of the items in a commercial recommender such as Amazon.com, with millions of different products. From a practical perspective, the threat of large filler attacks is minimal because they should be easily detectable [8].

5 Conclusion

The standard user-based collaborative filtering algorithm has been shown quite vulnerable to profile injection attacks. An attacker is able to bias recommendation by building a number of profiles associated with fictitious identities. In this paper, we have demonstrated the relative robustness and stability of supplementing the similarity weighting of neighbors with significance weighting and item-trust values. Significance weighting, in particular, results in increased recommendation accuracy and improved robustness under attack, versus the standard k-nearest neighbor approach. Future work will examine other relevance measures with respect to attack, including case amplification, inverse user frequency, and default voting.

References

1. Lam, S., Riedl, J.: Shilling recommender systems for fun and profit. In: Proceedings of the 13th International WWW Conference, New York (May 2004)
2. O'Mahony, M., Hurley, N., Kushmerick, N., Silvestre, G.: Collaborative recommendation: A robustness analysis. ACM Transactions on Internet Technology 4(4) (2004) 344–377
3. Mobasher, B., Burke, R., Bhaumik, R., Williams, C.: Towards trustworthy recommender systems: An analysis of attack models and algorithm robustness. ACM Transactions on Internet Technology 7(4) (2007)
4. Herlocker, J., Konstan, J., Borchers, A., Riedl, J.: An algorithmic framework for performing collaborative filtering. In: Proceedings of the 22nd ACM Conference on Research and Development in Information Retrieval (SIGIR'99), Berkeley, CA (August 1999)
5. O'Donovan, J., Smyth, B.: Trust in recommender systems. In: Proceedings of the 10th International Conference on Intelligent User Interfaces (IUI'05), ACM Press (2005) 167–174
6. Mobasher, B., Burke, R., Sandvig, J.J.: Model-based collaborative filtering as a defense against profile injection attacks. In: Proceedings of the 21st National Conference on Artificial Intelligence, AAAI (July 2006) 1388–1393
7. Breese, J., Heckerman, D., Kadie, C.: Empirical analysis of predictive algorithms for collaborative filtering. In: Uncertainty in Artificial Intelligence: Proceedings of the Fourteenth Conference, New Orleans, LA, Morgan Kaufmann (1998) 43–53
8. Williams, C., Bhaumik, R., Burke, R., Mobasher, B.: The impact of attack profile classification on the robustness of collaborative recommendation. In: Proceedings of the 2006 WebKDD Workshop, held at the ACM SIGKDD Conference on Data Mining and Knowledge Discovery (KDD'06), Philadelphia (August 2006)
9. O'Donovan, J., Smyth, B.: Is trust robust? An analysis of trust-based recommendation. In: Proceedings of the 11th International Conference on Intelligent User Interfaces (IUI'06), ACM Press (2006) 101–108
10. Herlocker, J., Konstan, J., Terveen, L.G., Riedl, J.: Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems 22(1) (2004) 5–53
Pose Estimation Algorithms: A Survey of Methods Based on RGB, RGB-D, and Point Cloud Data

Author: Tom Hardy (via Zhihu) | Editor: 3D Vision Workshop (3D视觉工坊)

Pose estimation algorithms fall mainly into holistic methods, Hough voting methods, keypoint-based methods, and dense correspondence methods.
Implementation approaches: traditional methods and deep learning methods.

Data formats: RGB, RGB-D, point cloud data, etc.; annotation tools also differ in how they are implemented.

Holistic methods

Holistic methods directly estimate the 3D position and orientation of an object in a given image. Classic template-based methods construct a rigid template and scan the image to compute the best matching pose. Such hand-crafted templates are not very reliable in cluttered scenes. Recently, a number of deep-neural-network-based methods have been proposed to directly regress the 6D pose of the camera or the object. However, the non-linearity of the rotation space makes it difficult for data-driven DNNs to learn and generalize.
1. Discriminative mixture-of-templates for viewpoint classification
2. Gradient response maps for real-time detection of textureless objects
3. Comparing images using the Hausdorff distance
4. Implicit 3D orientation learning for 6D object detection from RGB images
5. Instance- and Category-level 6D Object Pose Estimation

Model-based

2. Deep model-based 6D pose refinement in RGB

Keypoint-based methods

Current keypoint-based methods first detect 2D keypoints of the object in the image, then estimate the 6D pose using the PnP algorithm.
1. SURF: Speeded Up Robust Features
2. Object recognition from local scale-invariant features
3. 3D object modeling and recognition using local affine-invariant image descriptors and multi-view spatial constraints
5. Stacked hourglass networks for human pose estimation
6. Making deep heatmaps robust to partial occlusions for 3D object pose estimation
7. BB8: A scalable, accurate, robust to partial occlusion method for predicting the 3D poses of challenging objects without using depth
8. Real-time seamless single shot 6D object pose prediction
9. Discovery of latent 3D keypoints via end-to-end geometric reasoning
10. PVNet: Pixel-wise voting network for 6DoF pose estimation

Dense Correspondence / Hough voting methods

1. Independent object class detection using 3D feature maps
2. Depth encoded Hough voting for joint object detection and shape recovery
3. aware object detection and pose estimation
4. Learning 6D object pose estimation using 3D object coordinates
5. Global hypothesis generation for 6D object pose estimation
6. Deep learning of local RGB-D patches for 3D object detection and 6D pose estimation
7. CDPN: Coordinates-based disentangled pose network for real-time RGB-based 6-DoF object pose estimation
8. Pix2Pose: Pixel-wise coordinate regression of objects for 6D pose estimation
9. Normalized object coordinate space for category-level 6D object pose and size estimation
10. Recovering 6D object pose and predicting next-best-view in the crowd

Segmentation-based and deep-learning-related methods

1. PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes
2. Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views
6. Robust 6D Object Pose Estimation in Cluttered Scenes using Semantic Segmentation and Pose Regression Networks - Arul Selvam Periyasamy, Max Schwarz, and Sven Behnke. [Paper]

Different data formats

Depending on the data format, recognition algorithms can be divided into those based on RGB, RGB-D, and point cloud data.
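As a minimal illustration of the template-scanning idea described above (a toy sketch, not an implementation of any cited method), the following represents each candidate pose by a rigid 2D grayscale template, scans the image, and returns the pose and location with the lowest sum-of-squared-differences score:

```python
# Toy sketch of template-based pose classification: scan an image with one
# rigid template per candidate pose and keep the best (lowest SSD) match.
# The image, templates, and pose labels are all hypothetical.

def ssd(patch, template):
    """Sum of squared differences between an image patch and a template."""
    return sum(
        (patch[r][c] - template[r][c]) ** 2
        for r in range(len(template))
        for c in range(len(template[0]))
    )

def best_matching_pose(image, templates):
    """Scan `image` with every pose template; return (pose, row, col, score)."""
    best = None
    h_img, w_img = len(image), len(image[0])
    for pose, tmpl in templates.items():
        h, w = len(tmpl), len(tmpl[0])
        for r in range(h_img - h + 1):
            for c in range(w_img - w + 1):
                patch = [row[c:c + w] for row in image[r:r + h]]
                score = ssd(patch, tmpl)
                if best is None or score < best[3]:
                    best = (pose, r, c, score)
    return best

# Toy image containing a bright diagonal pattern at (row=1, col=1).
image = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 0],
]
templates = {
    "front": [[9, 0], [0, 9]],   # diagonal pattern
    "side":  [[9, 9], [0, 0]],   # horizontal bar
}
print(best_matching_pose(image, templates))  # → ('front', 1, 1, 0)
```

This brute-force scan is exactly what makes hand-crafted templates fragile in cluttered scenes: any occluding pixel inflates the SSD score of the true pose.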
A COMPUTATIONAL MODEL FOR MOS PREDICTION
A COMPUTATIONAL MODEL FOR MOS PREDICTIONDoh-Suk Kim,Oded Ghitza,and Peter KroonAcoustics and Speech Research DepartmentBell Laboratories,Lucent TechnologiesMurray Hill,New Jersey07974,USAds@sait.samsung.co.krABSTRACTA computational model to predict MOS of processed speech isproposed.The system measures the distortion of processed speech(compared to the source speech)using a peripheral model of themammalian auditory system and a psychophysically-inspired mea-sure,and maps the distortion value onto the MOS scale.This paperdescribes our attempt to derive a“universal”,database-independent,distortion-to-MOS mapping function.Preliminary experimentalevaluation shows that the performance of the proposed system iscomparable with ITU-T recommendation P.861for clean speechsources,and outperforms the P.861recommendation for speechsources corrupted by either car or babble noise at30dB SNR.1.INTRODUCTIONUp until this day,the most reliable way to evaluate the perfor-mance of a speech coding system is to perform subjective speechquality assessment tests such as MOS(Mean Opinion Score)test.Obviously,these tests are expensive both in time and cost,and dif-ficult to reproduce.Thus,it is desirable to replace them with anobjective method.Numerous studies have been conducted with the purpose offinding a distortion measure that will correlate well with subjec-tive MOS measurements,including the PSQM method[1],whichwas adopted as the ITU-T standard recommendation for telephoneband speech(P.861)[2].To the best of our knowledge,none ofthese studies yet resolved two major challenges:(1)how to mapthe distortion value onto the MOS scale,and(2)how to accuratelyassess the quality of processed speech,where the source had beencorrupted by environmental noise.In this paper,we address these challenges.We consider a sys-tem comprising of two stages.Thefirst(termed ASQM,for Au-ditory Speech Quality Measure)measures the distortion of a pro-cessed speech(compared to the source speech)using a peripheral 
model of the mammalian auditory system and a psychophysically-inspired measure.It will be shown that the robustness of auditory-based representations to environmental noise(as was demonstrated elsewhere,e.g.,[3],[4]),results in a distortion measurement that correlates well with subjective quality assessments of speech.The second stage maps the distortion value onto the MOS scale.Previous studies have been confined to thefirst stage,and performance was evaluated via correlation analysis of the result-ing(objective)distortion measurement with the subjective MOS.(2) where is a small number to prevent division by zero and is a control parameter greater than zero.Although the basic form of the asymmetric measure is adopted from the PSQM,parameters should be optimized for the auditory representations.The overall distortion between two sequences and is de-termined by(3)Phase I : Obtaining Distortion−to−MOS Map Phase II : Evaluation Figure1:Block diagram of the database-independent MOS esti-mation.where is a weighting factor for active speech frames,and and are the distortions for the speech portion and the non-speech portions of the signal,respectively.Distortions for the speech and the non-speech are defined as(4)where and are the the pseudo-loudness of the source speech and the processed speech at the-th frame,respec-tively,is the threshold for speech/non-speech decision,and and are the number of active speech frames and the num-ber of non-speech frames,respectively.For clean speech,only the active speech frames contribute to the overall distortion metric un-less the speech coding algorithm under test generates high-power distortions in the non-speech frames.3.MOS ESTIMATIONThe proposed system for MOS estimation is shown in Fig.1.It is based on a basic assumption that the subjective MOS scores of MNRU-conditioned[6]speech sentences are consistent across dif-ferent databases.Consequently,we collected every MNRU condi-tion(speech material and their associated subjective 
MOS scores) from all databasesRMSEPSQM ASQMDB-I0.3440.350 DB-II0.2630.339 DB-III(CLN)0.4240.469 DB-III(C30)0.4450.250 DB-III(B30)0.7890.27800.51 1.52 2.522.533.544.5Objective DistortionS u b j e c t i v e M O SFigure 2:Relationship between subjective MOS and objective distortion measurement,for (a)PSQM or (b)ASQM based sys-tem,for database DB-III.Diamonds represent clean source and the MNRU conditions,circles represent codecs,and unfilled cir-cles represent the unprocessed noisy source.The distortion-to-MOS mapping,obtained from MNRU conditions of databases DB-I,DB-II and DB-III,is superimposed on each plot (dashed line).5.CONCLUSIONSThis paper described a methodology to predict subjective MOS scores.The method consists of a two stage process.First a distor-tion measure based on an auditory model,followed by a database independent distortion-to-MOS mapping using MNRU anchors.Based on evaluation on a small number of databases,the method provides MOS estimates that are highly correlated with MOS scores obtained by real listening tests.The method seems to be robust against environmental noise.It should be noted that these are preliminary results.Further evaluation is needed,using different databases,to confirm the un-derlying assumption that distortion-to-MOS mapping based upon MNRU anchor points can be used to map distortion measurements of coded speech.It may very well be that other anchor points may be needed,such as carefully selected,standardized coders.Fur-ther evaluation is also needed to confirm the robust performanceagainst noise,using other noise sources for a wide range of SNR values.6.ACKNOWLEDGMENTThe authors would like to thank M.M.Sondhi and J.H.Hall for stimulating discussions throughout this work,and C.A.Harsin and D.A.Quinlan for providing the noisy speech database.7.REFERENCES[1]J.G.Beerends and J.A.Stemerdink,“A perceptual speech-quality measure based on psychoacoustic sound representa-tion,”J.Audio Eng.Soc.,vol.42,pp.115–123,March 1994.[2]ITU-T 
Recommendation P.861, Objective Quality Measurement of Telephone-Band (300-3400 Hz) Speech Codecs. Geneva, 1996.
[3] O. Ghitza, "Auditory nerve representation as a basis for speech processing," in Advances in Speech Signal Processing (S. Furui and M. M. Sondhi, eds.), pp. 453-485, New York: Marcel Dekker, 1992.
[4] D. S. Kim, S. Y. Lee, and R. M. Kil, "Auditory processing of speech signals for robust speech recognition in real-world noisy environments," IEEE Trans. Speech and Audio Processing, vol. 7, no. 1, pp. 55-69, 1999.
[5] ITU-T STL96, ITU-T Software Tool Library. Geneva, May 1996.
[6] ITU-T Recommendation P.810, Modulated Noise Reference Unit (MNRU), Feb. 1996.
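The frame-weighted overall distortion of Eqs. (3)-(4) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the symbol names (`w`, `theta`) follow the reconstructed notation above, and the per-frame distortion here is a plain absolute pseudo-loudness difference rather than the paper's PSQM-derived asymmetric measure.

```python
import numpy as np

def overall_distortion(src_loudness, proc_loudness, w=0.8, theta=0.1):
    """Frame-weighted distortion in the spirit of Eqs. (3)-(4).

    src_loudness, proc_loudness: per-frame pseudo-loudness of the source and
    the processed speech. w: weighting factor for active speech frames.
    theta: speech/non-speech loudness threshold applied to the source.
    All parameter values are illustrative assumptions.
    """
    src = np.asarray(src_loudness, dtype=float)
    proc = np.asarray(proc_loudness, dtype=float)
    frame_dist = np.abs(proc - src)          # stand-in per-frame distortion
    speech = src >= theta                    # speech/non-speech decision
    n_s, n_n = speech.sum(), (~speech).sum()
    d_s = frame_dist[speech].mean() if n_s else 0.0   # speech part of Eq. (4)
    d_n = frame_dist[~speech].mean() if n_n else 0.0  # non-speech part of Eq. (4)
    return w * d_s + (1.0 - w) * d_n         # Eq. (3)
```

For clean speech the non-speech frames have near-zero loudness and near-zero distortion, so the weighted sum is dominated by the active-speech term, matching the remark above.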
Survey of Neural Networks and Deep Learning (Deep Learning, 15 May 2014)
Draft: Deep Learning in Neural Networks: An Overview
Technical Report IDSIA-03-14 / arXiv:1404.7828 (v1.5) [cs.NE]
Jürgen Schmidhuber
The Swiss AI Lab IDSIA
Istituto Dalle Molle di Studi sull'Intelligenza Artificiale
University of Lugano & SUPSI
Galleria 2, 6928 Manno-Lugano, Switzerland
15 May 2014

Abstract

In recent years, deep artificial neural networks (including recurrent ones) have won numerous contests in pattern recognition and machine learning. This historical survey compactly summarises relevant work, much of it from the previous millennium. Shallow and deep learners are distinguished by the depth of their credit assignment paths, which are chains of possibly learnable, causal links between actions and effects. I review deep supervised learning (also recapitulating the history of backpropagation), unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

PDF of earlier draft (v1): http://www.idsia.ch/~juergen/DeepLearning30April2014.pdf
LaTeX source: http://www.idsia.ch/~juergen/DeepLearning30April2014.tex
Complete BibTeX file: http://www.idsia.ch/~juergen/bib.bib

Preface

This is the draft of an invited Deep Learning (DL) overview. One of its goals is to assign credit to those who contributed to the present state of the art. I acknowledge the limitations of attempting to achieve this goal. The DL research community itself may be viewed as a continually evolving, deep network of scientists who have influenced each other in complex ways. Starting from recent DL results, I tried to trace back the origins of relevant ideas through the past half century and beyond, sometimes using "local search" to follow citations of citations backwards in time. Since not all DL publications properly acknowledge earlier relevant work, additional global search strategies were employed, aided by consulting numerous neural network experts. As a result, the present draft mostly consists of references (about 800 entries so far). Nevertheless, through an expert
selection bias I may have missed important work. A related bias was surely introduced by my special familiarity with the work of my own DL research group in the past quarter-century. For these reasons, the present draft should be viewed as merely a snapshot of an ongoing credit assignment process. To help improve it, please do not hesitate to send corrections and suggestions to juergen@idsia.ch.

Contents

1 Introduction to Deep Learning (DL) in Neural Networks (NNs)
2 Event-Oriented Notation for Activation Spreading in FNNs/RNNs
3 Depth of Credit Assignment Paths (CAPs) and of Problems
4 Recurring Themes of Deep Learning
  4.1 Dynamic Programming (DP) for DL
  4.2 Unsupervised Learning (UL) Facilitating Supervised Learning (SL) and RL
  4.3 Occam's Razor: Compression and Minimum Description Length (MDL)
  4.4 Learning Hierarchical Representations Through Deep SL, UL, RL
  4.5 Fast Graphics Processing Units (GPUs) for DL in NNs
5 Supervised NNs, Some Helped by Unsupervised NNs
  5.1 1940s and Earlier
  5.2 Around 1960: More Neurobiological Inspiration for DL
  5.3 1965: Deep Networks Based on the Group Method of Data Handling (GMDH)
  5.4 1979: Convolution + Weight Replication + Winner-Take-All (WTA)
  5.5 1960-1981 and Beyond: Development of Backpropagation (BP) for NNs
    5.5.1 BP for Weight-Sharing Feedforward NNs (FNNs) and Recurrent NNs (RNNs)
  5.6 Late 1980s-2000: Numerous Improvements of NNs
    5.6.1 Ideas for Dealing with Long Time Lags and Deep CAPs
    5.6.2 Better BP Through Advanced Gradient Descent
    5.6.3 Discovering Low-Complexity, Problem-Solving NNs
    5.6.4 Potential Benefits of UL for SL
  5.7 1987: UL Through Autoencoder (AE) Hierarchies
  5.8 1989: BP for Convolutional NNs (CNNs)
  5.9 1991: Fundamental Deep Learning Problem of Gradient Descent
  5.10 1991: UL-Based History Compression Through a Deep Hierarchy of RNNs
  5.11 1992: Max-Pooling (MP): Towards MPCNNs
  5.12 1994: Contest-Winning Not So Deep NNs
  5.13 1995: Supervised Recurrent Very Deep Learner (LSTM RNN)
  5.14 2003: More
Contest-Winning/Record-Setting, Often Not So Deep NNs
  5.15 2006/7: Deep Belief Networks (DBNs) & AE Stacks Fine-Tuned by BP
  5.16 2006/7: Improved CNNs / GPU-CNNs / BP-Trained MPCNNs
  5.17 2009: First Official Competitions Won by RNNs, and with MPCNNs
  5.18 2010: Plain Backprop (+ Distortions) on GPU Yields Excellent Results
  5.19 2011: MPCNNs on GPU Achieve Superhuman Vision Performance
  5.20 2011: Hessian-Free Optimization for RNNs
  5.21 2012: First Contests Won on ImageNet & Object Detection & Segmentation
  5.22 2013-: More Contests and Benchmark Records
    5.22.1 Currently Successful Supervised Techniques: LSTM RNNs / GPU-MPCNNs
  5.23 Recent Tricks for Improving SL Deep NNs (Compare Sec. 5.6.2, 5.6.3)
  5.24 Consequences for Neuroscience
  5.25 DL with Spiking Neurons?
6 DL in FNNs and RNNs for Reinforcement Learning (RL)
  6.1 RL Through NN World Models Yields RNNs With Deep CAPs
  6.2 Deep FNNs for Traditional RL and Markov Decision Processes (MDPs)
  6.3 Deep RL RNNs for Partially Observable MDPs (POMDPs)
  6.4 RL Facilitated by Deep UL in FNNs and RNNs
  6.5 Deep Hierarchical RL (HRL) and Subgoal Learning with FNNs and RNNs
  6.6 Deep RL by Direct NN Search / Policy Gradients / Evolution
  6.7 Deep RL by Indirect Policy Search / Compressed NN Search
  6.8 Universal RL
7 Conclusion

1 Introduction to Deep Learning (DL) in Neural Networks (NNs)

Which modifiable components of a learning system are responsible for its success or failure? What changes to them improve performance? This has been called the fundamental credit assignment problem (Minsky, 1963). There are general credit assignment methods for universal problem solvers that are time-optimal in various theoretical senses (Sec. 6.8). The present survey, however, will focus on the narrower, but now commercially important, subfield of Deep Learning (DL) in Artificial Neural Networks (NNs). We are interested in accurate credit assignment across possibly many, often nonlinear, computational stages of NNs. Shallow NN-like models have been
around for many decades if not centuries (Sec. 5.1). Models with several successive nonlinear layers of neurons date back at least to the 1960s (Sec. 5.3) and 1970s (Sec. 5.5). An efficient gradient descent method for teacher-based Supervised Learning (SL) in discrete, differentiable networks of arbitrary depth called backpropagation (BP) was developed in the 1960s and 1970s, and applied to NNs in 1981 (Sec. 5.5). BP-based training of deep NNs with many layers, however, had been found to be difficult in practice by the late 1980s (Sec. 5.6), and had become an explicit research subject by the early 1990s (Sec. 5.9). DL became practically feasible to some extent through the help of Unsupervised Learning (UL) (e.g., Sec. 5.10, 5.15). The 1990s and 2000s also saw many improvements of purely supervised DL (Sec. 5). In the new millennium, deep NNs have finally attracted wide-spread attention, mainly by outperforming alternative machine learning methods such as kernel machines (Vapnik, 1995; Schölkopf et al., 1998) in numerous important applications. In fact, supervised deep NNs have won numerous official international pattern recognition competitions (e.g., Sec. 5.17, 5.19, 5.21, 5.22), achieving the first superhuman visual pattern recognition results in limited domains (Sec. 5.19). Deep NNs also have become relevant for the more general field of Reinforcement Learning (RL) where there is no supervising teacher (Sec. 6). Both feedforward (acyclic) NNs (FNNs) and recurrent (cyclic) NNs (RNNs) have won contests (Sec. 5.12, 5.14, 5.17, 5.19, 5.21, 5.22). In a sense, RNNs are the deepest of all NNs (Sec. 3): they are general computers more powerful than FNNs, and can in principle create and process memories of arbitrary sequences of input patterns (e.g., Siegelmann and Sontag, 1991; Schmidhuber, 1990a). Unlike traditional methods for automatic sequential program synthesis (e.g., Waldinger and Lee, 1969; Balzer, 1985; Soloway, 1986; Deville and Lau, 1994), RNNs can learn programs that mix sequential and parallel information processing in a natural and efficient
way, exploiting the massive parallelism viewed as crucial for sustaining the rapid decline of computation cost observed over the past 75 years.

The rest of this paper is structured as follows. Sec. 2 introduces a compact, event-oriented notation that is simple yet general enough to accommodate both FNNs and RNNs. Sec. 3 introduces the concept of Credit Assignment Paths (CAPs) to measure whether learning in a given NN application is of the deep or shallow type. Sec. 4 lists recurring themes of DL in SL, UL, and RL. Sec. 5 focuses on SL and UL, and on how UL can facilitate SL, although pure SL has become dominant in recent competitions (Sec. 5.17-5.22). Sec. 5 is arranged in a historical timeline format with subsections on important inspirations and technical contributions. Sec. 6 on deep RL discusses traditional Dynamic Programming (DP)-based RL combined with gradient-based search techniques for SL or UL in deep NNs, as well as general methods for direct and indirect search in the weight space of deep FNNs and RNNs, including successful policy gradient and evolutionary methods.

2 Event-Oriented Notation for Activation Spreading in FNNs/RNNs

Throughout this paper, let i, j, k, t, p, q, r denote positive integer variables assuming ranges implicit in the given contexts. Let n, m, T denote positive integer constants.

An NN's topology may change over time (e.g., Fahlman, 1991; Ring, 1991; Weng et al., 1992; Fritzke, 1994). At any given moment, it can be described as a finite subset of units (or nodes or neurons) N = {u_1, u_2, ...} and a finite set H ⊆ N × N of directed edges or connections between nodes. FNNs are acyclic graphs, RNNs cyclic. The first (input) layer is the set of input units, a subset of N. In FNNs, the k-th layer (k > 1) is the set of all nodes u ∈ N such that there is an edge path of length k-1 (but no longer path) between some input unit and u. There may be shortcut connections between distant layers. The NN's behavior or program is determined by a set of real-valued, possibly modifiable, parameters or weights w_i (i = 1, ..., n). We now focus
on a single finite episode or epoch of information processing and activation spreading, without learning through weight changes. The following slightly unconventional notation is designed to compactly describe what is happening during the runtime of the system.

During an episode, there is a partially causal sequence x_t (t = 1, ..., T) of real values that I call events. Each x_t is either an input set by the environment, or the activation of a unit that may directly depend on other x_k (k < t) through a current NN topology-dependent set in_t of indices k representing incoming causal connections or links. Let the function v encode topology information and map such event index pairs (k, t) to weight indices. For example, in the non-input case we may have x_t = f_t(net_t) with real-valued net_t = Σ_{k ∈ in_t} x_k w_{v(k,t)} (additive case) or net_t = Π_{k ∈ in_t} x_k w_{v(k,t)} (multiplicative case), where f_t is a typically nonlinear real-valued activation function such as tanh. In many recent competition-winning NNs (Sec. 5.19, 5.21, 5.22) there also are events of the type x_t = max_{k ∈ in_t}(x_k); some network types may also use complex polynomial activation functions (Sec. 5.3). x_t may directly affect certain x_k (k > t) through outgoing connections or links represented through a current set out_t of indices k with t ∈ in_k. Some non-input events are called output events.

Note that many of the x_t may refer to different, time-varying activations of the same unit in sequence-processing RNNs (e.g., Williams, 1989, "unfolding in time"), or also in FNNs sequentially exposed to time-varying input patterns of a large training set encoded as input events. During an episode, the same weight may get reused over and over again in topology-dependent ways, e.g., in RNNs, or in convolutional NNs (Sec. 5.4, 5.8). I call this weight sharing across space and/or time. Weight sharing may greatly reduce the NN's descriptive complexity, which is the number of bits of information required to describe the NN (Sec. 4.3).

In Supervised Learning (SL), certain NN output events x_t may be associated with teacher-given, real-valued labels or targets d_t yielding errors e_t, e.g., e_t = 1/2 (x_t - d_t)^2. A typical goal of supervised NN training is to find weights that yield episodes with small total error E, the sum of all such e_t. The hope is that the NN will generalize well in later episodes, causing only small errors on previously unseen sequences of input events. Many alternative error functions for SL and UL are possible.

SL assumes that input events are independent of earlier output events (which may affect the environment through actions causing subsequent perceptions). This assumption does not hold in the broader fields of Sequential Decision Making and Reinforcement Learning (RL) (Kaelbling et al., 1996; Sutton and Barto, 1998; Hutter, 2005) (Sec. 6). In RL, some of the input events may encode real-valued reward signals given by the environment, and a typical goal is to find weights that yield episodes with a high sum of reward signals, through sequences of appropriate output actions.

Sec. 5.5 will use the notation above to compactly describe a central algorithm of DL, namely, backpropagation (BP) for supervised weight-sharing FNNs and RNNs. (FNNs may be viewed as RNNs with certain fixed zero weights.) Sec. 6 will address the more general RL case.

3 Depth of Credit Assignment Paths (CAPs) and of Problems

To measure whether credit assignment in a given NN application is of the deep or shallow type, I introduce the concept of Credit Assignment Paths or CAPs, which are chains of possibly causal links between events.

Let us first focus on SL. Consider two events x_p and x_q (1 ≤ p < q ≤ T). Depending on the application, they may have a Potential Direct Causal Connection (PDCC) expressed by the Boolean predicate pdcc(p, q), which is true if and only if p ∈ in_q. Then the 2-element list (p, q) is defined to be a CAP from p to q (a minimal one). A learning algorithm may be allowed to change w_{v(p,q)} to improve performance in future episodes.

More general, possibly indirect, Potential Causal Connections (PCC) are
expressed by the recursively defined Boolean predicate pcc(p, q), which in the SL case is true only if pdcc(p, q), or if pcc(p, k) for some k and pdcc(k, q). In the latter case, appending q to any CAP from p to k yields a CAP from p to q (this is a recursive definition, too). The set of such CAPs may be large but is finite. Note that the same weight may affect many different PDCCs between successive events listed by a given CAP, e.g., in the case of RNNs, or weight-sharing FNNs.

Suppose a CAP has the form (..., k, t, ..., q), where k and t (possibly t = q) are the first successive elements with modifiable w_{v(k,t)}. Then the length of the suffix list (t, ..., q) is called the CAP's depth (which is 0 if there are no modifiable links at all). This depth limits how far backwards credit assignment can move down the causal chain to find a modifiable weight (Footnote 1).

Suppose an episode and its event sequence x_1, ..., x_T satisfy a computable criterion used to decide whether a given problem has been solved (e.g., total error E below some threshold). Then the set of used weights is called a solution to the problem, and the depth of the deepest CAP within the sequence is called the solution's depth. There may be other solutions (yielding different event sequences) with different depths. Given some fixed NN topology, the smallest depth of any solution is called the problem's depth.

Sometimes we also speak of the depth of an architecture: SL FNNs with fixed topology imply a problem-independent maximal problem depth bounded by the number of non-input layers. Certain SL RNNs with fixed weights for all connections except those to output units (Jaeger, 2001; Maass et al., 2002; Jaeger, 2004; Schrauwen et al., 2007) have a maximal problem depth of 1, because only the final links in the corresponding CAPs are modifiable. In general, however, RNNs may learn to solve problems of potentially unlimited depth.

Note that the definitions above are solely based on the depths of causal chains, and agnostic of the temporal distance between events. For example, shallow
FNNs perceiving large "time windows" of input events may correctly classify long input sequences through appropriate output events, and thus solve shallow problems involving long time lags between relevant events.

At which problem depth does Shallow Learning end, and Deep Learning begin? Discussions with DL experts have not yet yielded a conclusive response to this question. Instead of committing myself to a precise answer, let me just define for the purposes of this overview: problems of depth > 10 require Very Deep Learning.

The difficulty of a problem may have little to do with its depth. Some NNs can quickly learn to solve certain deep problems, e.g., through random weight guessing (Sec. 5.9) or other types of direct search (Sec. 6.6) or indirect search (Sec. 6.7) in weight space, or through training an NN first on shallow problems whose solutions may then generalize to deep problems, or through collapsing sequences of (non)linear operations into a single (non)linear operation; but see an analysis of non-trivial aspects of deep linear networks (Baldi and Hornik, 1994, Section B). In general, however, finding an NN that precisely models a given training set is an NP-complete problem (Judd, 1990; Blum and Rivest, 1992), also in the case of deep NNs (Šíma, 1994; de Souto et al., 1999; Windisch, 2005); compare a survey of negative results (Šíma, 2002, Section 1).

Above we have focused on SL. In the more general case of RL in unknown environments, pcc(p, q) is also true if x_p is an output event and x_q any later input event: any action may affect the environment and thus any later perception. (In the real world, the environment may even influence non-input events computed on a physical hardware entangled with the entire universe, but this is ignored here.) It is possible to model and replace such unmodifiable environmental PCCs through a part of the NN that has already learned to predict (through some of its units) input events (including reward signals) from former input events and actions (Sec. 6.1). Its weights are
frozen, but can help to assign credit to other, still modifiable weights used to compute actions (Sec. 6.1). This approach may lead to very deep CAPs though.

Some DL research is about automatically rephrasing problems such that their depth is reduced (Sec. 4). In particular, sometimes UL is used to make SL problems less deep, e.g., Sec. 5.10. Often Dynamic Programming (Sec. 4.1) is used to facilitate certain traditional RL problems, e.g., Sec. 6.2. Sec. 5 focuses on CAPs for SL, Sec. 6 on the more complex case of RL.

Footnote 1: An alternative would be to count only modifiable links when measuring depth. In many typical NN applications this would not make a difference, but in some it would, e.g., Sec. 6.1.

4 Recurring Themes of Deep Learning

4.1 Dynamic Programming (DP) for DL

One recurring theme of DL is Dynamic Programming (DP) (Bellman, 1957), which can help to facilitate credit assignment under certain assumptions. For example, in SL NNs, backpropagation itself can be viewed as a DP-derived method (Sec. 5.5). In traditional RL based on strong Markovian assumptions, DP-derived methods can help to greatly reduce problem depth (Sec. 6.2). DP algorithms are also essential for systems that combine concepts of NNs and graphical models, such as Hidden Markov Models (HMMs) (Stratonovich, 1960; Baum and Petrie, 1966) and Expectation Maximization (EM) (Dempster et al., 1977), e.g., (Bottou, 1991; Bengio, 1991; Bourlard and Morgan, 1994; Baldi and Chauvin, 1996; Jordan and Sejnowski, 2001; Bishop, 2006; Poon and Domingos, 2011; Dahl et al., 2012; Hinton et al., 2012a).

4.2 Unsupervised Learning (UL) Facilitating Supervised Learning (SL) and RL

Another recurring theme is how UL can facilitate both SL (Sec. 5) and RL (Sec. 6). UL (Sec. 5.6.4) is normally used to encode raw incoming data such as video or speech streams in a form that is more convenient for subsequent goal-directed learning. In particular, codes that describe the original data in a less redundant or more compact way can be fed into SL (Sec. 5.10, 5.15) or RL machines (Sec. 6.4), whose search spaces may thus become
smaller (and whose CAPs shallower) than those necessary for dealing with the raw data. UL is closely connected to the topics of regularization and compression (Sec. 4.3, 5.6.3).

4.3 Occam's Razor: Compression and Minimum Description Length (MDL)

Occam's razor favors simple solutions over complex ones. Given some programming language, the principle of Minimum Description Length (MDL) can be used to measure the complexity of a solution candidate by the length of the shortest program that computes it (e.g., Solomonoff, 1964; Kolmogorov, 1965b; Chaitin, 1966; Wallace and Boulton, 1968; Levin, 1973a; Rissanen, 1986; Blumer et al., 1987; Li and Vitányi, 1997; Grünwald et al., 2005). Some methods explicitly take into account program runtime (Allender, 1992; Watanabe, 1992; Schmidhuber, 2002, 1995); many consider only programs with constant runtime, written in non-universal programming languages (e.g., Rissanen, 1986; Hinton and van Camp, 1993). In the NN case, the MDL principle suggests that low NN weight complexity corresponds to high NN probability in the Bayesian view (e.g., MacKay, 1992; Buntine and Weigend, 1991; De Freitas, 2003), and to high generalization performance (e.g., Baum and Haussler, 1989), without overfitting the training data. Many methods have been proposed for regularizing NNs, that is, searching for solution-computing, low-complexity SL NNs (Sec. 5.6.3) and RL NNs (Sec. 6.7). This is closely related to certain UL methods (Sec. 4.2, 5.6.4).

4.4 Learning Hierarchical Representations Through Deep SL, UL, RL

Many methods of Good Old-Fashioned Artificial Intelligence (GOFAI) (Nilsson, 1980) as well as more recent approaches to AI (Russell et al., 1995) and Machine Learning (Mitchell, 1997) learn hierarchies of more and more abstract data representations. For example, certain methods of syntactic pattern recognition (Fu, 1977) such as grammar induction discover hierarchies of formal rules to model observations.
The partially (un)supervised Automated Mathematician / EURISKO (Lenat, 1983; Lenat and Brown, 1984) continually learns concepts by combining previously learnt concepts. Such hierarchical representation learning (Ring, 1994; Bengio et al., 2013; Deng and Yu, 2014) is also a recurring theme of DL NNs for SL (Sec. 5), UL-aided SL (Sec. 5.7, 5.10, 5.15), and hierarchical RL (Sec. 6.5). Often, abstract hierarchical representations are natural by-products of data compression (Sec. 4.3), e.g., Sec. 5.10.

4.5 Fast Graphics Processing Units (GPUs) for DL in NNs

While the previous millennium saw several attempts at creating fast NN-specific hardware (e.g., Jackel et al., 1990; Faggin, 1992; Ramacher et al., 1993; Widrow et al., 1994; Heemskerk, 1995; Korkin et al., 1997; Urlbe, 1999), and at exploiting standard hardware (e.g., Anguita et al., 1994; Muller et al., 1995; Anguita and Gomes, 1996), the new millennium brought a DL breakthrough in form of cheap, multi-processor graphics cards or GPUs. GPUs are widely used for video games, a huge and competitive market that has driven down hardware prices. GPUs excel at fast matrix and vector multiplications required not only for convincing virtual realities but also for NN training, where they can speed up learning by a factor of 50 and more. Some of the GPU-based FNN implementations (Sec. 5.16-5.19) have greatly contributed to recent successes in contests for pattern recognition (Sec. 5.19-5.22), image segmentation (Sec. 5.21), and object detection (Sec. 5.21-5.22).

5 Supervised NNs, Some Helped by Unsupervised NNs

The main focus of current practical applications is on Supervised Learning (SL), which has dominated recent pattern recognition contests (Sec. 5.17-5.22). Several methods, however, use additional Unsupervised Learning (UL) to facilitate SL (Sec. 5.7, 5.10, 5.15). It does make sense to treat SL and UL in the same section: often gradient-based methods, such as BP (Sec. 5.5.1), are used to optimize objective functions of both UL and SL, and the boundary between SL and UL may blur, for example, when it comes to time series
prediction and sequence classification, e.g., Sec. 5.10, 5.12.

A historical timeline format will help to arrange subsections on important inspirations and technical contributions (although such a subsection may span a time interval of many years). Sec. 5.1 briefly mentions early, shallow NN models since the 1940s, Sec. 5.2 additional early neurobiological inspiration relevant for modern Deep Learning (DL). Sec. 5.3 is about GMDH networks (since 1965), perhaps the first (feedforward) DL systems. Sec. 5.4 is about the relatively deep Neocognitron NN (1979) which is similar to certain modern deep FNN architectures, as it combines convolutional NNs (CNNs), weight pattern replication, and winner-take-all (WTA) mechanisms. Sec. 5.5 uses the notation of Sec. 2 to compactly describe a central algorithm of DL, namely, backpropagation (BP) for supervised weight-sharing FNNs and RNNs. It also summarizes the history of BP 1960-1981 and beyond. Sec. 5.6 describes problems encountered in the late 1980s with BP for deep NNs, and mentions several ideas from the previous millennium to overcome them. Sec. 5.7 discusses a first hierarchical stack of coupled UL-based Autoencoders (AEs); this concept resurfaced in the new millennium (Sec. 5.15). Sec. 5.8 is about applying BP to CNNs, which is important for today's DL applications. Sec. 5.9 explains BP's Fundamental DL Problem (of vanishing/exploding gradients) discovered in 1991. Sec. 5.10 explains how a deep RNN stack of 1991 (the History Compressor) pre-trained by UL helped to solve previously unlearnable DL benchmarks requiring Credit Assignment Paths (CAPs, Sec. 3) of depth 1000 and more. Sec. 5.11 discusses a particular WTA method called Max-Pooling (MP) important in today's DL FNNs. Sec. 5.12 mentions a first important contest won by SL NNs in 1994. Sec. 5.13 describes a purely supervised DL RNN (Long Short-Term Memory, LSTM) for problems of depth 1000 and more. Sec. 5.14 mentions an early contest of 2003 won by an ensemble of shallow NNs, as well as good pattern recognition results with CNNs and LSTM RNNs (2003). Sec. 5.15 is
mostly about Deep Belief Networks (DBNs, 2006) and related stacks of Autoencoders (AEs, Sec. 5.7) pre-trained by UL to facilitate BP-based SL. Sec. 5.16 mentions the first BP-trained MPCNNs (2007) and GPU-CNNs (2006). Sec. 5.17-5.22 focus on official competitions with secret test sets won by (mostly purely supervised) DL NNs since 2009, in sequence recognition, image classification, image segmentation, and object detection. Many RNN results depended on LSTM (Sec. 5.13); many FNN results depended on GPU-based FNN code developed since 2004 (Sec. 5.16, 5.17, 5.18, 5.19), in particular, GPU-MPCNNs (Sec. 5.19).

5.1 1940s and Earlier

NN research started in the 1940s (e.g., McCulloch and Pitts, 1943; Hebb, 1949); compare also later work on learning NNs (Rosenblatt, 1958, 1962; Widrow and Hoff, 1962; Grossberg, 1969; Kohonen, 1972; von der Malsburg, 1973; Narendra and Thathatchar, 1974; Willshaw and von der Malsburg, 1976; Palm, 1980; Hopfield, 1982). In a sense NNs have been around even longer, since early supervised NNs were essentially variants of linear regression methods going back at least to the early 1800s (e.g., Legendre, 1805; Gauss, 1809, 1821). Early NNs had a maximal CAP depth of 1 (Sec. 3).

5.2 Around 1960: More Neurobiological Inspiration for DL

Simple cells and complex cells were found in the cat's visual cortex (e.g., Hubel and Wiesel, 1962; Wiesel and Hubel, 1959). These cells fire in response to certain properties of visual sensory inputs, such as the orientation of edges. Complex cells exhibit more spatial invariance than simple cells. This inspired later deep NN architectures (Sec. 5.4) used in certain modern award-winning Deep Learners (Sec. 5.19-5.22).

5.3 1965: Deep Networks Based on the Group Method of Data Handling (GMDH)

Networks trained by the Group Method of Data Handling (GMDH) (Ivakhnenko and Lapa, 1965; Ivakhnenko et al., 1967; Ivakhnenko, 1968, 1971) were perhaps the first DL systems of the Feedforward Multilayer Perceptron type. The units of GMDH nets may have polynomial activation functions implementing Kolmogorov-Gabor polynomials (more general than
traditional NN activation functions). Given a training set, layers are incrementally grown and trained by regression analysis, then pruned with the help of a separate validation set (using today's terminology), where Decision Regularisation is used to weed out superfluous units. The numbers of layers and units per layer can be learned in problem-dependent fashion. This is a good example of hierarchical representation learning (Sec. 4.4). There have been numerous applications of GMDH-style networks, e.g. (Ikeda et al., 1976; Farlow, 1984; Madala and Ivakhnenko, 1994; Ivakhnenko, 1995; Kondo, 1998; Kordík et al., 2003; Witczak et al., 2006; Kondo and Ueno, 2008).

5.4 1979: Convolution + Weight Replication + Winner-Take-All (WTA)

Apart from deep GMDH networks (Sec. 5.3), the Neocognitron (Fukushima, 1979, 1980, 2013a) was perhaps the first artificial NN that deserved the attribute deep, and the first to incorporate the neurophysiological insights of Sec. 5.2. It introduced convolutional NNs (today often called CNNs or convnets), where the (typically rectangular) receptive field of a convolutional unit with given weight vector is shifted step by step across a 2-dimensional array of input values, such as the pixels of an image. The resulting 2D array of subsequent activation events of this unit can then provide inputs to higher-level units, and so on. Due to massive weight replication (Sec. 2), relatively few parameters may be necessary to describe the behavior of such a convolutional layer.

Competition layers have WTA subsets whose maximally active units are the only ones to adopt non-zero activation values. They essentially "down-sample" the competition layer's input. This helps to create units whose responses are insensitive to small image shifts (compare Sec. 5.2).

The Neocognitron is very similar to the architecture of modern, contest-winning, purely supervised, feedforward, gradient-based Deep Learners with alternating convolutional and competition layers (e.g., Sec. 5.19-5.22). Fukushima, however, did not set the weights by supervised
backpropagation (Sec. 5.5, 5.8), but by local unsupervised learning rules (e.g., Fukushima, 2013b), or by pre-wiring. In that sense he did not care for the DL problem (Sec. 5.9), although his architecture was comparatively deep indeed. He also used Spatial Averaging (Fukushima, 1980, 2011) instead of Max-Pooling (MP, Sec. 5.11), currently a particularly convenient and popular WTA mechanism. Today's CNN-based DL machines profit a lot from later CNN work (e.g., LeCun et al., 1989; Ranzato et al., 2007) (Sec. 5.8, 5.16, 5.19).

5.5 1960-1981 and Beyond: Development of Backpropagation (BP) for NNs

The minimisation of errors through gradient descent (Hadamard, 1908) in the parameter space of complex, nonlinear, differentiable, multi-stage, NN-related systems has been discussed at least since the early 1960s (e.g., Kelley, 1960; Bryson, 1961; Bryson and Denham, 1961; Pontryagin et al., 1961; Dreyfus, 1962; Wilkinson, 1965; Amari, 1967; Bryson and Ho, 1969; Director and Rohrer, 1969; Griewank, 2012), initially within the framework of Euler-Lagrange equations in the Calculus of Variations (e.g., Euler, 1744). Steepest descent in such systems can be performed (Bryson, 1961; Kelley, 1960; Bryson and Ho, 1969) by iterating the ancient chain rule (Leibniz, 1676; L'Hôpital, 1696) in Dynamic Programming (DP) style (Bellman, 1957). A simplified derivation of the method uses the chain rule only (Dreyfus, 1962).

The methods of the 1960s were already efficient in the DP sense. However, they backpropagated derivative information through standard Jacobian matrix calculations from one "layer" to the previous one, explicitly addressing neither direct links across several layers nor potential additional efficiency gains due to network sparsity (but perhaps such enhancements seemed obvious to the authors).
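As a toy illustration (my own example, not code from the survey), steepest descent by iterated chain rule for a small two-layer FNN with squared error e_t = 1/2 (x_t - d_t)^2 as defined in Sec. 2; sizes, data, and learning rate are arbitrary:

```python
import numpy as np

def backprop_demo(steps=200, lr=0.1):
    """Backpropagate the error of a 2-layer net by applying the chain rule
    layer by layer, reusing intermediate derivatives in DP style."""
    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(3, 2))      # input -> hidden weights
    W2 = rng.normal(size=(1, 3))      # hidden -> output weights
    inp, target = np.array([0.5, -0.3]), np.array([0.2])
    for _ in range(steps):
        h = np.tanh(W1 @ inp)         # hidden events x_t = f_t(net_t)
        out = W2 @ h                  # linear output event
        err = out - target            # de/dout for e = 1/2 (out - d)^2
        gW2 = np.outer(err, h)        # chain rule at the output layer
        gh = W2.T @ err               # error propagated back to hidden events
        gW1 = np.outer(gh * (1 - h**2), inp)  # chain rule through tanh
        W1 -= lr * gW1                # steepest descent updates
        W2 -= lr * gW2
    return abs((W2 @ np.tanh(W1 @ inp) - target)[0])
```

The backward pass does exactly what the text describes: derivative information flows from one layer to the previous one via Jacobian-style products, so each intermediate term is computed once and reused.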
A Fast Learning Algorithm for Deep Belief Nets
Abstract. This paper shows how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets with many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model classifies digits better than discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modelled by long ravines in the free-energy landscape of the top-level associative memory, and these ravines are easy to explore by using the directed connections to display what the associative memory has in mind.

1. Introduction

Learning is difficult in densely connected, directed belief nets that have many hidden layers, because given a data vector it is hard to infer the conditional distribution of the hidden activities. Variational methods use simple approximations to the true conditional distribution, but these approximations may be poor, especially at the deepest hidden layer, where the prior assumes independence. Moreover, variational learning still requires all of the parameters to be learned together, causing the learning time to grow dramatically as the number of parameters increases.
Figure 1: The network used to model the joint distribution of digit images.

We describe a model in which the top two layers form an undirected associative memory (see Figure 1) and the remaining hidden layers form a directed acyclic graph that converts the representations in the associative memory into observable variables such as the pixels of an image. This hybrid model has several advantages:

1. There is a fast, greedy learning algorithm that can quickly find a fairly good set of parameters, even in deep networks with millions of parameters and many hidden layers.
2. The learning algorithm is unsupervised, but it can be applied to labelled data by learning a model that generates both the label and the data.
3. There is a fine-tuning algorithm that learns an excellent generative model, which outperforms discriminative methods on the MNIST database of handwritten digits.
4. The generative model makes it easy to interpret the distributed representations in the deep hidden layers.
5. The inference required for forming a percept is both fast and accurate.
6. The learning algorithm is local: adjustments to a synapse strength depend only on the states of the presynaptic and postsynaptic neurons.
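As a rough illustration of the greedy, layer-wise idea (a sketch, not the paper's code; the dimensions, learning rate, omission of biases, and use of a single CD-1 step are illustrative assumptions), each layer can be trained as a small restricted Boltzmann machine with one step of contrastive divergence before the next layer is stacked on its hidden activities:

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample(p):
    return 1.0 if random.random() < p else 0.0

def cd1_update(v0, W, lr=0.1):
    """One contrastive-divergence (CD-1) update of an RBM weight matrix W
    (visible x hidden); biases are omitted for brevity."""
    n_vis, n_hid = len(W), len(W[0])
    # Up pass: hidden probabilities and samples given the data.
    h0p = [sigmoid(sum(v0[i] * W[i][j] for i in range(n_vis))) for j in range(n_hid)]
    h0 = [sample(p) for p in h0p]
    # Down pass: reconstruction of the visible units.
    v1 = [sigmoid(sum(h0[j] * W[i][j] for j in range(n_hid))) for i in range(n_vis)]
    # Up pass on the reconstruction.
    h1p = [sigmoid(sum(v1[i] * W[i][j] for i in range(n_vis))) for j in range(n_hid)]
    # Local update: data correlations minus reconstruction correlations.
    for i in range(n_vis):
        for j in range(n_hid):
            W[i][j] += lr * (v0[i] * h0p[j] - v1[i] * h1p[j])
    return h0p  # hidden activities feed the next layer up the stack

# Greedy stacking: train layer 1 on the data, layer 2 on layer 1's activities.
data = [[1, 1, 0, 0], [0, 0, 1, 1]]
W1 = [[random.uniform(-0.1, 0.1) for _ in range(3)] for _ in range(4)]
W2 = [[random.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(3)]
for epoch in range(50):
    for v in data:
        h = cd1_update(v, W1)
        cd1_update(h, W2)
```

Note how the update for each weight uses only the activities at its two ends, matching advantage 6 (local learning) above.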
Convolutional Neural Networks for Machine Learning: Translated Foreign Literature (English and Chinese), 2020
Prediction of Composite Microstructure Stress-Strain Curves Using Convolutional Neural Networks
Charles Yang, Youngsoo Kim, Seunghwa Ryu, Grace Gu

Abstract

Stress-strain curves are an important representation of a material's mechanical properties, from which important properties such as elastic modulus, strength, and toughness are defined. However, generating stress-strain curves from numerical methods such as the finite element method (FEM) is computationally intensive, especially when considering the entire failure path for a material. As a result, it is difficult to perform high-throughput computational design of materials with large design spaces, especially when considering mechanical responses beyond the elastic limit. In this work, a combination of principal component analysis (PCA) and convolutional neural networks (CNN) is used to predict the entire stress-strain behavior of binary composites evaluated over the entire failure path, motivated by the significantly faster inference speed of empirical models. We show that PCA transforms the stress-strain curves into an effective latent space by visualizing the eigenbasis of PCA. Despite having a dataset of only 10-27% of possible microstructure configurations, the mean absolute error of the prediction is <10% of the range of values in the dataset, when measuring model performance based on derived material descriptors such as modulus, strength, and toughness. Our study demonstrates the potential to use machine learning to accelerate material design, characterization, and optimization.

Keywords: Machine learning, Convolutional neural networks, Mechanical properties, Microstructure, Computational mechanics

Introduction

Understanding the relationship between structure and property for materials is a seminal problem in material science, with significant applications for designing next-generation materials.
A primary motivating example is designing composite microstructures for load-bearing applications, as composites offer advantageously high specific strength and specific toughness. Recent advancements in additive manufacturing have facilitated the fabrication of complex composite structures, and as a result, a variety of complex designs have been fabricated and tested via 3D-printing methods. While more advanced manufacturing techniques are opening up unprecedented opportunities for advanced materials and novel functionalities, identifying microstructures with desirable properties is a difficult optimization problem.

One method of identifying optimal composite designs is by constructing analytical theories. For conventional particulate/fiber-reinforced composites, a variety of homogenization theories have been developed to predict the mechanical properties of composites as a function of volume fraction, aspect ratio, and orientation distribution of reinforcements. Because many natural composites, synthesized via self-assembly processes, have relatively periodic and regular structures, their mechanical properties can be predicted if the load-transfer mechanism of a representative unit cell and the role of the self-similar hierarchical structure are understood. However, the applicability of analytical theories is limited in quantitatively predicting composite properties beyond the elastic limit in the presence of defects, because such theories rely on the concept of a representative volume element (RVE), a statistical representation of material properties, whereas strength and failure are determined by the weakest defect in the entire sample domain.
Numerical modeling based on finite element methods (FEM) can complement analytical methods for predicting inelastic properties such as strength and toughness modulus (referred to as toughness hereafter), which can only be obtained from full stress-strain curves. However, numerical schemes capable of modeling the initiation and propagation of curvilinear cracks, such as the crack phase-field model, are computationally expensive and time-consuming, because a very fine mesh is required to accommodate the highly concentrated stress field near the crack tip and the rapid variation of the damage parameter near the diffusive crack surface. Meanwhile, analytical models require significant human effort and domain expertise and fail to generalize to similar domain problems.

In order to identify high-performing composites in the midst of large design spaces within realistic time-frames, we need models that can rapidly describe the mechanical properties of complex systems and be generalized easily to analogous systems. Machine learning offers the benefit of extremely fast inference times and requires only training data to learn relationships between inputs and outputs, e.g., composite microstructures and their mechanical properties. Machine learning has already been applied to speed up the optimization of several different physical systems, including graphene kirigami cuts, fine-tuning spin qubit parameters, and probe microscopy tuning. Such models do not require significant human intervention or knowledge, learn relationships efficiently relative to the input design space, and can be generalized to different systems.

In this paper, we utilize a combination of principal component analysis (PCA) and convolutional neural networks (CNN) to predict the entire stress-strain curve of composite failures beyond the elastic limit. Stress-strain curves are chosen as the model's target because they are difficult to predict given their high dimensionality.
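The role PCA plays here is to project high-dimensional curves onto a few principal directions. A minimal two-dimensional sketch (toy data, not the paper's dataset) finds the principal axis of a point cloud in closed form and shows that projecting onto it loses nothing for collinear data:

```python
import math

# Toy data: 2D points lying on the line y = 2x, a stand-in for
# high-dimensional stress vectors that lie near a low-dimensional subspace.
pts = [(-2.0, -4.0), (-1.0, -2.0), (0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
n = len(pts)

# Center the data.
mx = sum(p[0] for p in pts) / n
my = sum(p[1] for p in pts) / n
c = [(x - mx, y - my) for x, y in pts]

# Entries of the 2x2 covariance matrix.
sxx = sum(x * x for x, _ in c) / n
syy = sum(y * y for _, y in c) / n
sxy = sum(x * y for x, y in c) / n

# Angle of the leading principal axis (closed form for a 2x2 covariance).
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
ux, uy = math.cos(theta), math.sin(theta)

# Project each point onto the axis (1D latent code), then reconstruct.
codes = [x * ux + y * uy for x, y in c]
recon = [(t * ux + mx, t * uy + my) for t in codes]
err = max(abs(rx - x) + abs(ry - y) for (rx, ry), (x, y) in zip(recon, pts))
print(err)  # essentially zero for perfectly collinear data
```

In the paper the same idea is applied with scikit-learn's PCA to curves with many more dimensions, keeping enough components to cover most of the variance.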
In addition, stress-strain curves are used to derive important material descriptors such as modulus, strength, and toughness. In this sense, predicting stress-strain curves is a more general description of composite properties than any combination of scalar material descriptors. A dataset of 100,000 different composite microstructures and their corresponding stress-strain curves is used to train and evaluate model performance. Due to the high dimensionality of the stress-strain dataset, several dimensionality reduction methods are used, including PCA, featuring a blend of domain understanding and traditional machine learning, to simplify the problem without loss of generality for the model.

We will first describe our modeling methodology and the parameters of our finite element method (FEM) used to generate data. Visualizations of the learned PCA latent space are then presented, along with model performance results.

CNN implementation and training

A convolutional neural network was trained to predict this lower-dimensional representation of the stress vector. The input to the CNN was a binary matrix representing the composite design, with 0's corresponding to soft blocks and 1's corresponding to stiff blocks. PCA was implemented with the open-source Python package scikit-learn, using the default hyperparameters. The CNN was implemented using Keras with a TensorFlow backend. The batch size for all experiments was set to 16 and the number of epochs to 30; the Adam optimizer was used to update the CNN weights during backpropagation. A train/test split ratio of 95:5 is used; we justify using a smaller ratio than the standard 80:20 because of the relatively large dataset. With a ratio of 95:5 and a dataset of 100,000 instances, the test set still has enough data points, roughly several thousand, for its results to generalize.
Each column of the target PCA representation was normalized to have a mean of 0 and a standard deviation of 1 to prevent unstable training.

Finite element method data generation

FEM was used to generate training data for the CNN model. Although obtaining the training data initially is computationally intensive, it takes much less time to train the CNN model and even less time to make high-throughput inferences over thousands of new, randomly generated composites. The crack phase-field solver was based on the hybrid formulation for the quasi-static fracture of elastic solids and implemented in the commercial FEM software ABAQUS with a user-element subroutine (UEL).

Visualizing PCA

In order to better understand the role PCA plays in effectively capturing the information contained in stress-strain curves, the principal component representation of stress-strain curves is plotted in 3 dimensions. Specifically, we take the first three principal components, which have a cumulative explained variance of ~85%, plot the stress-strain curves in that basis, and provide several different angles from which to view the 3D plot. Each point represents a stress-strain curve in the PCA latent space and is colored based on the associated modulus value. It seems that PCA is able to spread out the curves in the latent space based on modulus values, which suggests that this is a useful latent space for the CNN to make predictions in.

CNN model design and performance

Our CNN was a fully convolutional neural network, i.e., the only dense layer was the output layer. All convolution layers used 16 filters with a stride of 1, with a LeakyReLU activation followed by BatchNormalization. The first 3 conv blocks did not have 2D MaxPooling, followed by 9 conv blocks which did have a 2D MaxPooling layer placed after the BatchNormalization layer.
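The per-column standardization described above can be sketched in a few lines (a pure-Python illustration; in practice this is one call to scikit-learn's StandardScaler):

```python
import math

def standardize_columns(rows):
    """Normalize each column of a matrix to mean 0 and standard deviation 1."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(m)]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]

# Toy PCA-target matrix: 4 samples x 2 principal components (made-up values).
targets = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
z = standardize_columns(targets)
col0 = [r[0] for r in z]
print(sum(col0) / len(col0))  # approximately 0
```

Without this step, the first principal component, which carries most of the variance, would dominate the loss and destabilize training.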
A GlobalAveragePooling layer was used to reduce the dimensionality of the output tensor from the sequential convolution blocks, and the final output layer was a Dense layer with 15 nodes, where each node corresponded to a principal component. In total, our model had 26,319 trainable weights.

Our architecture was motivated by the recent development of, and convergence onto, fully convolutional architectures for traditional computer vision applications, where convolutions are empirically observed to be more efficient and stable for learning than dense layers. In addition, in our previous work, we had shown that CNNs were a capable architecture for learning to predict mechanical properties of 2D composites [30]. The convolution operation is an intuitively good fit for predicting crack propagation because it is a local operation, allowing it to implicitly featurize and learn the local spatial effects of crack propagation.

After applying the PCA transformation to reduce the dimensionality of the target variable, the CNN is used to predict the PCA representation of the stress-strain curve of a given binary composite design. After training the CNN on a training set, its ability to generalize to composite designs it has not seen is evaluated by comparing its predictions on an unseen test set. However, a natural question that emerges is how to evaluate a model's performance at predicting stress-strain curves in a real-world engineering context. While simple scalar metrics such as mean squared error (MSE) and mean absolute error (MAE) generalize easily to vector targets, it is not clear how to interpret these aggregate summaries of performance. It is difficult to use such metrics to ask questions such as "Is this model good enough to use in the real world?" and "On average, how poorly will a given prediction be incorrect relative to some given specification?"
Although being able to predict stress-strain curves is an important application of FEM and a highly desirable property for any machine learning model to learn, it does not easily lend itself to interpretation. Specifically, there is no simple quantitative way to define whether two stress-strain curves are "close" or "similar" in real-world units. Given that stress-strain curves are oftentimes intermediary representations of a composite property that are used to derive more meaningful descriptors such as modulus, strength, and toughness, we decided to evaluate the model in an analogous fashion. The CNN prediction in the PCA latent space representation is transformed back to a stress-strain curve using PCA, and used to derive the predicted modulus, strength, and toughness of the composite. The predicted material descriptors are then compared with the actual material descriptors. In this way, MSE and MAE now have clearly interpretable units and meanings. The average performance of the model, with respect to the error between the actual and predicted material descriptor values derived from stress-strain curves, is presented in the Table. The MAE for material descriptors provides an easily interpretable metric of model performance and can easily be used in any design specification to provide confidence estimates of a model prediction. When comparing the mean absolute error (MAE) to the range of values taken on by the distribution of material descriptors, we can see that the MAE is relatively small compared to the range. The MAE compared to the range is <10% for all material descriptors.
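This MAE-relative-to-range evaluation is easy to sketch (the descriptor values below are hypothetical, chosen only for illustration):

```python
def mae_over_range(actual, predicted):
    """Mean absolute error of a descriptor, normalized by the range of the
    actual values, giving a unit-free, interpretable error measure."""
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return mae / (max(actual) - min(actual))

# Hypothetical toughness values: actual from FEM, predicted by the model.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
ratio = mae_over_range(actual, predicted)
print(ratio)  # approximately 0.03, i.e. the MAE is 3% of the descriptor's range
```

Reporting the error in the descriptor's own units, scaled by its observed range, is what lets the paper state the "<10% of the range" result in a design-specification-friendly form.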
Relatively tight confidence intervals on the error indicate that this model architecture is stable, the model performance is not heavily dependent on initialization, and our results are robust to different train-test splits of the data.

Future work

Future work includes combining empirical models with optimization algorithms, such as gradient-based methods, to identify composite designs that yield complementary mechanical properties. The ability of a trained empirical model to make high-throughput predictions over designs it has never seen before allows for large parameter-space optimization that would be computationally infeasible for FEM. In addition, we plan to explore different visualizations of empirical models in an effort to "open up the black box" of such models. Applying machine learning to finite element methods is a rapidly growing field with the potential to discover novel next-generation materials tailored for a variety of applications. We also note that the proposed method can be readily applied to predict other physical properties represented in a similar vectorized format, such as electron/phonon density of states and sound/light absorption spectra.

Conclusion

In conclusion, we applied PCA and CNN to rapidly and accurately predict the stress-strain curves of composites beyond the elastic limit. In doing so, several novel methodological approaches were developed, including using the derived material descriptors from the stress-strain curves as interpretable metrics for model performance, and applying dimensionality reduction techniques to stress-strain curves. This method has the potential to enable composite design with respect to mechanical response beyond the elastic limit, which was previously computationally infeasible, and can generalize easily to related problems outside of microstructural design for enhancing mechanical properties.
A Cascaded Scheme for Recognition of Handprinted Numerals
U. Bhattacharya, T. K. Das, B. B. Chaudhuri
CVPR Unit, Indian Statistical Institute, Kolkata, India
ujjwal, das t, bbc@isical.ac.in

Abstract

This paper proposes a novel off-line handprinted Bangla (a major Indian script) numeral recognition scheme using a multistage classifier system comprising multilayer perceptron (MLP) neural networks. In this scheme we consider multiresolution features based on wavelet transforms. We start from a certain coarse resolution level of the wavelet representation, and if rejection occurs at this level of the classifier, the input pattern is passed to a larger MLP network corresponding to the next higher resolution level. For simplicity and efficiency we considered only three coarse-to-fine resolution levels in the present work. The system was trained and tested on a database of 9000 samples of handprinted Bangla numerals. For improved generalization and to avoid overtraining, the whole available data set had been divided into three subsets: a training set, a validation set and a test set. We achieved 94.96% and 93.025% correct recognition rates on the training and test sets respectively. The proposed recognition scheme is robust with respect to various writing styles and sizes as well as the presence of considerable noise. Moreover, the present scheme is sufficiently fast for real-life applications.
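The multistage scheme described above can be sketched as follows (the classifier interface, scores, and thresholds are illustrative assumptions, not the paper's actual MLP outputs): each stage accepts a pattern when the gap between its top two output activations clears a per-stage threshold, and otherwise rejects it and passes it to the next, finer-resolution stage.

```python
# Each "stage" maps a pattern to a list of 10 class scores; here the stages
# are toy stand-ins for the MLPs operating on coarse-to-fine wavelet features.
def make_stage(scores_by_pattern):
    return lambda pattern: scores_by_pattern[pattern]

def cascade_classify(pattern, stages, thresholds):
    """Try each stage in coarse-to-fine order; accept when the difference
    between the largest and second-largest outputs exceeds the threshold."""
    for stage, thr in zip(stages, thresholds):
        scores = stage(pattern)
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        best, second = ranked[0], ranked[1]
        if scores[best] - scores[second] >= thr:
            return best  # confident enough: classify at this stage
    return None  # rejected by every stage

# Toy scores: stage 1 is unsure about pattern "b"; stage 2 resolves it.
stage1 = make_stage({"a": [0.9, 0.05] + [0.0] * 8, "b": [0.4, 0.38] + [0.0] * 8})
stage2 = make_stage({"a": [0.8, 0.10] + [0.0] * 8, "b": [0.1, 0.85] + [0.0] * 8})
stages, thresholds = [stage1, stage2], [0.3, 0.3]
print(cascade_classify("a", stages, thresholds))  # accepted by stage 1: class 0
print(cascade_classify("b", stages, thresholds))  # deferred to stage 2: class 1
```

The design rationale, as in the paper, is that cheap coarse features settle most inputs, and only ambiguous ones pay the cost of the larger, finer-resolution networks.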
1. Introduction

Off-line recognition of handwritten characters, in particular numerals, has been a topic of intensive research during the last few years. The application areas include postal code reading, automatic processing of bank cheques, office automation and various other scientific and business applications. Automatic recognition of handwritten characters is difficult because of variations in style, size, shape, orientation etc., the presence of noise, and factors related to the writing instrument, writing surface, scanning device etc. To simplify the recognition scheme, many existing methods put constraints on handwriting with respect to tilt, size, relative positions, stroke connections, distortions etc. In this paper we consider numeral characters written inside rectangular boxes of fixed size.

Plenty of research papers exist for English [19], Chinese [20], Korean [10], Arabic [1], Kanji [22] and other languages; for example, see the review in [17]. However, only preliminary work [16, 2, 3] has been done on a script like Bangla, the second-most popular language and script in the Indian subcontinent and the fifth-most popular language in the world. In the previous census, it was found that only 3% of the educated population of West Bengal, the major Bangla-speaking state of India, knows a foreign language (mainly English).

One of the important issues related to handwriting recognition is the determination of a feature set which is reasonably invariant with respect to shape variations caused by various writing styles. To tackle this problem we have chosen a wavelet-based multistage approach. Wavelet-based approaches have been used for handwritten character recognition previously [21, 11, 9], but not in the cascaded manner used by us. In wavelet analysis, the frequency of the basis function as well as the scale can be changed, and thus it is possible to exploit the fact that high-frequency features of a function are localized while low-frequency features are spread over time.
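The paper uses Daubechies-4 wavelets; as a simpler stand-in, a one-level 2D Haar decomposition (the Haar wavelet is mentioned later, in Section 3) illustrates how the low-low ("smooth-smooth") subband halves the resolution while keeping the coarse shape of the image:

```python
def haar_ll(img):
    """One level of 2D Haar decomposition, returning only the low-low
    (smooth-smooth) subband: each output pixel summarizes a 2x2 block."""
    h, w = len(img), len(img[0])
    ll = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            block_sum = img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]
            row.append(block_sum / 2.0)  # orthonormal Haar scaling
        ll.append(row)
    return ll

# A 4x4 "image" with a bright left half, a stand-in for a character bitmap.
img = [[1.0, 1.0, 0.0, 0.0]] * 4
ll1 = haar_ll(img)   # 2x2 smooth-smooth approximation
ll2 = haar_ll(ll1)   # 1x1 approximation (coarsest level)
print(ll1)           # [[2.0, 0.0], [2.0, 0.0]]
```

Each recursive application, as in the scheme below, yields the next coarser "smooth-smooth" feature image; Daubechies-4 differs only in using four filter coefficients instead of two.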
Real-life images are composed of large areas of similar information but sharp changes at object edges. Biological eyes are more sensitive to object edges than to minor details inside. Thus, in many situations, wavelet-based techniques are suitable for image processing tasks. Moreover, the wavelet, as a problem-solving tool, fits naturally with digital computers, its basis functions being defined by just multiplication and addition operators; there are no derivatives or integrals.

In this paper, a three-stage system is proposed where features in the form of wavelet coefficient matrices at different resolution levels are considered at the three different stages of the recognition system. MLP networks with different architectures are used as classifiers. In the initial stage, a numeral is subjected to recognition using the low-low part of the wavelet coefficient matrix as the feature set. If the input character is not classified at this level, it is passed to the next stage using wavelet coefficients of the next higher level of resolution. If the pattern is again rejected at this stage, an attempt is made by the last stage, where the next higher level of wavelet features is considered.

In this scheme, depicted in Fig. 1, feature vectors are obtained by convolving the Daubechies-4 wavelets [6] with a character image. Three MLP network architectures are trained using training sets at three coarse-to-fine resolution levels.

Figure 1: Smooth-smooth components of the wavelet decomposition of a Bangla numeral (one) image at different resolution levels (original image and successive resolution levels).

Also, a validation sample set [8] is used to determine the termination of training. The classification strategy is to place an input character into one of the possible 10 categories or reject it. The rejection criterion is chosen interactively so that the misclassification is minimized on the union of the training and validation sets. Rejected candidates are given higher representations during the MLP training
for the next two recognition stages. This helps in reducing the rejection percentage [14] at the higher stages. The rejection criterion at the final stage is determined by maximizing the correct classification on the above union set.

The rest of this article is organized as follows. Sections 2 and 3 respectively provide brief overviews of the multilayer perceptron and the wavelet transform. We describe the preprocessing, the training of the set of MLP networks and the multistage recognition scheme in Section 4. Experimental results are reported in Section 5. Concluding remarks are given in Section 6.

2. Multilayer Perceptron

The multilayer perceptron neural network model is possibly the most well-known neural network architecture [15]. The strengths of connections between nodes in different layers are called weights. Such weights are usually initialized with small random values, and in the present application we considered random values between -0.5 and +0.5 obtained from a uniform distribution. The final weights may be obtained in an iterative manner by using the so-called backpropagation (BP) algorithm [18]. This training algorithm performs steepest descent in the connection weight space on an error surface defined by

E_p = (1/2) ||t_p - o_p||^2    (1)

where t_p and o_p are, respectively, the target and output vectors corresponding to the p-th input pattern. The system error is defined as

E = sum over p = 1 to P of E_p    (2)

where P is the total number of patterns in the training set. In the classical BP algorithm, the weight modification rules are given by

delta_w_ji(t) = -eta * (dE / dw_ji)    (3)

where w_ji is the weight connecting a node to another node in the next upper layer at time t, and eta is a positive constant called the learning rate. To tackle the problem of possible local minima and slow convergence, Rumelhart et al. suggested the use of a momentum term, whereby the weight modification rule (3) becomes

delta_w_ji(t) = -eta * (dE / dw_ji) + alpha * delta_w_ji(t-1)    (4)

where delta_w_ji(t-1) is the change in the corresponding weight during the previous time step, and alpha is a constant called the momentum factor [18]. In many situations, the inclusion of this momentum in the weight modification rule increases the speed of convergence of the algorithm to
some extent. However, in real-life applications, further improvement of this learning algorithm is essential, and there exists a large number of such modifications of this algorithm in the literature. In fact, in the present application, we used a modified BP algorithm which considers a distinct self-adaptive learning rate corresponding to each individual connection weight. Using such self-adaptive learning rates, the weight modification rule (4) becomes rule (5), in which a so-called effective value function and a constant appear [4]. The self-adaptive learning rates are themselves modified as in rule (6), with a constant of proportionality. The learning performance of an MLP network using this modified BP algorithm does not depend much on the choice of the proportionality constant; the other constant determines the maximum value which can be assumed by a learning rate. In the present application, the values of these two constants are always taken as 0.1 and 4.0 respectively.

3. Wavelet Descriptor for Multiresolution Feature Extraction

In wavelet analysis, an input signal is decomposed into different frequency components, and then each component is represented with a resolution matched to its scale. Thus a wavelet provides a tool for time-frequency localization. A wavelet system is a set of building blocks used to represent a function: a two-dimensional set of basis functions psi_{j,k}(t) such that any square-integrable function f can be expressed as

f(t) = sum over j, k of a_{j,k} * psi_{j,k}(t)

for some set of coefficients a_{j,k}. General wavelet basis functions psi_{j,k} may be obtained from a mother wavelet psi by shrinking by a factor of 2^j and translating by k, namely

psi_{j,k}(t) = 2^{j/2} * psi(2^j * t - k)

Here j represents the dilation number and k represents the translation number. The scale factor 2^{j/2} normalizes the basis functions so that they have unit norm. For certain choices of the mother wavelet function, the set of functions forms an orthonormal basis, and hence any given function may be approximated by these basis functions. The first and simplest possible orthogonal wavelet system is the Haar wavelet (thesis of A. Haar, 1909). However, Daubechies constructed a set of orthonormal wavelet basis functions that are perhaps the
most elegant. These wavelets are compactly supported in the time domain and have good frequency-domain decay. The above describes the reason behind our choice of the Daubechies wavelet transform. A particular family of wavelets is specified by a particular set of numbers, called wavelet filter coefficients. The simplest member of the Daubechies family is the Daubechies-4 wavelet, which has been considered in our implementation. The layout of the recursive application of the wavelet transform on an image is shown in Figure 2. Successive application of the transform produces an increasingly smoother version of the original image. One most useful aspect of the wavelet transform is the existence of a fast computation algorithm by means of multiresolution analysis [13]. Moreover, the algorithm for the two-dimensional transform is a straightforward extension of that for the one-dimensional transform.

4. Recognition Scheme

4.1. Preprocessing

As preprocessing, we considered only size normalization. The input grey-scale image is first scaled to a fixed-size image by using the moment method [5]. No further preprocessing like tilt correction, smoothing etc. is considered.

Figure 2: Layout of wavelet decomposition for an image (L: low-pass filter, H: high-pass filter).

4.2. Training of MLPs

Two important aspects of the training of MLP networks are designing the training sets and the termination of training.

Designing the training set. The recognition performance of an MLP network highly depends on the choice of a representative training set. Manual selection of training samples is definitely a good approach to this problem. However, since this approach is extremely tedious, we have chosen the training set randomly from the available data. In fact, we performed random selections with respect to three different seed values, and experimental results will be provided corresponding to the best of these three. In our simulations, the size of the training set is 20% of the available data size. The MLP network in the first stage uses this set for its
training. However, the training sets used for the MLP networks of the latter stages are slightly different. We increased the representation of training samples rejected by the MLP of the first stage by a factor of four to form the sample set for the training of the MLP at the second stage. Similarly, the sample set for the training of the MLP at the third stage is formed from the first-level training set by increasing the representation of elements rejected at the second level by a factor of eight. This approach of enhanced representation helps in increasing the correct classification percentage.

Termination of training. There are various termination criteria available in the literature. The recognition performance highly depends on how much training has been given to the network. An effective strategy for judging training adequacy is the use of a validation set. With increased training, the recognition error on the validation set decreases monotonically to a minimum value, but then it starts to increase, even if the training error continues to decrease. For better network performance, training is terminated when the validation error reaches its minimum. In our simulations, we considered 15% of the data as the validation set.

4.3. Recognition scheme

The proposed recognition scheme has been simulated on the domain of handprinted Bangla numerals. Ideal Bangla numerals and 70 handwritten samples (7 different samples per numeral class) are shown in Fig. 3.

Figure 3: (a) Ideal samples of Bangla numerals; (b) a typical sample data subset of handwritten Bangla numerals.

The bounding box (the minimum possible rectangle enclosing the image) of an input image of a numeral is first computed, and then this is normalized in size using the moment method [5]. The wavelet decomposition algorithm is applied to this normalized image recursively four times to obtain the coarsest smooth-smooth approximation of the original image. In this procedure, we also obtain the intermediate smooth-smooth approximations of the original
image. Theoretically, this decomposition algorithm could be applied a fifth time to obtain a still smaller approximation. However, during our simulations, it was observed that different numerals are generally not distinguishable from these smallest possible approximations. The above approximations of the original image are grey-valued images, and we apply a simple thresholding technique to obtain them as binary images.

The present scheme is a 3-stage recognition scheme. In the first stage, the binarized version of the coarsest approximation of the original image is fed to the input layer. Different responses are obtained at the nodes of the output layer. Usually the output node with the maximum value recognizes the input image. However, in the present problem this approach is not suitable because it leads to an unacceptably high misclassification percentage. Instead, we interactively determine a threshold value for the difference between the maximum and second maximum values among the output nodes, so that the misclassification on the union of the training and validation sets is minimized. Thus, if this difference exceeds the threshold for an input numeral, it is recognized as belonging to the numeral class corresponding to the output node having the maximum value; otherwise the input numeral is said to be rejected by the initial stage of classification. In case of rejection by the first stage of the classification scheme, the pattern is passed to the next stage with the smooth-smooth component of the transformed image at the next higher resolution. Again, a similar recognition procedure is followed in this middle level of recognition, and if the pattern is rejected for the second time, a third and last attempt is made using the smooth-smooth component of the transformed image at the next higher resolution. During our simulation runs, it was observed that extending the proposed classification scheme to further stages does not improve the classification accuracy.

5. Experimental Results

The authors do not have information about the availability of any standard database of handprinted Bangla numerals. So, a database has been generated for
simulation purposes. In our simulation, we considered a data set of 9000 handprinted Bangla numerals equally distributed over all classes. These data have been collected from different sections of the population of West Bengal, India, keeping in mind variations with respect to age, sex, education, place of origin, income group and profession, by a number of University students over a period of more than one year. Since there is variation in the writing style of a single individual at different points of time, each individual was approached on 4 occasions for samples.

Out of this set of 9000 data, the training set, validation set and test set consist of 1800, 1440 and 5760 samples respectively. In the first stage of the classification scheme, 18.72% of the test data are rejected, out of which 67.38% and 2.86% are respectively classified by the second stage and third stage of the classification scheme. The overall rejection percentage is 1.97% and the misclassification is only 5.005%. Thus we achieved 93.025% correct classification accuracy. The confusion matrices at the different levels of classification, and also the overall classification result, are given in the tables below.

[Table: Confusion matrix of the first stage of classification]
[Table: Confusion matrix of the second stage of classification]
[Table: Confusion matrix of the third stage of classification]

This recognition performance of the proposed scheme is better than the existing ones. The correct recognition percentages reported in [16] and [3] are respectively 91.98% and 90.56%. On the other hand, the present approach can recognize sixty numerals per second on a Pentium-IV desktop computer, which is enough for any real-life application.

6. Conclusion

Wavelets have been studied thoroughly during the last decade [7], and recently researchers have been applying them in various fields of mathematics and engineering. Their potential in image compression tasks has already been established.

[Table: Overall confusion matrix on the test set]

Figure 4 shows the smooth-smooth components of the wavelet decomposition of a
noisy image of a Bangla numeral(one) at different resolution levels Original noisy image resolution level resolution level resolution leveland its applicability in various other image processing prob-lems are getting explored.In this paper we proposed an efficient multistage approach using multiresolution wavelet features and multilayer perceptron clasifiers.As it is seen from the simulation results,the proposed approach can pro-vide very good recognition result on handprinted Bangla numerals.The wavelet based features are also not affected in the presence of moderate noise or discontinuity or small changes in orientation.In Figure4,we have shown the nu-meral of Figure1affected by noise and its wavelet-based feature images at different resolution levels.From Figure4, it is clear that higher resolution levels are more sensitive to noise.During our simulation runs,it is observed that reso-lution level contributes most to recognition.Here it should be noted that consideration of this multi-stage classification is useful because if all the test patterns are fed to each of the three individual MLP classifiers,then none of the rejection sets or sets of misclassified samples is a subset of another.On the other hand,considering components at the initial stage helps to minimize the aver-age computation time.Finally,there exists striking resemblance between Quadrature Mirror Filters(QMF)known in subband cod-ing techniques in thefield of signal processing and the or-thonomal bases of the wavelet analysis.In fact,the waveletalgorithm is a form of subbandfiltering and most of the computations of wavelets can be implemented usingfilter banks.Moreover,Lewis and Knowles have also devised a clever idea of quantising the wavelet coefficients in order to build a simple VLSI arhitecture without multipliers[12]. 
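The conclusion above notes that the wavelet transform is a form of subband filtering that can be implemented with filter banks. As an illustrative sketch only (it uses the two-tap Haar filter pair for brevity, which is an assumption — the paper does not specify these filters here — and the function names are my own), one level of such a filter-bank decomposition might look like:

```python
import numpy as np

def haar_dwt_1d(x):
    """One level of the 1-D Haar transform via a two-tap QMF pair.

    The low-pass (smooth) branch averages adjacent samples; the
    high-pass (detail) branch differences them; both are scaled by
    1/sqrt(2) so the transform is orthonormal (energy-preserving).
    """
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    smooth = (even + odd) / np.sqrt(2.0)   # low-pass branch + downsample
    detail = (even - odd) / np.sqrt(2.0)   # high-pass branch + downsample
    return smooth, detail

def haar_dwt_2d(img):
    """One level of the separable 2-D Haar transform: rows, then columns.

    Returns the four subbands (LL, LH, HL, HH); LL is the 'smooth'
    component that would serve as the feature image at the next
    coarser resolution level.
    """
    img = np.asarray(img, dtype=float)
    lo = (img[:, 0::2] + img[:, 1::2]) / np.sqrt(2.0)   # filter rows
    hi = (img[:, 0::2] - img[:, 1::2]) / np.sqrt(2.0)
    ll = (lo[0::2, :] + lo[1::2, :]) / np.sqrt(2.0)     # filter columns
    lh = (lo[0::2, :] - lo[1::2, :]) / np.sqrt(2.0)
    hl = (hi[0::2, :] + hi[1::2, :]) / np.sqrt(2.0)
    hh = (hi[0::2, :] - hi[1::2, :]) / np.sqrt(2.0)
    return ll, lh, hl, hh
```

Because the filter pair is orthonormal, the energy of the four subbands equals that of the input, which is one way to sanity-check such an implementation.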
References

[1] A. Amin and H. B. Al-Sadoun, "Hand Printed Arabic Character Recognition System", 12th Int. Conf. Pattern Recognition, pp. 536-539, 1994.
[2] U. Bhattacharya, T. K. Das, A. Dutta, S. K. Parui and B. B. Chaudhuri, "Self-organizing Neural Network-Based System for Recognition of Handprinted Bangla Numerals", Proc. of XXXVI Ann. Convention of Comp. Soc. of India - CSI 2001, Kolkata, 2001, pp. C-92-C-96.
[3] U. Bhattacharya, T. K. Das, A. Dutta, S. K. Parui and B. B. Chaudhuri, "Recognition of Handprinted Bangla Numerals using Neural Network Models", Advances in Soft Computing - AFSS 2002, Springer Verlag Lecture Notes in Artificial Intelligence (LNAI-2275), Eds. N. R. Pal and M. Sugeno, pp. 228-235, 2002.
[4] U. Bhattacharya and S. K. Parui, "Self-Adaptive Learning Rates in Backpropagation Algorithm Improve Its Function Approximation Performance", Proc. of the IEEE International Conference on Neural Networks, Australia, pp. 2784-2788, 1995.
[5] R. G. Casey, "Moment normalization of handprinted characters", IBM J. Res. Develop., vol. 14, pp. 548-557, 1970.
[6] I. Daubechies, "The wavelet transform, time-frequency localization and signal analysis", IEEE Trans. on Information Theory, vol. 36, no. 5, pp. 961-1005, 1990.
[7] A. Graps, "An introduction to wavelets", IEEE Computational Science and Engineering, vol. 2, no. 2, 1995.
[8] M. H. Hassoun, Fundamentals of Artificial Neural Networks. Cambridge: The MIT Press, 1995, p. 226.
[9] L. Huang and X. Huang, "Multiresolution Recognition of Offline Handwritten Chinese Characters with Wavelet Transform", Proc. of Sixth ICDAR, Seattle, Washington, USA, Sept. 10-13, 2001, pp. 631-634.
[10] S. W. Lee and J. S. Park, "Nonlinear shape normalization methods for the recognition of large-set handwritten characters", Pattern Recognition, vol. 27, pp. 895-902, 1994.
[11] S. W. Lee, C. H. Kim, H. Ma and Y. Y. Tang, "Multiresolution Recognition of Unconstrained Handwritten Numerals with Wavelet Transform and Multilayer Cluster Network", Pattern Recognition, vol. 29, no. 12, pp. 1953-1961, 1996.
[12] A. S. Lewis and G. Knowles, "A VLSI architecture for the discrete wavelet transform without multipliers", Electronics Letters, vol. 27, no. 2, pp. 171-173, 1991.
[13] S. G. Mallat, "A theory for multiresolution signal decomposition: The wavelet representation", IEEE Trans. on Pattern Anal. and Machine Int., vol. 11, no. 7, pp. 674-693, 1989.
[14] T. Masters, Practical Neural Network Recipes in C++, New York: Academic Press, 1993, p. 18.
[15] R. Hecht-Nielsen, Neurocomputing, New York: Addison-Wesley, Chapter 5, 1990.
[16] U. Pal and B. B. Chaudhuri, "Automatic Recognition of Unconstrained Off-line Bangla Hand-written Numerals", Advances in Multimodal Interfaces, Springer Verlag Lecture Notes in Computer Science (LNCS-1948), Eds. T. Tan, Y. Shi and W. Gao, pp. 371-378, 2000.
[17] R. Plamondon and S. N. Srihari, "On-Line and Off-Line Handwriting Recognition: A Comprehensive Survey", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 63-84, 2000.
[18] D. E. Rumelhart, G. E. Hinton and R. J. Williams, "Learning internal representations by error propagation", in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA: The MIT Press, pp. 318-362, 1986.
[19] S. N. Srihari, E. Cohen, J. J. Hull and L. Kuan, "A system to locate and recognize ZIP codes in handwritten addresses", IJRE, vol. 1, pp. 37-45, 1989.
[20] J. Tsukumo and H. Tanaka, "Classification of Handprinted Chinese Characters Using Nonlinear Normalization Methods", 9th Int. Conf. Pattern Recognition, pp. 168-171, 1988.
[21] P. Wunsch and A. F. Laine, "Wavelet Descriptors for Multiresolution Recognition of Handprinted Characters", Pattern Recognition, vol. 28, no. 8, pp. 1237-1249, 1995.
[22] H. Yamada, K. Yamamoto and T. Saito, "A nonlinear normalization method for handprinted Kanji character recognition - line density equalization", Pattern Recognition, vol. 23, pp. 1023-1029, 1990.
Research on Visual Inspection Algorithms for Defects of Textured Objects
Abstract
In the fiercely competitive world of industrial automated production, machine vision plays a pivotal role in guaranteeing product quality, and its application to defect inspection is becoming increasingly common. Compared with conventional inspection techniques, automated visual inspection systems are more economical, faster, more efficient and safer. Textured objects are widespread in industrial production: substrates used for semiconductor assembly and packaging, light-emitting diodes, the printed circuit boards of modern electronic systems, and the cloth and fabrics of the textile industry can all be regarded as objects with texture features. This thesis is devoted to defect inspection techniques for textured objects, with the aim of providing efficient and reliable inspection algorithms for their automated inspection.

Texture is an important feature for describing image content, and texture analysis has been successfully applied to texture segmentation and texture classification. This study proposes a defect inspection algorithm based on texture analysis and reference comparison. The algorithm tolerates the image registration errors caused by object distortion and is robust to the influence of texture. It is designed to provide rich and important physical descriptions of the detected defect regions, such as their size, shape, brightness contrast and spatial distribution. Moreover, whenever a reference image is available, the algorithm can be applied to the inspection of both homogeneously and non-homogeneously textured objects, and it also achieves good results on non-textured objects.

Throughout the inspection process we adopt steerable-pyramid texture analysis and reconstruction. Unlike traditional wavelet texture analysis, we introduce into the wavelet domain a tolerance-control algorithm that handles object distortion and texture influence, thereby achieving tolerance to object distortion and robustness to texture. Finally, reconstruction with the steerable pyramid guarantees the accuracy with which the physical meaning of the defect regions is recovered. In the experimental stage we tested a series of images of practical application value. The experimental results show that the proposed defect inspection algorithm for textured objects is efficient and easy to implement.
Keywords: defect detection, texture, object distortion, steerable pyramid, reconstruction
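The abstract above describes comparing a test image against a reference at multiple resolutions and flagging defect regions. A minimal sketch of that reference-comparison idea follows — it is not the thesis's steerable-pyramid algorithm; the block-averaging pyramid, the threshold rule and all function names here are illustrative assumptions:

```python
import numpy as np

def downsample2(img):
    """Halve resolution by 2x2 block averaging (a crude low-pass + decimate)."""
    return (img[0::2, 0::2] + img[0::2, 1::2] +
            img[1::2, 0::2] + img[1::2, 1::2]) / 4.0

def defect_mask(test, reference, levels=2, k=3.0):
    """Flag pixels whose multi-scale difference from the reference is large.

    At each pyramid level the absolute difference is thresholded at k times
    its own standard deviation; coarse-level detections are upsampled by
    pixel repetition and OR-ed into the full-resolution mask.
    """
    t = np.asarray(test, dtype=float)
    r = np.asarray(reference, dtype=float)
    mask = np.zeros(t.shape, dtype=bool)
    for _ in range(levels):
        diff = np.abs(t - r)
        level_mask = diff > k * (diff.std() + 1e-12)
        rep = mask.shape[0] // level_mask.shape[0]
        # expand the coarse mask back to full resolution by repetition
        mask |= np.kron(level_mask, np.ones((rep, rep), dtype=bool)).astype(bool)
        t, r = downsample2(t), downsample2(r)
    return mask
```

A steerable pyramid would replace the block averaging with oriented band-pass filters, and the thesis's tolerance control would relax the per-pixel comparison to absorb registration error; this sketch only shows the skeleton of reference comparison across scales.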
Artificial Intelligence Deep Learning Techniques Exercises (Question Set 4)
Note: answers and explanations appear at the end of the paper. Part 1: multiple-choice questions, 50 questions in total; each question has exactly one correct answer, and selecting more or fewer answers scores no points.
1. [Single choice] Tf.nn.softmax_cross_entropy_with_logits is a commonly used TensorFlow function for computing ( ), i.e., the cross entropy between labels and logits.
   A) information entropy  B) information element  C) logits  D) cross entropy
2. [Single choice] Which of the following are reasons for using feature scaling?
   A) It prevents the matrix XTX (used in the normal equation) from being non-invertible (singular/degenerate)
   B) It speeds up gradient descent by making it require fewer iterations to get to a good solution.
   C) It speeds up gradient descent by making each iteration of gradient descent less
   D) It is necessary to prevent the normal equation from getting stuck in local optima
3. [Single choice] The gate that judges whether previous information is useful is the
   A) forget gate  B) input gate  C) output gate  D) update gate
4. [Single choice] In a convolution function, the parameter strides is used to ( )
   A) set the convolution kernel  B) set the convolution stride  C) set the number of convolution layers  D) none of the above
5. [Single choice] The dot product (also called the scalar product, or ( )) is a binary operation that takes two vectors over the reals R and returns a real-valued scalar; it is the standard inner product of Euclidean space.
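Question 1 above concerns the cross entropy between labels and logits. The following small NumPy sketch shows what such a function computes; it illustrates the mathematics only, not TensorFlow's implementation (the function name mirrors the TensorFlow API, but the code is mine):

```python
import numpy as np

def softmax_cross_entropy_with_logits(labels, logits):
    """Cross entropy between a one-hot (or probability) label vector and
    raw logits, computed via a numerically stable log-softmax
    (the max logit is subtracted before exponentiation)."""
    logits = np.asarray(logits, dtype=float)
    z = logits - logits.max(axis=-1, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(np.asarray(labels, dtype=float) * log_softmax).sum(axis=-1)
```

For uniform logits over 3 classes and a one-hot label, the result is -log(1/3) = log 3, matching the definition of cross entropy (answer D in the question).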
McClelland's Competency Model

McClelland is known for his famous iceberg model, in which he describes human qualities as an iceberg divided into two parts: the portion above the waterline and the portion below it. The part above the water consists of visible characteristics — people's knowledge and skills — which are relatively easy to perceive and measure. The part below the water consists of latent characteristics: social roles, self-concept, traits, motives and so on. These lie deeper and are much harder to uncover and perceive. McClelland pointed out that the best predictors of performance are not qualifications, credentials and similar conditions, but a person's deep-seated qualities — the part of the iceberg under the water. The metaphor seems simple, but it carries great theoretical and practical value and has had a significant impact on management, especially human resource management, because it reveals that the qualities most important to individual performance are not the ones we traditionally assume. In the later part of his academic career, McClelland focused his energies on this line of work.

McClelland's competency research began with the evaluation of managers' performance. His consulting firm, McBer, received commissions from a large number of enterprises and government agencies to evaluate the performance of managerial staff. In the course of this work he found that the traditional methods of personnel assessment suffered from many problems and lacked validity. Traditional measurement focused mainly on intelligence, knowledge, ability and expertise, yet these factors could not predict future job performance or personal career achievement. In addition, the factors traditionally measured require considerable cost to acquire, such as academic qualifications or training in a specific skill.
This can lead to discrimination against the underprivileged and the disadvantaged. For example, the career prospects of someone with only a primary education may well exceed those of a Ph.D., yet the earlier methods of measurement would never give the former a higher rating. On further study, McClelland's team found that what influences individual performance is a set of fundamental qualities (competencies), similar to factors such as achievement motivation, interpersonal understanding, influence and teamwork.

The word "competency" might be translated as "talent" or "ability". In China, however, "talent" is usually associated with technical skill rather than with competency in this sense; this article therefore uses the word "quality" to emphasize its "competency" nature.

In 1973, McClelland published his famous article "Testing for Competence Rather Than Intelligence" in the journal American Psychologist, formally proposing the concept of "competency". Citing extensive data, he argued that judging personal ability solely by intelligence tests is unreasonable, and further pointed out that the expected effects of factors such as personality, intelligence and values on performance are subjective assumptions that fail to show up in reality. McClelland therefore stressed the need to abandon traditional theoretical assumptions and subjective judgments, to stick to first-hand material, and to explore the characteristics and factors that genuinely affect job performance. Performance, he argued, is determined by more fundamental, more latent factors that better predict work performance in specific positions; the personal characteristics that distinguish levels of performance in a specific job and organizational environment are "competencies".
Competencies are employees' latent characteristics, such as motives, traits, skills, self-image, social roles and knowledge. These factors determine whether work is effective and whether a person can produce outstanding performance. The publication of this article marks the beginning of competency research. The American Compensation Association later gave a further definition of competency: the levels of performance an individual achieves through work behavior, where the behaviors in question are observable, measurable and scalable.

To identify and evaluate competencies, McClelland created the Behavioral Event Interview (BEI). The method combines the critical-incident technique with the Thematic Apperception Test. Its main steps are to have interviewees identify and describe, through examples, the key events of the past six months to a year in which they were most and least successful, and then to report in detail: a description of the situation; who was actually involved; the measures the person took; how the person felt; and the result. The research team then examines the content of these events to determine what capabilities the interviewee exhibited.

Behavioral event interviews were first applied in an American government project. At the beginning of the 1970s, the McBer consulting firm was given the task of helping the U.S. government select Foreign Service Information Officers (FSIO). The foreign liaison officers of the time were largely male and white, and had been subjected to very strict layers of screening, a consequence of the nature of their mission.
The main duties of a foreign liaison officer were to promote American foreign policy abroad — through library publicity, the press, diplomatic and cultural activities, and speeches and dialogue with the people of other countries — to build the international image of the United States and spread American values, so as to win greater understanding of the United States and eliminate misunderstandings of all kinds. To select the right people, the U.S. government had devised a test system for officials serving abroad. The test was very demanding, and its main content fell into three categories: IQ; degrees, diplomas and achievements; and general humanities knowledge and related cultural background knowledge (including the corresponding professional knowledge of American history and Western culture, politics and economics, and English). The government hoped that such rigorous examinations would select candidates equal to the important mission of liaison officers abroad. Unfortunately, the results were not satisfactory, and many closely screened liaison officers proved incompetent. The U.S. government therefore commissioned McClelland to find and design a new way to evaluate and select overseas liaison officers effectively.

McClelland found that the previous expatriate-service-officer test, with the strong cultural background it demanded, excluded everyone from non-mainstream cultural backgrounds. Moreover, what the test evaluated were not the abilities key to being a qualified liaison officer: the evaluation standard of the existing test was clearly biased, conceived by scholars at a certain distance from reality. To find reasonable and effective selection criteria, McClelland used the behavioral event interview to screen the evaluation factors.
From the existing liaison officers he chose a number of people and divided them into two groups: one consisted of the most excellent performers, called the outstanding group; the other consisted of people regarded as merely competent, the average group. The research team then used the behavioral event interview method, summing up from the interview content the differences in behavior and thinking patterns between the two groups, and comparing the differing factors on the basis of this induction. In this way the researchers quickly found that the characteristics displayed by the outstanding group and the average group differed; after induction one could see immediately which characteristics were common to all officers and which appeared only in the outstanding ones. The researchers classified these distinguishing characteristics according to scientific methods, divided them into levels, and finally obtained a system of characteristics that reflects the difference between the excellent and the mediocre — a competency model for liaison officers that could be used in evaluation and selection. The model McClelland built for the U.S. government included three core competencies: cross-cultural interpersonal sensitivity, positive expectations of others, and the ability to move quickly into local political networks. Later experience showed that the qualities chosen in McClelland's model were better suited to the job. Even today, after repeated revisions and changes, the U.S. government still regards these three qualities as an important basis for the selection of liaison officers abroad.

Beginning with the success of selecting foreign liaison officers, McClelland applied the competency model to many fields.
Under his guidance, the McBer consulting company began to provide enterprises, government agencies and other professional organizations with services applying the competency model to human resource management, and gradually became the internationally recognized authority on competency assessment. The advantages of the competency model in human resource management were later widely recognized in many countries, especially in enterprise management. In many management designs today, the competency model is a core module of human resource management, applied to personnel selection, training management, performance management, compensation design, career planning and so on.

In 1981, Richard Boyatzis put forward the onion model, similar to the iceberg model, on the basis of McClelland's research. By re-analyzing the original data on managers' competencies, Boyatzis summed up a set of competency factors that can identify outstanding managers; these factors have broad applicability and can be used across different companies. The onion model arranges the qualities from the inside out as layers of a wrapped structure: the core is motive, expanding outward in turn to personality, self-image and values, social role, attitudes, knowledge and skills. The further out the layer, the easier it is to cultivate and evaluate; the further in, the harder it is to evaluate and to learn. Roughly speaking, the knowledge and skills of the onion correspond to the part of the iceberg above the water; the motives and personality at the onion's core correspond to the deepest part under the water; and the self-image and roles in the middle correspond to the shallow part under the water. The onion model is essentially the same as the iceberg model, with its emphasis on core or basic qualities.
Assessment of these core qualities can predict a person's long-term performance. By contrast, the onion model makes the latent qualities and the hierarchy among qualities more prominent, and portrays the relationships among the qualities more precisely than the iceberg model.

After McClelland put forward the competency model, not only did Boyatzis deepen and develop his research, but a large number of consulting organizations adopted the method. McClelland himself did not fall behind. In 1989, he began to examine the competencies involved in more than 200 jobs in various organizations around the world. From this in-depth research he extracted and summarized 21 general competency elements, covering the knowledge and skills, social roles, self-concept, personality and motive characteristics that outstanding performers display in daily life and behavior, and forming a generic competency model for managerial personnel in common use. On this basis Hay-McBer, the advisory body chaired by McClelland, published a graded competency dictionary in 1996 (it goes by many names in China, most commonly the "competency dictionary"). The general competency dictionary consists of 6 basic competency groups and 21 competency elements.

McClelland's concept of competency has had great influence on human resource management. It touches every aspect of the field, providing operational methods for job analysis, staff recruitment, personnel assessment, personnel training and personnel incentives. Compared with earlier methods, competency is a human resource management tool aimed at strengthening core competitiveness and improving the long-term performance of the organization. The effective use of the competency model, however, needs to follow certain steps.
First, different types of work make different demands on quality, so it is necessary to determine what qualities the job in question requires. There are two basic principles for determining a competency: (1) Validity — the sole criterion for judging a competency is its ability to distinguish levels of job performance, which means that a confirmed competency must clearly and measurably separate excellent employees from average ones. (2) Objectivity — whether a competency can distinguish job performance must be judged on the basis of objective data. Second, once the competencies are determined, the organization should establish an evaluation system that can measure an individual's competency level; this evaluation system should likewise be tested against objective data and be able to distinguish job performance. Finally, on the basis of accurate measurement, the organization designs the specific ways in which the results of competency evaluation will be applied in the various tasks of human resource management.

In a word, McClelland's theory of the competency model provides a new perspective and a more powerful tool for the practice of human resource management. It can meet the requirements of modern human resource management: by constructing the competency model for a job, the competencies the job requires are made explicit, becoming an important basis for personnel quality evaluation and providing a scientific foundation for the development of human resource management.
From Data Mining to Knowledge Discovery in Databases
s Data mining and knowledge discovery in databases have been attracting a significant amount of research, industry, and media atten-tion of late. What is all the excitement about?This article provides an overview of this emerging field, clarifying how data mining and knowledge discovery in databases are related both to each other and to related fields, such as machine learning, statistics, and databases. The article mentions particular real-world applications, specific data-mining techniques, challenges in-volved in real-world applications of knowledge discovery, and current and future research direc-tions in the field.A cross a wide variety of fields, data arebeing collected and accumulated at adramatic pace. There is an urgent need for a new generation of computational theo-ries and tools to assist humans in extracting useful information (knowledge) from the rapidly growing volumes of digital data. These theories and tools are the subject of the emerging field of knowledge discovery in databases (KDD).At an abstract level, the KDD field is con-cerned with the development of methods and techniques for making sense of data. The basic problem addressed by the KDD process is one of mapping low-level data (which are typically too voluminous to understand and digest easi-ly) into other forms that might be more com-pact (for example, a short report), more ab-stract (for example, a descriptive approximation or model of the process that generated the data), or more useful (for exam-ple, a predictive model for estimating the val-ue of future cases). At the core of the process is the application of specific data-mining meth-ods for pattern discovery and extraction.1This article begins by discussing the histori-cal context of KDD and data mining and theirintersection with other related fields. A briefsummary of recent KDD real-world applica-tions is provided. Definitions of KDD and da-ta mining are provided, and the general mul-tistep KDD process is outlined. 
This multistepprocess has the application of data-mining al-gorithms as one particular step in the process.The data-mining step is discussed in more de-tail in the context of specific data-mining al-gorithms and their application. Real-worldpractical application issues are also outlined.Finally, the article enumerates challenges forfuture research and development and in par-ticular discusses potential opportunities for AItechnology in KDD systems.Why Do We Need KDD?The traditional method of turning data intoknowledge relies on manual analysis and in-terpretation. For example, in the health-careindustry, it is common for specialists to peri-odically analyze current trends and changesin health-care data, say, on a quarterly basis.The specialists then provide a report detailingthe analysis to the sponsoring health-care or-ganization; this report becomes the basis forfuture decision making and planning forhealth-care management. In a totally differ-ent type of application, planetary geologistssift through remotely sensed images of plan-ets and asteroids, carefully locating and cata-loging such geologic objects of interest as im-pact craters. Be it science, marketing, finance,health care, retail, or any other field, the clas-sical approach to data analysis relies funda-mentally on one or more analysts becomingArticlesFALL 1996 37From Data Mining to Knowledge Discovery inDatabasesUsama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth Copyright © 1996, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1996 / $2.00areas is astronomy. Here, a notable success was achieved by SKICAT ,a system used by as-tronomers to perform image analysis,classification, and cataloging of sky objects from sky-survey images (Fayyad, Djorgovski,and Weir 1996). 
In its first application, the system was used to process the 3 terabytes (1012bytes) of image data resulting from the Second Palomar Observatory Sky Survey,where it is estimated that on the order of 109sky objects are detectable. SKICAT can outper-form humans and traditional computational techniques in classifying faint sky objects. See Fayyad, Haussler, and Stolorz (1996) for a sur-vey of scientific applications.In business, main KDD application areas includes marketing, finance (especially in-vestment), fraud detection, manufacturing,telecommunications, and Internet agents.Marketing:In marketing, the primary ap-plication is database marketing systems,which analyze customer databases to identify different customer groups and forecast their behavior. Business Week (Berry 1994) estimat-ed that over half of all retailers are using or planning to use database marketing, and those who do use it have good results; for ex-ample, American Express reports a 10- to 15-percent increase in credit-card use. Another notable marketing application is market-bas-ket analysis (Agrawal et al. 1996) systems,which find patterns such as, “If customer bought X, he/she is also likely to buy Y and Z.” Such patterns are valuable to retailers.Investment: Numerous companies use da-ta mining for investment, but most do not describe their systems. One exception is LBS Capital Management. Its system uses expert systems, neural nets, and genetic algorithms to manage portfolios totaling $600 million;since its start in 1993, the system has outper-formed the broad stock market (Hall, Mani,and Barr 1996).Fraud detection: HNC Falcon and Nestor PRISM systems are used for monitoring credit-card fraud, watching over millions of ac-counts. The FAIS system (Senator et al. 1995),from the U.S. 
Treasury Financial Crimes En-forcement Network, is used to identify finan-cial transactions that might indicate money-laundering activity.Manufacturing: The CASSIOPEE trou-bleshooting system, developed as part of a joint venture between General Electric and SNECMA, was applied by three major Euro-pean airlines to diagnose and predict prob-lems for the Boeing 737. To derive families of faults, clustering methods are used. CASSIOPEE received the European first prize for innova-intimately familiar with the data and serving as an interface between the data and the users and products.For these (and many other) applications,this form of manual probing of a data set is slow, expensive, and highly subjective. In fact, as data volumes grow dramatically, this type of manual data analysis is becoming completely impractical in many domains.Databases are increasing in size in two ways:(1) the number N of records or objects in the database and (2) the number d of fields or at-tributes to an object. Databases containing on the order of N = 109objects are becoming in-creasingly common, for example, in the as-tronomical sciences. Similarly, the number of fields d can easily be on the order of 102or even 103, for example, in medical diagnostic applications. Who could be expected to di-gest millions of records, each having tens or hundreds of fields? We believe that this job is certainly not one for humans; hence, analysis work needs to be automated, at least partially.The need to scale up human analysis capa-bilities to handling the large number of bytes that we can collect is both economic and sci-entific. Businesses use data to gain competi-tive advantage, increase efficiency, and pro-vide more valuable services to customers.Data we capture about our environment are the basic evidence we use to build theories and models of the universe we live in. 
Be-cause computers have enabled humans to gather more data than we can digest, it is on-ly natural to turn to computational tech-niques to help us unearth meaningful pat-terns and structures from the massive volumes of data. Hence, KDD is an attempt to address a problem that the digital informa-tion era made a fact of life for all of us: data overload.Data Mining and Knowledge Discovery in the Real WorldA large degree of the current interest in KDD is the result of the media interest surrounding successful KDD applications, for example, the focus articles within the last two years in Business Week , Newsweek , Byte , PC Week , and other large-circulation periodicals. Unfortu-nately, it is not always easy to separate fact from media hype. Nonetheless, several well-documented examples of successful systems can rightly be referred to as KDD applications and have been deployed in operational use on large-scale real-world problems in science and in business.In science, one of the primary applicationThere is an urgent need for a new generation of computation-al theories and tools toassist humans in extractinguseful information (knowledge)from the rapidly growing volumes ofdigital data.Articles38AI MAGAZINEtive applications (Manago and Auriol 1996).Telecommunications: The telecommuni-cations alarm-sequence analyzer (TASA) wasbuilt in cooperation with a manufacturer oftelecommunications equipment and threetelephone networks (Mannila, Toivonen, andVerkamo 1995). The system uses a novelframework for locating frequently occurringalarm episodes from the alarm stream andpresenting them as rules. Large sets of discov-ered rules can be explored with flexible infor-mation-retrieval tools supporting interactivityand iteration. 
In this way, TASA offers pruning,grouping, and ordering tools to refine the re-sults of a basic brute-force search for rules.Data cleaning: The MERGE-PURGE systemwas applied to the identification of duplicatewelfare claims (Hernandez and Stolfo 1995).It was used successfully on data from the Wel-fare Department of the State of Washington.In other areas, a well-publicized system isIBM’s ADVANCED SCOUT,a specialized data-min-ing system that helps National Basketball As-sociation (NBA) coaches organize and inter-pret data from NBA games (U.S. News 1995). ADVANCED SCOUT was used by several of the NBA teams in 1996, including the Seattle Su-personics, which reached the NBA finals.Finally, a novel and increasingly importanttype of discovery is one based on the use of in-telligent agents to navigate through an infor-mation-rich environment. Although the ideaof active triggers has long been analyzed in thedatabase field, really successful applications ofthis idea appeared only with the advent of theInternet. These systems ask the user to specifya profile of interest and search for related in-formation among a wide variety of public-do-main and proprietary sources. For example, FIREFLY is a personal music-recommendation agent: It asks a user his/her opinion of several music pieces and then suggests other music that the user might like (<http:// www.ffl/>). CRAYON(/>) allows users to create their own free newspaper (supported by ads); NEWSHOUND(<http://www. /hound/>) from the San Jose Mercury News and FARCAST(</> automatically search information from a wide variety of sources, including newspapers and wire services, and e-mail rele-vant documents directly to the user.These are just a few of the numerous suchsystems that use KDD techniques to automat-ically produce useful information from largemasses of raw data. See Piatetsky-Shapiro etal. 
(1996) for an overview of issues in developing industrial KDD applications.

Data Mining and KDD

Historically, the notion of finding useful patterns in data has been given a variety of names, including data mining, knowledge extraction, information discovery, information harvesting, data archaeology, and data pattern processing. The term data mining has mostly been used by statisticians, data analysts, and the management information systems (MIS) communities. It has also gained popularity in the database field. The phrase knowledge discovery in databases was coined at the first KDD workshop in 1989 (Piatetsky-Shapiro 1991) to emphasize that knowledge is the end product of a data-driven discovery. It has been popularized in the AI and machine-learning fields.

In our view, KDD refers to the overall process of discovering useful knowledge from data, and data mining refers to a particular step in this process. Data mining is the application of specific algorithms for extracting patterns from data. The distinction between the KDD process and the data-mining step (within the process) is a central point of this article. The additional steps in the KDD process, such as data preparation, data selection, data cleaning, incorporation of appropriate prior knowledge, and proper interpretation of the results of mining, are essential to ensure that useful knowledge is derived from the data. Blind application of data-mining methods (rightly criticized as data dredging in the statistical literature) can be a dangerous activity, easily leading to the discovery of meaningless and invalid patterns.

The Interdisciplinary Nature of KDD

KDD has evolved, and continues to evolve, from the intersection of research fields such as machine learning, pattern recognition, databases, statistics, AI, knowledge acquisition for expert systems, data visualization, and high-performance computing.
The unifying goal is extracting high-level knowledge from low-level data in the context of large data sets.

The data-mining component of KDD currently relies heavily on known techniques from machine learning, pattern recognition, and statistics to find patterns from data in the data-mining step of the KDD process. A natural question is, How is KDD different from pattern recognition or machine learning (and related fields)? The answer is that these fields provide some of the data-mining methods that are used in the data-mining step of the KDD process. KDD focuses on the overall process of knowledge discovery from data, including how the data are stored and accessed, how algorithms can be scaled to massive data sets and still run efficiently, how results can be interpreted and visualized, and how the overall man-machine interaction can usefully be modeled and supported. The KDD process can be viewed as a multidisciplinary activity that encompasses techniques beyond the scope of any one particular discipline such as machine learning. In this context, there are clear opportunities for other fields of AI (besides machine learning) to contribute to KDD. KDD places a special emphasis on finding understandable patterns that can be interpreted as useful or interesting knowledge. Thus, for example, neural networks, although a powerful modeling tool, are relatively difficult to understand compared to decision trees. KDD also emphasizes scaling and robustness properties of modeling algorithms for large noisy data sets.

Related AI research fields include machine discovery, which targets the discovery of empirical laws from observation and experimentation (Shrager and Langley 1990) (see Kloesgen and Zytkow [1996] for a glossary of terms common to KDD and machine discovery), and causal modeling for the inference of causal models from data (Spirtes, Glymour, and Scheines 1993). Statistics in particular has much in common with KDD (see Elder and Pregibon [1996] and Glymour et al. [1996] for a more detailed discussion of this synergy). Knowledge discovery from data is fundamentally a statistical endeavor. Statistics provides a language and framework for quantifying the uncertainty that results when one tries to infer general patterns from a particular sample of an overall population. As mentioned earlier, the term data mining has had negative connotations in statistics since the 1960s when computer-based data analysis techniques were first introduced. The concern arose because if one searches long enough in any data set (even randomly generated data), one can find patterns that appear to be statistically significant but, in fact, are not. Clearly, this issue is of fundamental importance to KDD. Substantial progress has been made in recent years in understanding such issues in statistics. Much of this work is of direct relevance to KDD. Thus, data mining is a legitimate activity as long as one understands how to do it correctly; data mining carried out poorly (without regard to the statistical aspects of the problem) is to be avoided. KDD can also be viewed as encompassing a broader view of modeling than statistics. KDD aims to provide tools to automate (to the degree possible) the entire process of data analysis and the statistician's "art" of hypothesis selection.

A driving force behind KDD is the database field (the second D in KDD). Indeed, the problem of effective data manipulation when data cannot fit in the main memory is of fundamental importance to KDD. Database techniques for gaining efficient data access, grouping and ordering operations when accessing data, and optimizing queries constitute the basics for scaling algorithms to larger data sets. Most data-mining algorithms from statistics, pattern recognition, and machine learning assume data are in the main memory and pay no attention to how the algorithm breaks down if only limited views of the data are possible.

A related field evolving from databases is data warehousing, which refers to the popular business trend of collecting and cleaning transactional data to make them available for online analysis and decision support. Data warehousing helps set the stage for KDD in two important ways: (1) data cleaning and (2) data access.

Data cleaning: As organizations are forced to think about a unified logical view of the wide variety of data and databases they possess, they have to address the issues of mapping data to a single naming convention, uniformly representing and handling missing data, and handling noise and errors when possible.

Data access: Uniform and well-defined methods must be created for accessing the data and providing access paths to data that were historically difficult to get to (for example, stored offline).

Once organizations and individuals have solved the problem of how to store and access their data, the natural next step is the question, What else do we do with all the data? This is where opportunities for KDD naturally arise.

A popular approach for analysis of data warehouses is called online analytical processing (OLAP), named for a set of principles proposed by Codd (1993). OLAP tools focus on providing multidimensional data analysis, which is superior to SQL in computing summaries and breakdowns along many dimensions. OLAP tools are targeted toward simplifying and supporting interactive data analysis, but the goal of KDD tools is to automate as much of the process as possible. Thus, KDD is a step beyond what is currently supported by most standard database systems.

Basic Definitions

KDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayyad, Piatetsky-Shapiro, and Smyth 1996). Here, data are a set of facts (for example, cases in a database), and pattern is an expression in some language describing a subset of the data or a model applicable to the subset. Hence, in our usage here, extracting a pattern also designates fitting a model to data; finding structure from data; or, in general, making any high-level description of a set of data. The term process implies that KDD comprises many steps, which involve data preparation, search for patterns, knowledge evaluation, and refinement, all repeated in multiple iterations. By nontrivial, we mean that some search or inference is involved; that is, it is not a straightforward computation of predefined quantities like computing the average value of a set of numbers.

The discovered patterns should be valid on new data with some degree of certainty. We also want patterns to be novel (at least to the system and preferably to the user) and potentially useful, that is, lead to some benefit to the user or task. Finally, the patterns should be understandable, if not immediately then after some postprocessing.

The previous discussion implies that we can define quantitative measures for evaluating extracted patterns. In many cases, it is possible to define measures of certainty (for example, estimated prediction accuracy on new data) or utility (for example, gain, perhaps in dollars saved because of better predictions or speedup in response time of a system). Notions such as novelty and understandability are much more subjective.
In certain contexts, understandability can be estimated by simplicity (for example, the number of bits to describe a pattern). An important notion, called interestingness (for example, see Silberschatz and Tuzhilin [1995] and Piatetsky-Shapiro and Matheus [1994]), is usually taken as an overall measure of pattern value, combining validity, novelty, usefulness, and simplicity. Interestingness functions can be defined explicitly or can be manifested implicitly through an ordering placed by the KDD system on the discovered patterns or models.

Given these notions, we can consider a pattern to be knowledge if it exceeds some interestingness threshold, which is by no means an attempt to define knowledge in the philosophical or even the popular view. As a matter of fact, knowledge in this definition is purely user oriented and domain specific and is determined by whatever functions and thresholds the user chooses.

Data mining is a step in the KDD process that consists of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns (or models) over the data. Note that the space of patterns is often infinite, and the enumeration of patterns involves some form of search in this space. Practical computational constraints place severe limits on the subspace that can be explored by a data-mining algorithm.

The KDD process involves using the database along with any required selection, preprocessing, subsampling, and transformations of it; applying data-mining methods (algorithms) to enumerate patterns from it; and evaluating the products of data mining to identify the subset of the enumerated patterns deemed knowledge. The data-mining component of the KDD process is concerned with the algorithmic means by which patterns are extracted and enumerated from data. The overall KDD process (figure 1) includes the evaluation and possible interpretation of the mined patterns to determine which patterns can be considered new knowledge. The KDD process also includes all the additional steps described in the next section.

The notion of an overall user-driven process is not unique to KDD: analogous proposals have been put forward both in statistics (Hand 1994) and in machine learning (Brodley and Smyth 1996).

The KDD Process

The KDD process is interactive and iterative, involving numerous steps with many decisions made by the user. Brachman and Anand (1996) give a practical view of the KDD process, emphasizing the interactive nature of the process. Here, we broadly outline some of its basic steps:

First is developing an understanding of the application domain and the relevant prior knowledge and identifying the goal of the KDD process from the customer's viewpoint.

Second is creating a target data set: selecting a data set, or focusing on a subset of variables or data samples, on which discovery is to be performed.

Third is data cleaning and preprocessing. Basic operations include removing noise if appropriate, collecting the necessary information to model or account for noise, deciding on strategies for handling missing data fields, and accounting for time-sequence information and known changes.

Fourth is data reduction and projection: finding useful features to represent the data depending on the goal of the task. With dimensionality reduction or transformation methods, the effective number of variables under consideration can be reduced, or invariant representations for the data can be found.

Fifth is matching the goals of the KDD process (step 1) to a particular data-mining method. For example, summarization, classification, regression, clustering, and so on, are described later as well as in Fayyad, Piatetsky-Shapiro, and Smyth (1996).

Sixth is exploratory analysis and model and hypothesis selection: choosing the data-mining algorithm(s) and selecting method(s) to be used for searching for data patterns. This process includes deciding which models and parameters might be appropriate (for example, models of categorical data are different than models of vectors over the reals) and matching a particular data-mining method with the overall criteria of the KDD process (for example, the end user might be more interested in understanding the model than its predictive capabilities).

Seventh is data mining: searching for patterns of interest in a particular representational form or a set of such representations, including classification rules or trees, regression, and clustering. The user can significantly aid the data-mining method by correctly performing the preceding steps.

Eighth is interpreting mined patterns, possibly returning to any of steps 1 through 7 for further iteration. This step can also involve visualization of the extracted patterns and models or visualization of the data given the extracted models.

Ninth is acting on the discovered knowledge: using the knowledge directly, incorporating the knowledge into another system for further action, or simply documenting it and reporting it to interested parties. This process also includes checking for and resolving potential conflicts with previously believed (or extracted) knowledge.

The KDD process can involve significant iteration and can contain loops between any two steps. The basic flow of steps (although not the potential multitude of iterations and loops) is illustrated in figure 1.

Figure 1. An Overview of the Steps That Compose the KDD Process.

Most previous work on KDD has focused on step 7, the data mining. However, the other steps are as important (and probably more so) for the successful application of KDD in practice. Having defined the basic notions and introduced the KDD process, we now focus on the data-mining component, which has, by far, received the most attention in the literature.

The Data-Mining Step of the KDD Process

The data-mining component of the KDD process often involves repeated iterative application of particular data-mining methods. This section presents an overview of the primary goals of data mining, a description of the methods used to address these goals, and a brief description of the data-mining algorithms that incorporate these methods.

The knowledge discovery goals are defined by the intended use of the system. We can distinguish two types of goals: (1) verification and (2) discovery. With verification, the system is limited to verifying the user's hypothesis. With discovery, the system autonomously finds new patterns. We further subdivide the discovery goal into prediction, where the system finds patterns for predicting the future behavior of some entities, and description, where the system finds patterns for presentation to a user in a human-understandable form. In this article, we are primarily concerned with discovery-oriented data mining.

Data mining involves fitting models to, or determining patterns from, observed data.
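As a rough illustration of the nine steps outlined above, the following sketch walks invented data through selection, cleaning, reduction, mining, and evaluation. All function names and data are made up for this sketch, not taken from the article; a real pipeline would also loop back from later steps to earlier ones.

```python
# A minimal, hypothetical sketch of KDD steps 2-8:
# selection -> cleaning -> reduction -> data mining -> evaluation.

def select(records, fields):
    """Step 2: focus on a subset of variables (the target data set)."""
    return [{f: r[f] for f in fields} for r in records]

def clean(records):
    """Step 3: drop records with missing fields (one simple strategy)."""
    return [r for r in records if all(v is not None for v in r.values())]

def reduce_(records):
    """Step 4: project to a derived feature (debt-to-income ratio)."""
    return [{"ratio": r["debt"] / r["income"], "defaulted": r["defaulted"]}
            for r in records]

def mine(records, threshold):
    """Step 7: a trivial 'pattern': default rate above a ratio threshold."""
    hits = [r for r in records if r["ratio"] > threshold]
    rate = sum(r["defaulted"] for r in hits) / len(hits)
    return {"rule": f"ratio > {threshold}", "default_rate": rate}

def evaluate(pattern, min_rate):
    """Step 8: keep only patterns that clear an interestingness threshold."""
    return pattern if pattern["default_rate"] >= min_rate else None

raw = [
    {"income": 20, "debt": 18, "defaulted": 1, "zip": "55455"},
    {"income": 60, "debt": 10, "defaulted": 0, "zip": "55414"},
    {"income": 35, "debt": 30, "defaulted": 1, "zip": None},
    {"income": 80, "debt": 20, "defaulted": 0, "zip": "55401"},
    {"income": 25, "debt": 20, "defaulted": 1, "zip": "55404"},
]

data = reduce_(clean(select(raw, ["income", "debt", "defaulted", "zip"])))
pattern = evaluate(mine(data, threshold=0.5), min_rate=0.5)
print(pattern)
```

The sketch shows only a single forward pass; the iteration and loops between steps that the text emphasizes are exactly what distinguishes the KDD process from a fixed batch job.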
The fitted models play the role of inferred knowledge: Whether the models reflect useful or interesting knowledge is part of the overall, interactive KDD process where subjective human judgment is typically required. Two primary mathematical formalisms are used in model fitting: (1) statistical and (2) logical. The statistical approach allows for nondeterministic effects in the model, whereas a logical model is purely deterministic. We focus primarily on the statistical approach to data mining, which tends to be the most widely used basis for practical data-mining applications given the typical presence of uncertainty in real-world data-generating processes.

Most data-mining methods are based on tried and tested techniques from machine learning, pattern recognition, and statistics: classification, clustering, regression, and so on. The array of different algorithms under each of these headings can often be bewildering to both the novice and the experienced data analyst. It should be emphasized that of the many data-mining methods advertised in the literature, there are really only a few fundamental techniques. The actual underlying model representation being used by a particular method typically comes from a composition of a small number of well-known options: polynomials, splines, kernel and basis functions, threshold-Boolean functions, and so on. Thus, algorithms tend to differ primarily in the goodness-of-fit criterion used to evaluate model fit or in the search method used to find a good fit.

In our brief overview of data-mining methods, we try in particular to convey the notion that most (if not all) methods can be viewed as extensions or hybrids of a few basic techniques and principles. We first discuss the primary methods of data mining and then show that the data-mining methods can be viewed as consisting of three primary algorithmic components: (1) model representation, (2) model evaluation, and (3) search.
In the discussion of KDD and data-mining methods, we use a simple example to make some of the notions more concrete. Figure 2 shows a simple two-dimensional artificial data set consisting of 23 cases. Each point on the graph represents a person who has been given a loan by a particular bank at some time in the past. The horizontal axis represents the income of the person; the vertical axis represents the total personal debt of the person (mortgage, car payments, and so on). The data have been classified into two classes: (1) the x's represent persons who have defaulted on their loans and (2) the o's represent persons whose loans are in good status with the bank. Thus, this simple artificial data set could represent a historical data set that can contain useful knowledge from the point of view of the bank making the loans. Note that in actual KDD applications, there are typically many more dimensions (as many as several hundreds) and many more data points (many thousands or even millions).

Figure 2. A Simple Data Set with Two Classes Used for Illustrative Purposes.
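The loan example also makes the three algorithmic components concrete. Assuming a made-up data set in the spirit of figure 2 (not the article's actual 23 cases), the sketch below uses a single income threshold as the model representation, the misclassification count as the evaluation criterion, and a scan over candidate thresholds as the search:

```python
# Illustrative sketch (synthetic data): a one-dimensional threshold
# classifier showing model representation (an income threshold),
# model evaluation (misclassification count), and search (a scan
# over candidate thresholds).

# (income, debt, defaulted?) -- low income tends to default here.
points = [(15, 30, True), (20, 40, True), (25, 25, True), (30, 45, True),
          (45, 20, False), (55, 30, False), (60, 10, False), (75, 15, False)]

def errors(threshold):
    """Count cases the rule 'income < threshold => default' gets wrong."""
    return sum((income < threshold) != defaulted
               for income, debt, defaulted in points)

# Search: try every observed income as a candidate threshold.
candidates = sorted(set(income for income, _, _ in points))
best = min(candidates, key=errors)
print(best, errors(best))  # -> 45 0
```

Swapping the representation (say, a line over income and debt instead of a single threshold) or the evaluation criterion (squared loss instead of error count) yields different methods from the same three-part template.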
Signatures and Network Coding
Shweta Agrawal1, Dan Boneh2, Xavier Boyen2, and David Freeman3†

1 University of Texas at Austin, shweta.a@. 2 Stanford University, {dabo,xb}@. 3 CWI and Universiteit Leiden, freeman@cwi.nl

1 Introduction
Network coding [2, 17] is an elegant technique that replaces the traditional "store and forward" paradigm of network routing by a method that allows routers to transform the received data before re-transmission. It has been established that for certain classes of networks, random linear coding is sufficient to improve throughput [11]. In addition, linear network codes offer robustness and adaptability and have many practical applications (in wireless and sensor networks, for example) [10]. Due to these advantages, network coding has become very popular. On the other hand, networks using network coding are exposed to problems that traditional networks do not face. A particularly important instance of this is the pollution problem: if some routers in the network are malicious and forward invalid combinations of received packets, then these invalid packets get mixed with valid packets downstream and quickly pollute the whole network. In addition, the receiver who obtains multiple packets has no way of ascertaining which of these are valid and should be used for decoding. Indeed, using even one invalid packet during the decoding process can cause the entire message to be decoded wrongly. For a detailed discussion of pollution attacks, we refer the reader to [4, 19, 12]. To prevent the network from being flooded with invalid packets, it is desirable to have "hop-by-hop containment." This means that even if a bad packet gets injected into the network, it is detected and discarded at the very first hop. Thus, it can be dropped before it is combined with any other packets, preventing its pollution from spreading. Hop-by-hop containment cannot be achieved by standard signatures or MACs. As pointed out in [1], signing the message packets does not help since recipients do not have the original message packets and therefore cannot verify the signature.
Nor does signing the entire message prior to transmission work, because it forces the recipient to decode exponentially many subsets of received packets to find a decoded message with a consistent signature. Thus, new integrity mechanisms are needed to mitigate pollution attacks.
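The pollution problem can be made concrete with a toy sketch of random linear coding over a small prime field. The field size, packet layout, and membership check below are illustrative assumptions for this sketch, not the paper's construction:

```python
# Toy sketch of random linear network coding over GF(p), illustrating
# how a single invalid packet falls outside the span of the source
# packets. Field size and packet layout are illustrative assumptions.
import random

P = 257  # a small prime; real systems typically use larger fields

def combine(packets, coeffs):
    """A router forwards a linear combination of its received packets."""
    return [sum(c * pkt[i] for c, pkt in zip(coeffs, packets)) % P
            for i in range(len(packets[0]))]

random.seed(1)
# Source message: two packets, each augmented with an identity prefix
# of coding coefficients so receivers can recover the mixing matrix.
m1 = [1, 0, 10, 20, 30]
m2 = [0, 1, 40, 50, 60]

# Honest routers mix packets with random coefficients: any such
# combination stays in the row span of {m1, m2}.
honest = combine([m1, m2], [random.randrange(1, P), random.randrange(1, P)])

# A malicious router mixes in a tampered packet that is NOT in the span.
polluted = combine([m1, [0, 1, 40, 50, 61]], [1, 1])

def in_span(pkt):
    """Check membership in span{m1, m2}: the coefficient prefix says
    which combination to expect, so recompute it and compare."""
    a, b = pkt[0], pkt[1]
    return pkt == combine([m1, m2], [a, b])

print(in_span(honest), in_span(polluted))  # -> True False
```

Note that this `in_span` check works only because we kept the original packets `m1` and `m2` as a reference; intermediate routers have no such reference, which is precisely why hop-by-hop containment calls for a cryptographic mechanism that lets anyone verify span membership.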
prediction-error method
The prediction-error method is a statistical technique used in fields such as signal processing, machine learning, and econometrics. It is primarily used for model selection, parameter estimation, and prediction.

The basic idea behind the prediction-error method is to compare the prediction errors of different models to determine which model performs best. This is done by training multiple models on the same dataset and then evaluating their performance on a separate test dataset or through cross-validation.

Here's a general outline of how the prediction-error method works:

1. **Model Training**: Multiple models are trained on the dataset, using various algorithms or techniques.
2. **Prediction Error Calculation**: For each model, the prediction error is calculated. This is the difference between the predicted values and the actual values of the output variable.
3. **Model Comparison**: The models are compared based on their prediction errors. The model with the smallest prediction error is considered the best-performing model.
4. **Model Selection**: The model with the lowest prediction error is selected as the final model for prediction or further analysis.

The prediction-error method is particularly useful when the goal is to find a model that is not only accurate but also parsimonious, meaning it has a relatively simple structure with fewer parameters. By focusing on minimizing prediction error, this method aims to find a balance between model complexity and performance.

It's important to note that the prediction-error method is just one approach to model selection and evaluation. Other factors such as interpretability, computational efficiency, and domain-specific requirements may also influence the choice of model.
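A minimal sketch of the four steps above, using synthetic data and two simple candidate models (a constant predictor and a least-squares line) chosen purely for illustration:

```python
# Fit two candidate models on training data, compare their prediction
# errors on held-out test data, and select the one with lower error.
# Data and models are illustrative assumptions.

train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
test = [(5, 10.1), (6, 11.8)]

# Step 1 -- model training.
def fit_constant(data):
    mean = sum(y for _, y in data) / len(data)
    return lambda x: mean

def fit_linear(data):
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x, _ in data))
    return lambda x, a=my - slope * mx, b=slope: a + b * x

# Step 2 -- prediction-error calculation (mean squared error on test data).
def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

models = {"constant": fit_constant(train), "linear": fit_linear(train)}
errors = {name: mse(m, test) for name, m in models.items()}

# Steps 3 and 4 -- comparison and selection.
best = min(errors, key=errors.get)
print(best)  # -> linear
```

The same skeleton extends to cross-validation (averaging the test error over several train/test splits) and to richer model families; only the `fit_*` functions and the error measure change.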
A Peer-to-Peer Spatial Cloaking Algorithm for Anonymous Location-based Services∗

Chi-Yin Chow, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, cchow@; Mohamed F. Mokbel, Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, mokbel@; Xuan Liu, IBM Thomas J. Watson Research Center, Hawthorne, NY, xuanliu@

ABSTRACT
This paper tackles a major privacy threat in current location-based services where users have to report their exact locations to the database server in order to obtain their desired services. For example, a mobile user asking about her nearest restaurant has to report her exact location. With untrusted service providers, reporting private location information may lead to several privacy threats. In this paper, we present a peer-to-peer (P2P) spatial cloaking algorithm in which mobile and stationary users can entertain location-based services without revealing their exact location information. The main idea is that before requesting any location-based service, the mobile user will form a group from her peers via single-hop communication and/or multi-hop routing. Then, the spatial cloaked area is computed as the region that covers the entire group of peers. Two modes of operations are supported within the proposed P2P spatial cloaking algorithm, namely, the on-demand mode and the proactive mode. Experimental results show that the P2P spatial cloaking algorithm operated in the on-demand mode has lower communication cost and better quality of services than the proactive mode, but the on-demand mode incurs longer response time.

Categories and Subject Descriptors: H.2.8 [Database Applications]: Spatial databases and GIS

General Terms: Algorithms and Experimentation.
Keywords: Mobile computing, location-based services, location privacy and spatial cloaking.

∗This work is supported in part by the Grants-in-Aid of Research, Artistry, and Scholarship, University of Minnesota.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ACM-GIS'06, November 10-11, 2006, Arlington, Virginia, USA. Copyright 2006 ACM 1-59593-529-0/06/0011...$5.00.

1. INTRODUCTION

The emergence of state-of-the-art location-detection devices, e.g., cellular phones, global positioning system (GPS) devices, and radio-frequency identification (RFID) chips results in a location-dependent information access paradigm, known as location-based services (LBS) [30]. In LBS, mobile users have the ability to issue location-based queries to the location-based database server. Examples of such queries include "where is my nearest gas station", "what are the restaurants within one mile of my location", and "what is the traffic condition within ten minutes of my route". To get the precise answer of these queries, the user has to provide her exact location information to the database server.
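To preview the cloaking idea described in the abstract, the sketch below blurs a user's exact position into the minimum bounding rectangle covering her peer group. The coordinates are invented for illustration; the peer-search protocol itself is not shown.

```python
# A sketch of spatial cloaking: the querying user's exact position is
# hidden inside the axis-aligned rectangle covering all k group members.
# Coordinates are made-up illustrative values.

def cloak(positions):
    """Return the minimum bounding rectangle covering all positions."""
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    return (min(xs), min(ys)), (max(xs), max(ys))

# User A plus four peers B, C, D, E gives a 5-anonymous group.
group = {"A": (3, 4), "B": (1, 6), "C": (5, 2), "D": (6, 7), "E": (2, 1)}
region = cloak(list(group.values()))
print(region)  # A's exact location could be anywhere inside this box
```

Because the database server sees only `region`, it must return candidate answers for the whole rectangle, which is the source of the false positives the user later filters out.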
With untrustworthy servers, adversaries may access sensitive information about specific individuals based on their location information and issued queries. For example, an adversary may check a user's habit and interest by knowing the places she visits and the time of each visit, or someone can track the locations of his ex-friends. In fact, in many cases, GPS devices have been used in stalking personal locations [12, 39]. To tackle this major privacy concern, three centralized privacy-preserving frameworks are proposed for LBS [13, 14, 31], in which a trusted third party is used as a middleware to blur user locations into spatial regions to achieve k-anonymity, i.e., a user is indistinguishable among other k−1 users. The centralized privacy-preserving framework possesses the following shortcomings: 1) The centralized trusted third party could be the system bottleneck or single point of failure. 2) Since the centralized third party has the complete knowledge of the location information and queries of all users, it may pose a serious privacy threat when the third party is attacked by adversaries.

In this paper, we propose a peer-to-peer (P2P) spatial cloaking algorithm. Mobile users adopting the P2P spatial cloaking algorithm can protect their privacy without seeking help from any centralized third party. Other than the shortcomings of the centralized approach, our work is also motivated by the following facts: 1) The computation power and storage capacity of most mobile devices have been improving at a fast pace. 2) P2P communication technologies, such as IEEE 802.11 and Bluetooth, have been widely deployed. 3) Many new applications based on P2P information sharing have rapidly taken shape, e.g., cooperative information access [9, 32] and P2P spatio-temporal query processing [20, 24].

Figure 1 gives an illustrative example of P2P spatial cloaking. The mobile user A wants to find her nearest gas station while being five anonymous, i.e., the user is indistinguishable among five users. Thus, the mobile user A has
to look around and find other four peers to collaborate as a group. In this example, the four peers are B, C, D, and E. Then, the mobile user A cloaks her exact location into a spatial region that covers the entire group of mobile users A, B, C, D, and E. The mobile user A randomly selects one of the mobile users within the group as an agent. In the example given in Figure 1, the mobile user D is selected as an agent. Then, the mobile user A sends her query (i.e., what is the nearest gas station) along with her cloaked spatial region to the agent. The agent forwards the query to the location-based database server through a base station. Since the location-based database server processes the query based on the cloaked spatial region, it can only give a list of candidate answers that includes the actual answers and some false positives. After the agent receives the candidate answers, it forwards the candidate answers to the mobile user A. Finally, the mobile user A gets the actual answer by filtering out all the false positives.

The proposed P2P spatial cloaking algorithm can operate in two modes: on-demand and proactive. In the on-demand mode, mobile clients execute the cloaking algorithm when they need to access information from the location-based database server. On the other side, in the proactive mode, mobile clients periodically look around to find the desired number of peers. Thus, they can cloak their exact locations into spatial regions whenever they want to retrieve information from the location-based database server. In general, the contributions of this paper can be summarized as follows:

1. We introduce a distributed system architecture for providing anonymous location-based services (LBS) for mobile users.

2. We propose the first P2P spatial cloaking algorithm for mobile users to entertain high quality location-based services without compromising their privacy.

3. We provide experimental evidence that our proposed algorithm is efficient in terms of the response
time, is scalable to large numbers of mobile clients, and is effective as it provides high-quality services for mobile clients without the need of exact location information.

The rest of this paper is organized as follows. Section 2 highlights the related work. The system model of the P2P spatial cloaking algorithm is presented in Section 3. The P2P spatial cloaking algorithm is described in Section 4. Section 5 discusses the integration of the P2P spatial cloaking algorithm with privacy-aware location-based database servers. Section 6 depicts the experimental evaluation of the P2P spatial cloaking algorithm. Finally, Section 7 concludes this paper.

2. RELATED WORK

The k-anonymity model [37, 38] has been widely used in maintaining privacy in databases [5, 26, 27, 28]. The main idea is to have each tuple in the table as k-anonymous, i.e., indistinguishable among other k−1 tuples. Although we aim for the similar k-anonymity model for the P2P spatial cloaking algorithm, none of these techniques can be applied to protect user privacy for LBS, mainly for the following four reasons: 1) These techniques preserve the privacy of the stored data. In our model, we aim not to store the data at all. Instead, we store perturbed versions of the data. Thus, data privacy is managed before storing the data. 2) These approaches protect the data not the queries. In anonymous LBS, we aim to protect the user who issues the query to the location-based database server. For example, a mobile user who wants to ask about her nearest gas station needs to protect her location while the location information of the gas station is not protected. 3) These approaches guarantee the k-anonymity for a snapshot of the database. In LBS, the user location is continuously changing. Such dynamic behavior calls for continuous maintenance of the k-anonymity model. 4) These approaches assume a unified k-anonymity requirement for all the stored records. In our P2P spatial cloaking algorithm, k-anonymity is a user-specified privacy requirement which
may have a different value for each user.

Motivated by the privacy threats of location-detection devices [1, 4, 6, 40], several research efforts are dedicated to protecting the locations of mobile users (e.g., false dummies [23], landmark objects [18], and location perturbation [10, 13, 14]). The approaches closest to ours are two centralized spatial cloaking algorithms, namely, spatio-temporal cloaking [14] and the CliqueCloak algorithm [13], and one decentralized privacy-preserving algorithm [23]. The spatio-temporal cloaking algorithm [14] assumes that all users have the same k-anonymity requirement. Furthermore, it lacks scalability because it deals with each single request of each user individually. The CliqueCloak algorithm [13] assumes a different k-anonymity requirement for each user. However, since it has a large computation overhead, it is limited to small k-anonymity requirements, i.e., k from 5 to 10. A decentralized privacy-preserving algorithm has been proposed for LBS [23]. Its main idea is that the mobile client sends a set of false locations, called dummies, along with its true location to the location-based database server. However, the disadvantages of using dummies are threefold. First, the user has to generate realistic dummies to prevent the adversary from guessing its true location. Second, the location-based database server wastes a lot of resources processing the dummies. Finally, the adversary may estimate the user location by using cellular positioning techniques [34], e.g., time-of-arrival (TOA), time difference of arrival (TDOA), and direction of arrival (DOA).

Although several existing distributed group formation algorithms can be used to find peers in a mobile environment, they are not designed for privacy preservation in LBS. Some algorithms are limited to finding only the neighboring peers, e.g., lowest-ID [11], largest-connectivity (degree) [33], and mobility-based clustering algorithms [2, 25]. When a mobile user's privacy requirement is strict, i.e., the
value of k − 1 is larger than the number of its neighboring peers, it has to enlist other peers for help via multi-hop routing. Other algorithms do not have this limitation, but they are designed for grouping stable mobile clients together to facilitate efficient data replica allocation, e.g., the dynamic-connectivity-based group algorithm [16] and a mobility-based clustering algorithm called DRAM [19]. Our work differs from these approaches in that we propose a P2P spatial cloaking algorithm that is dedicated to mobile users discovering the other k − 1 peers via single-hop communication and/or multi-hop routing, in order to preserve user privacy in LBS.

3. SYSTEM MODEL

[Figure 2: The system architecture]

Figure 2 depicts the system architecture for the proposed P2P spatial cloaking algorithm, which contains two main components: mobile clients and the location-based database server. Each mobile client has its own privacy profile that specifies its desired level of privacy. A privacy profile includes two parameters, k and A_min: k indicates that the user wants to be k-anonymous, i.e., indistinguishable among k users, while A_min specifies the minimum resolution of the cloaked spatial region. The larger the values of k and A_min, the stricter the user's privacy requirements. Mobile users have the ability to change their privacy profiles at any time. Our employed privacy profile matches the privacy requirements of mobile users as depicted by several social science studies (e.g., see [4, 15, 17, 22, 29]).

In this architecture, each mobile user is equipped with two wireless network interface cards; one of them is dedicated to communicating with the location-based database server through the base station, while the other is devoted to communication with other peers. A similar multi-interface technique has been used to implement IP multi-homing for the stream control transmission protocol (SCTP), in which a machine is installed with multiple
network interface cards, each assigned a different IP address [36]. Similarly, in a mobile P2P cooperation environment, mobile users have a network connection to access information from the server, e.g., through a wireless modem or a base station, and they also have the ability to communicate with other peers via a wireless LAN, e.g., IEEE 802.11 or Bluetooth [9, 24, 32]. Furthermore, each mobile client is equipped with a positioning device, e.g., GPS or a sensor-based local positioning system, to determine its current location.

4. P2P SPATIAL CLOAKING

In this section, we present the data structure and the P2P spatial cloaking algorithm. Then, we describe the two operation modes of the algorithm: on-demand and proactive.

4.1 Data Structure

The entire system area is divided into a grid. The mobile clients communicate with each other to discover other k − 1 peers, in order to achieve the k-anonymity requirement. A mobile client can thus blur its exact location into a cloaked spatial region that is the minimum grid area covering the k − 1 peers and itself, and that satisfies A_min as well. The grid area is represented by the IDs of its left-bottom and right-top cells, i.e., (l, b) and (r, t). In addition, each mobile client maintains a parameter ĥ that is the hop distance required by the last peer search. The initial value of ĥ is one.

Algorithm 1. P2P Spatial Cloaking: Request Originator m
 1: Function P2PCloaking-Originator(ĥ, k)
 2: // Phase 1: Peer searching phase
 3: The hop distance h is set to ĥ
 4: The set of discovered peers T and the previous set T_prev are set to ∅, and the number of discovered peers k′ = |T| is set to 0
 5: while k′ < k − 1 do
 6:   Broadcast a FORM GROUP request with the parameter h (Algorithm 2 gives the response of each peer p that receives this request)
 7:   T ← the set of peers that respond back to m by executing Algorithm 2
 8:   k′ ← |T|
 9:   if k′ < k − 1 then
10:     if T = T_prev then
11:       Suspend the request
12:     end if
13:     h ← h + 1
14:     T_prev ← T
15:   end if
16: end while
17: // Phase 2: Location adjustment phase
18: for all T_i ∈ T do
19:   |mT_i.p|′ ← the greatest possible distance between m and T_i.p, considering the timestamp of T_i.p's reply and its maximum speed
20: end for
21: // Phase 3: Spatial cloaking phase
22: Form a group with the k − 1 peers having the smallest |mp|′
23: h ← the largest hop distance h_p of the selected k − 1 peers
24: Determine a grid area A that covers the entire group
25: if A < A_min then
26:   Extend the area of A until it covers A_min
27: end if
28: Randomly select a mobile client of the group as an agent
29: Forward the query and A to the agent

4.2 Algorithm

[Figure 3: P2P spatial cloaking algorithm]

Figure 3 gives a running example of the P2P spatial cloaking algorithm. There are 15 mobile clients, m1 to m15, represented as solid circles. m8 is the request originator; the other black circles represent the mobile clients that received the request from m8. The dotted circles represent the communication ranges of the mobile clients, and the arrows represent movement directions. Algorithms 1 and 2 give the pseudocode for the request originator (denoted as m) and the request receivers (denoted as p), respectively. In general, the algorithm consists of the following three phases:

Phase 1: Peer searching phase. The request originator m wants to retrieve information from the location-based database server. m first sets h to ĥ, the set of discovered peers T to ∅, and the number of discovered peers k′ = |T| to zero (Lines 3 to 4 in Algorithm 1). Then, m broadcasts a FORM GROUP request along with a message sequence ID and the hop distance h to its neighboring peers (Line 6 in Algorithm 1). m listens to the network and waits for replies from its neighboring peers.

Algorithm 2 describes how a peer p responds to a FORM GROUP request, carrying a hop distance h and a message sequence ID, from another peer (denoted as r) that is either the request originator or a forwarder of the request.

Algorithm 2. P2P Spatial Cloaking: Request Receiver p
 1: Function P2PCloaking-Receiver(h)
 2: // Let r be the request forwarder
 3: if the request is a duplicate then
 4:   Reply to r with an ACK message
 5:   return
 6: end if
 7: h_p ← 1
 8: if h = 1 then
 9:   Send the tuple T = <p, (x_p, y_p), v_max_p, t_p, h_p> to r
10: else
11:   h ← h − 1
12:   Broadcast a FORM GROUP request with the parameter h
13:   T_p ← the set of peers that respond back to p
14:   for all T_i ∈ T_p do
15:     T_i.h_p ← T_i.h_p + 1
16:   end for
17:   T_p ← T_p ∪ {<p, (x_p, y_p), v_max_p, t_p, h_p>}
18:   Send T_p back to r
19: end if

First, p checks whether the request is a duplicate, based on the message sequence ID. If it is a duplicate request, p simply replies to r with an ACK message without processing the request. Otherwise, p processes the request based on the value of h:

Case 1: h = 1. p returns to r a tuple that contains its ID, current location, maximum movement speed, a timestamp, and a hop distance (set to one), i.e., <p, (x_p, y_p), v_max_p, t_p, h_p> (Line 9 in Algorithm 2).

Case 2: h > 1. p decrements h and broadcasts the FORM GROUP request with the updated h and the original message sequence ID to its neighboring peers. p keeps listening to the network until it collects the replies from all its neighboring peers. After that, p increments the h_p of each collected tuple and then appends its own tuple to the collected tuples T_p. Finally, it sends T_p back to r (Lines 11 to 18 in Algorithm 2).

After m collects the tuples T from its neighboring peers, if m cannot find k − 1 other peers within a hop distance of h, it increments h and re-broadcasts the FORM GROUP request along with a new message sequence ID and h. m repeatedly increments h until it finds k − 1 other peers (Lines 6 to 14 in Algorithm 1). However, if m finds the same set of peers in two consecutive broadcasts, i.e., with hop distances h and h + 1, there are not enough connected peers for m. Thus, m has to relax its privacy profile, i.e., use a smaller value of k, or be suspended for a period of time (Line 11 in Algorithm 1).

Figures 3(a) and 3(b) depict single-hop and multi-hop peer searching in our running example, respectively. In Figure 3(a), the request originator m8 (e.g., with k = 5) can find k − 1 peers via single-hop communication, so
m8 sets h = 1. Since h = 1, its neighboring peers, m5, m6, m7, m9, m10, and m11, will not further broadcast the FORM GROUP request. On the other hand, in Figure 3(b), m8 does not connect to k − 1 peers directly, so it has to set h > 1. Thus, its neighboring peers, m7, m10, and m11, will broadcast the FORM GROUP request along with a decremented hop distance, i.e., h = h − 1, and the original message sequence ID to their neighboring peers.

Phase 2: Location adjustment phase. Since the peers keep moving, we have to capture the movement between the time when a peer sends its tuple and the current time. For each tuple received from a peer p, the request originator m determines the greatest possible distance between them by the equation |mp′| = |mp| + (t_c − t_p) × v_max_p, where |mp| is the Euclidean distance between m and p at time t_p, i.e., |mp| = sqrt((x_m − x_p)^2 + (y_m − y_p)^2), t_c is the current time, t_p is the timestamp of the tuple, and v_max_p is the maximum speed of p (Lines 18 to 20 in Algorithm 1). In this paper, a conservative approach is used to determine the distance, because we assume that the peer may move at its maximum speed in any direction. If p reports its movement direction, m can determine a more precise distance between them.

Figure 3(c) illustrates that, for each discovered peer, a circle represents the largest region where the peer can be located at time t_c. The greatest possible distance between the request originator m8 and each of its discovered peers, m5, m6, m7, m9, m10, or m11, is represented by a dotted line. For example, the length of the line m8m′11 is the greatest possible distance between m8 and m11 at time t_c, i.e., |m8m′11|.
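The conservative distance bound of Phase 2 can be sketched in a few lines of Python; the function name and argument layout are illustrative, not taken from the paper's implementation.

```python
import math

def greatest_possible_distance(m_xy, p_xy, t_c, t_p, v_max_p):
    """Conservative upper bound |mp'| on the current distance between
    originator m and peer p: the Euclidean distance at the tuple's
    timestamp t_p, plus the farthest p could have traveled since then
    at its maximum speed, in any direction."""
    euclidean = math.hypot(m_xy[0] - p_xy[0], m_xy[1] - p_xy[1])  # |mp|
    return euclidean + (t_c - t_p) * v_max_p
```

As the paper notes, if p also reported its movement direction, the added term could be tightened below this worst-case bound.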
Phase 3: Spatial cloaking phase. In this phase, the request originator m forms a virtual group with the k − 1 nearest peers, based on the greatest possible distances between them (Line 22 in Algorithm 1). To adapt to the dynamic network topology and the k-anonymity requirement, m sets h to the largest value of h_p among the selected k − 1 peers (Line 23 in Algorithm 1). Then, m determines the minimum grid area A covering the entire group (Line 24 in Algorithm 1). If the area of A is less than A_min, m extends A until it satisfies A_min (Lines 25 to 27 in Algorithm 1). Figure 3(c) gives the k − 1 nearest peers, m6, m7, m10, and m11, to the request originator m8. For example, if the privacy profile of m8 is (k = 5, A_min = 20 cells), the required cloaked spatial region of m8 is represented by a bold rectangle, as depicted in Figure 3(d).

To issue the query to the location-based database server anonymously, m randomly selects a mobile client in the group as an agent (Line 28 in Algorithm 1). Then, m sends the query along with the cloaked spatial region A to the agent (Line 29 in Algorithm 1). The agent forwards the query to the location-based database server. After the server processes the query with respect to the cloaked spatial region, it sends a list of candidate answers back to the agent. The agent forwards the candidate answers to m, and then m filters out the false positives from the candidate answers.
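The minimum covering grid area of Phase 3 and its extension to A_min can be sketched as follows. The paper does not specify how A is extended, so the ring-by-ring growth below is an assumption, as are the function and parameter names.

```python
def cloaked_region(cells, a_min, grid_w=100, grid_h=100):
    """Minimum grid rectangle covering the group's cells (col, row) pairs,
    grown one ring of cells at a time until it spans at least a_min cells.
    Returned as its left-bottom and right-top cells, (l, b) and (r, t)."""
    l = min(c for c, _ in cells); r = max(c for c, _ in cells)
    b = min(w for _, w in cells); t = max(w for _, w in cells)
    while (r - l + 1) * (t - b + 1) < a_min:
        grew = False
        if l > 0: l -= 1; grew = True
        if b > 0: b -= 1; grew = True
        if r < grid_w - 1: r += 1; grew = True
        if t < grid_h - 1: t += 1; grew = True
        if not grew:  # region already spans the whole grid
            break
    return (l, b), (r, t)
```

A real implementation might prefer extending toward randomly chosen sides, so that the user's cell does not always sit at the center of the cloaked region.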
4.3 Modes of Operation

The P2P spatial cloaking algorithm can operate in two modes: on-demand and proactive.

The on-demand mode: The mobile client executes the algorithm only when it needs to retrieve information from the location-based database server. The algorithm operated in the on-demand mode generally incurs less communication overhead than in the proactive mode, because the mobile client executes the algorithm only when necessary. However, it suffers from a longer response time than the algorithm operated in the proactive mode.

The proactive mode: A mobile client adopting the proactive mode periodically executes the algorithm in the background. The mobile client can cloak its location into a spatial region immediately, once it wants to communicate with the location-based database server. The proactive mode provides a better response time than the on-demand mode, but it generally incurs higher communication overhead and gives a lower quality of service than the on-demand mode.

5. ANONYMOUS LOCATION-BASED SERVICES

Having the cloaked spatial region as an output from Algorithm 1, the mobile user m sends her request to the location-based server through a randomly selected agent p. Existing location-based database servers can support only exact point locations rather than cloaked regions. In order to be able to work with a spatial region, location-based servers need to be equipped with a privacy-aware query processor (e.g., see [29, 31]). The main idea of the privacy-aware query processor is to return a list of candidate answers rather than the exact query answer. Then, the mobile user m will filter the candidate list to eliminate the false positives and find the exact answer. The tighter the cloaked spatial region, the smaller the candidate answer set, and hence the better the performance of the privacy-aware query processor. However, tight cloaked regions may represent relaxed privacy constraints. Thus, a trade-off between user privacy and quality of service can be achieved [31].
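On the client side, filtering the candidate list for a nearest-neighbor query reduces to re-evaluating the query against the true location. This minimal Python sketch assumes the candidates arrive as an id-to-coordinates mapping; the names are illustrative.

```python
import math

def filter_candidates(true_loc, candidates):
    """Discard false positives: among the server's candidate answers
    (id -> (x, y)), keep the one nearest to the user's true location."""
    return min(candidates, key=lambda cid: math.dist(true_loc, candidates[cid]))
```

Only the client ever sees its true location, so this final step preserves the anonymity guarantees of the cloaked query.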
Figure 4(a) depicts such a scenario by showing the data stored at the server side. There are 32 target objects, i.e., gas stations, T1 to T32, represented as black circles; the shaded area represents the cloaked spatial region of the mobile client who issued the query. For clarification, the actual mobile client location is plotted in Figure 4(a) as a black square inside the cloaked area. However, this information is neither stored at the server side nor revealed to the server. The privacy-aware query processor determines a range that includes all target objects possibly contributing to the answer, given that the actual location of the mobile client could be anywhere within the shaded area. The range is represented as a bold rectangle, as depicted in Figure 4(b). The server sends the list of candidate answers, i.e., T8, T12, T13, T16, T17, T21, and T22, back to the agent.

[Figure 4: Anonymous location-based services. (a) Server side; (b) client side.]

The agent next forwards the candidate answers to the requesting mobile client, either through single-hop communication or through multi-hop routing. Finally, the mobile client can get the actual answer, i.e., T13, by filtering out the false positives from the candidate answers.

The algorithmic details of the privacy-aware query processor are beyond the scope of this paper. Interested readers are referred to [31] for more details.

6. EXPERIMENTAL RESULTS

In this section, we evaluate and compare the scalability and efficiency of the P2P spatial cloaking algorithm in both the on-demand and proactive modes, with respect to the average response time per query, the average number of messages per query, and the size of the candidate answers returned from the location-based database server. The query response time in the on-demand mode is defined as the time elapsed between a mobile client starting to search for k − 1 peers and receiving the candidate answers from the agent. On the other hand, the query response time in the proactive mode is defined as the time elapsed between a mobile client
starting to forward its query along with the cloaked spatial region to the agent and receiving the candidate answers from the agent. The simulation model is implemented in C++ using CSIM [35].

In all the experiments in this section, we consider an individual random walk model that is based on the "random waypoint" model [7, 8]. At the beginning, the mobile clients are randomly distributed in a space of 1,000 × 1,000 square meters, in which a uniform grid structure of 100 × 100 cells is constructed. Each mobile client randomly chooses its own destination in the space, with a randomly determined speed s from a uniform distribution U(v_min, v_max). When the mobile client reaches the destination, it comes to a standstill for one second to determine its next destination. After that, the mobile client moves towards its new destination at another speed. All the mobile clients repeat this movement behavior during the simulation. The time interval between two consecutive queries generated by a mobile client follows an exponential distribution with a mean of ten seconds. All the experiments consider one half-duplex wireless channel for a mobile client to communicate with its peers, with a total bandwidth of 2 Mbps and a transmission range of 250 meters. When a mobile client wants to communicate with other peers or the location-based database server, it has to wait if the requested channel is busy. In the simulated mobile environment, there is a centralized location-based database server and one wireless communication channel between the location-based database server and the mobile clients.
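The movement model described above can be sketched as one leg of a random waypoint walk; the tick length dt and the function shape are illustrative assumptions, not details from the paper's CSIM implementation.

```python
import math
import random

def random_waypoint_leg(pos, v_min, v_max, space=(1000.0, 1000.0), dt=0.1):
    """One leg of the random waypoint model: choose a random destination
    in the space and a speed from U(v_min, v_max), then move toward the
    destination in dt-second ticks until it is reached."""
    dest = (random.uniform(0.0, space[0]), random.uniform(0.0, space[1]))
    speed = random.uniform(v_min, v_max)
    x, y = pos
    while (x, y) != dest:
        dx, dy = dest[0] - x, dest[1] - y
        dist = math.hypot(dx, dy)
        if dist <= speed * dt:  # destination reached within this tick
            x, y = dest
        else:
            x += dx / dist * speed * dt
            y += dy / dist * speed * dt
    return (x, y)  # the client then pauses one second before the next leg
```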
Fast QR

José Antonio Apolinário Jr., Marcio G. Siqueira, and Paulo S. R. Diniz

Table 1. Classification of the fast QR algorithms

    Error Type              A Posteriori       A Priori
    Forward prediction      FQR_POS_F [4]      FQR_PRI_F [1]
    Backward prediction     FQR_POS_B [8]      FQR_PRI_B [6], [9]
Abstract. QR decomposition techniques are well known for their good numerical behavior and low complexity. Fast QRD recursive least squares adaptive algorithms benefit from these characteristics to offer robust and fast adaptive filters. This paper examines two different versions of the fast QR algorithm based on a priori backward prediction errors as well as two other corresponding versions of the fast QR algorithm based on a posteriori backward prediction errors. The main matrix equations are presented with different versions derived from two distinct deployments of a particular matrix equation. From this study, a new algorithm is derived. The discussed algorithms are compared, and differences in computational complexity and in finite-precision behavior are shown. Key words: Adaptive system, fast RLS algorithm, QR decomposition.
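As background for the QRD-RLS family discussed in the abstract, a plain (non-fast) QR-based least-squares solve looks like the following NumPy sketch. This is not one of the paper's fast algorithms, which update the triangular factor recursively per sample rather than refactoring the whole data matrix.

```python
import numpy as np

def qr_least_squares(X, d):
    """Solve min_w ||d - X w||_2 via QR decomposition: X = Q R,
    then R w = Q^T d. Numerically better conditioned than forming
    the normal equations X^T X w = X^T d directly."""
    Q, R = np.linalg.qr(X)  # reduced QR, R upper triangular
    return np.linalg.solve(R, Q.T @ d)
```

The "fast" variants exploit the shift structure of the adaptive-filter input to reach O(N) complexity per sample, which this O(N^3) batch solve does not attempt.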
Cylinder Head Pose Estimation Algorithm Based on Improved YOLO6D
[Figure 1 diagram labels: CBL = Conv + BN + Leaky ReLU; a Res unit is two CBL blocks whose output is Added to the unit's input; Resn = a CBL block followed by n residual units; the backbone stacks Res1, Res2, Res8, Res8, and Res4 blocks, with CBL*3, CBL*3, and CBL*2 head layers.]
Fig. 1 Improved YOLO6D network structure

Compared with Darknet19, Darknet53 is deeper and outperforms Darknet19 in overall performance; it can effectively extract detailed features and is widely used in industrial applications [12, 13].
3.2 Dataset

This paper uses ObjectDatasetTools to build the dataset. The acquisition platform includes an Intel RealSense D435i camera, a tripod, and other equipment. First, the camera is calibrated to obtain its intrinsic and extrinsic parameters. Then, the Aruco markers in the Arucomarkers folder are printed and attached around the target object, ensuring that the markers are flat, non-repeating, and reasonably spaced. Video is then recorded with the camera, keeping the image changes smooth and at least 2–3 markers in the camera's field of view at all times. After obtaining the raw 3D model of the target object, unnecessary background data is removed and noise is cleaned up. Finally, masks and label files are generated, yielding a dataset in LineMod format, as shown in Figure 2.

3.3 Implementation Details

To avoid overfitting, data augmentation is applied to the images in the experiments, e.g., adjusting saturation, exposure, hue, and rotation angle, and randomly scaling and translating the images to generate more training samples. Training uses stochastic gradient descent with an initial learning rate of 0.001, decayed by a factor of 10 every 100 epochs. Three common metrics are used to evaluate algorithm performance: 2D projection error, ADD, and 5cm5°.

3.4 Experimental Results and Analysis

Training is performed on the self-built cylinder head dataset to validate the performance of YOLO6D and the improved algorithm. The training curves and results are shown in Figures 3 and 4. The experimental results (Table 1) show that, under the same experimental conditions,
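The training schedule described in Section 3.3 (SGD, initial learning rate 0.001, divided by 10 every 100 epochs) can be sketched as a step schedule; the function and parameter names are placeholders, not taken from the paper's code.

```python
def learning_rate(epoch, base_lr=0.001, decay=10.0, step=100):
    """Step schedule: divide the base rate by `decay` once per
    completed `step` epochs (0.001, 0.0001, 0.00001, ...)."""
    return base_lr / (decay ** (epoch // step))
```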
uplink
1 Introduction
Direct-sequence code-division multiple-access systems have the ability to accommodate multiple users in multi-path fading environments. By making use of advanced detection algorithms, the diversity of such channels can be exploited [1]. Uplink performance can be further increased by equipping the base station (BS) with multiple receive antennas [2–5]. Through advanced coding schemes (such as turbo codes [6] and/or space-time codes [3]), reliable transmission at very high data rates becomes possible. However, in order to exploit multi-path and/or receive diversity, the receiver requires accurate channel estimates to perform reliable data detection [7]. In recent years, a lot of effort has been put into developing powerful channel estimation algorithms. In the context of the
U.S.A.
Abstract
The AdaBoost algorithm of Freund and Schapire has been successfully applied to many domains [2, 10, 12], and the combination of AdaBoost with the C4.5 decision tree algorithm has been called the best off-the-shelf learning algorithm in practice. Unfortunately, in some applications, the number of decision trees required by AdaBoost to achieve a reasonable accuracy is enormously large and hence very space consuming. This problem was first studied by Margineantu and Dietterich [7], who proposed an empirical method called Kappa pruning to prune the boosting ensemble of decision trees. The Kappa method did this without sacrificing too much accuracy. In this work-in-progress we propose a potential improvement to the Kappa pruning method and also study the boosting pruning problem from a theoretical perspective. We point out that the boosting pruning problem is intractable even to approximate. Finally, we suggest a margin-based theoretical heuristic for this problem. Keywords: Pruning Adaptive Boosting; Boosting Decision Trees; Intractability.
A Scalable Method for Predicting Network Performance in Heterogeneous Clusters

Dimitrios Katramatos*
Department of Computer Science, University of Virginia, Charlottesville, VA 22903
dk3x@

Steve J. Chapin
Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, NY 13244
chapin@

*Currently with the Brookhaven National Laboratory.

Abstract

An important requirement for the effective scheduling of parallel applications on large heterogeneous clusters is a current view of system resource availability. Maintaining such a view is a time-consuming problem, potentially O(N^2). Although CPU availability is relatively easy to monitor, interconnecting network bandwidth varies not only with network topology, but also with message size and even with the load of the communicating nodes. This paper describes a method for predicting a cluster's network performance for the purpose of scheduling parallel applications. The method generates a cluster-specific network model that can predict the latency of communications between any pair of nodes in linear time and under any computational and/or communication load conditions. The paper also presents the models generated for the Centurion cluster at the University of Virginia and the Orange Grove cluster at Syracuse University. A study of the method's prediction accuracy under various load conditions, by comparison to experimental measurements, indicates an average prediction error of approximately 5%, with a maximum encountered prediction error of less than 9%.

1 Introduction

Effective scheduling (or mapping) of parallel applications on large heterogeneous clusters can be achieved by pursuing the best practicable match between a system's resource availability and an application's resource needs and utilization patterns. When the necessary system and application information is readily available, such a matching can be pursued with the use of a global optimization algorithm, like simulated annealing or a genetic algorithm. Obtaining
system information, obtaining application information, and finding an efficient mapping are three separate aspects of the scheduling problem. In this paper, we focus on the aspect of obtaining and maintaining a consistent, current view of the resource availability of a system.

In particular, we consider two fundamental system resources per cluster node: CPU availability and network connection availability. To be useful for scheduling purposes, the resource availability information must be available on demand and reflect the condition of the computing system at the time of the scheduling request. Obtaining such information for a large heterogeneous cluster with N nodes can be a complex problem. Although monitoring per-node parameters, such as CPU availability, is relatively cheap (O(N)), four factors complicate the monitoring of available network bandwidth between nodes: system heterogeneity, network topology, programming environment overhead, and the combined load of computation and communication on participating nodes. System heterogeneity, network topology, and programming environment overhead are static factors, typically changing only with major upgrades of the system. With suitable profiling measurements in the absence of significant node load (as is the case after a general reboot of the "machine"), we can measure the maximum system capacity both in terms of computation and communication. Programming environment overhead can have a direct effect on the communication latency. For example, our experiments using the popular MPI [1] environment on three significantly different clusters have shown a non-linear variation of latency vs. message size up to a message size of 8 KB (see Figure 1). Assuming linearity in this region can lead to an error of 20% or higher in the corresponding latency values.

[Figure 1. Non-linear region of message latency curves.]

A dynamic factor affecting communication latency is the amount of computation and communication load that is already present in the system. This load needs to be taken
into account when scheduling new tasks on the nodes of a cluster. The faster and more accurately we can measure or predict this dynamic load, the more effective our resulting schedule will be. The CPU load of a pair of nodes is easily available, but does not, alone, give an accurate predictor of dynamic communication capabilities. On the other hand, dynamically measuring pairwise communication latencies/bandwidth on a per-message-size basis provides accurate readings, but is an impractical approach because it requires O(N^2) time to do a full measurement of an N-size cluster. Our method uses the results of a sequence of microbenchmarks on a cluster to generate a model of the dynamic effect of the per-node CPU and network connection load on the latency of node pairwise communications. The model accepts two inputs:

• the CPU availability (A_CPU): the fraction of CPU that a newly created process will be given when assigned to run on a node, and

• the network connection availability (A_NET): the percentage of free bandwidth of the node's network connection.

Given these quantities for a pair of nodes and a specific message size, the model can provide an estimated increase of the communication latency over the latency measured for an unloaded cluster. During runtime, the factors A_CPU and A_NET can be gathered in O(N) time for the entire cluster.

Section 2 describes the background of the method, while Section 3 discusses the processing and correlation of the experimental profiling data and the creation of the model for a cluster. Section 4 presents a study of the method's prediction error. Section 5 discusses related work and, finally, Section 6 presents the conclusions.

2 Background

The computing systems of interest incorporate hundreds or even thousands of nodes and may exhibit various degrees of heterogeneity in node architecture, network architecture, network topology, operating system, etc. The two main properties of the interconnecting network we use to define the basis for categorizing such systems are
the minimum bisection, measured in links or paths (or, equivalently, the minimum bisection bandwidth), and the use of the network (dedicated vs. public). The minimum bisection is an indication of the capacity of the network regardless of the communication patterns of the nodes. The upper bound on the minimum bisection for N nodes is N/2, known as "full" bisection. The lower the bisection of a system, the higher the probability of contention appearing in the interconnection network and increasing the communication latency. The use of a network imposes restrictions on the origin of the traffic in that network. A network dedicated to interconnecting the nodes of a clustered system contains only internode (internal) traffic, while a public network also contains traffic from sources external to the system, which can be unpredictable. With these two properties in mind, we consider three categories (types) of clustered systems:

Type-1: Basic heterogeneous cluster with N nodes, high minimum bisection, and a dedicated network.

Type-2: Federated cluster composed of type-1 clusters interconnected with one or more dedicated links of low or high capacity.

Type-3: Federated cluster composed of type-1 and/or type-2 clusters connected with public links or the Internet.

In the environment of a type-1 cluster, the only dominant factor dynamically affecting the latency of communications between any pair of cluster nodes is the load conditions at the two end nodes. A type-2 cluster maintains the dedicated network, so that there is no unpredictable traffic.
However, the network links connecting the type-1 building blocks may or may not be sufficient in number and/or capacity to keep the minimum bisection at a level where contention is absent or insignificant. Finally, a type-3 cluster is essentially a special grid configuration. Here, the links between the type-1 and/or type-2 clusters are neither dedicated nor do they necessarily have enough bandwidth for the basic assumption to be valid. Traffic from sources external to the system can pass through the public links at any instant, so the available bandwidth can vary with time in unpredictable ways.

We use the results of a set of microbenchmarks on a cluster to generate an approximate model of the end-to-end communication latency between cluster nodes. The model takes into account the dynamic effects of the per-node CPU and network connection load on the "nominal" latency (the minimum latency measured under no-load conditions) of node pairwise communications. The model is realized as:

• a 3D matrix providing the no-load communication latencies for every possible pair of nodes and for a range of message sizes, and

• a set of correction-coefficient parametric curve families providing corrections to the nominal latencies due to node CPU and NIC load.

The model accepts as inputs:

• the identities of the sending and the receiving node,

• the message size,

• the CPU availability A_CPU: the fraction of CPU that a newly created process will be given when assigned to run on a node; if N_CPU is the number of CPUs of a node and LOAD_CPU is the current average CPU load, then A_CPU is defined as:

    A_CPU = N_CPU / (1 + LOAD_CPU)    (1)

A_CPU is equal to 1.00 for nodes free of load; and

• the network connection availability A_NET: the percentage of free bandwidth of the node's network connection.

The output is an estimated value of the actual communication latency that a message of the given size will encounter when sent from the sender to the receiver node under the current load conditions. During runtime, the factors A_CPU and A_NET can be gathered in O(N) time
for the entire cluster.

The presented model properly estimates the pair-wise communication latencies when the internode latency is primarily affected only by node loads. Type-1 clusters follow this premise by definition. Type-2 and type-3 clusters, however, are bound to encounter common link contention problems. To account for this kind of contention, we extend the basic model with an additional latency correction. In type-2 clusters, we calculate this correction by periodically sampling the latency of a shared link, by running a benchmark on at least one characteristic node pair connected by that link. Sampling more than one pair increases the confidence in the results. If the measurements consistently show a higher latency value than the one estimated by the basic model, contention is present and the additional correction is the ratio of the measured value over the value estimated by the basic model.

In type-3 clusters there is the additional issue of unpredictable shared link bandwidth fluctuations with time due to traffic external to the system. Because of these fluctuations, it is practically impossible to populate the nominal (no-load) latency matrix with representative values for node pair-wise communications that utilize the shared link. The extension to the model is similar to the extension for type-2 clusters, i.e., periodic sampling of the latency between characteristic node pairs to determine the condition of the link. However, the selected node pairs must stay free of load at all times, because it is impossible to distinguish between latency increases due to external traffic and those due to node load.
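The contention extension above can be illustrated with a small sketch. It is not the paper's implementation; the function names, the sliding comparison, and the tolerance threshold are our own assumptions.

```python
# Hypothetical sketch of the shared-link contention correction:
# Eq. (1) for CPU availability, plus the ratio of a sampled link
# latency over the basic-model estimate (applied only when the
# sampled value is consistently higher).

def cpu_availability(n_cpus, load_avg):
    """Eq. (1): fraction of CPU a newly created process would receive."""
    return n_cpus / (1.0 + load_avg)

def contention_correction(measured, estimated, threshold=1.05):
    """Extra latency correction for a contended shared link.

    If the latency sampled on a characteristic node pair exceeds the
    basic-model estimate (beyond a small tolerance, here an assumed
    5%), contention is present and the ratio measured/estimated is
    returned; otherwise no extra correction (1.0) is applied.
    """
    ratio = measured / estimated
    return ratio if ratio > threshold else 1.0
```

For example, a node free of load has A_CPU = 1.00, and a link sampled at 150 us against a 100 us model estimate would receive a 1.5x contention correction.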
By eliminating the possibility of an increase due to node load, each measurement returns the (instantaneous) latency of the link between the node pair. A second issue is the population of the region of the no-load latency matrix corresponding to the shared link. In this case, the no-load latency for a node pair can be either the minimum no-load latency value observed over a period of time, as this value is highly likely to correspond to a free link, or the theoretical minimum latency estimated from the characteristics of the hardware and the topology of the network. The extended model now applies to type-3 clusters as to type-2 ones; it estimates a correction to nominal latency due to node load and a second correction due to link contention.

3 Correlation Methodology

The model of the internode latency behavior of the cluster is generated by correlating the parameters measured during the benchmarking phase. The goal is to obtain a meaningful function that expresses the variation of latency vs. a set of parameters measured in O(N) time (in this case, A_CPU and A_NET), i.e., a function of the following form:

  L/L0 = F(A_CPU, A_NET)    (2)

Each pair of nodes chosen to represent a characteristic group of cluster nodes requires two series of experimental data (for a predetermined set of message sizes). The first series consists of measurements of L/L0(t) and A_CPU(t) under the imposition of pure CPU load, and the second series of measurements of L/L0(t), A_CPU(t), and A_NET(t) under the imposition of composite (network and CPU) load. Pure CPU and/or composite load is imposed artificially on the system by running synthetic, computation- and/or communication-intensive jobs in the background, increasingly reducing A_CPU and A_NET. Measurements for all parameters in a series are taken at N_i periodic instants t_j within an interval Δt_i, during which the load remains steady. Although desirable, it is not a requirement that the Δt_i periods be equal. For each interval Δt_i, the corresponding value of each
parameter P is computed as the average of the values sampled during that period, i.e.:

  P_i = (1/N_i) * Σ_{j=1}^{N_i} P_j    (3)

For each parameter, the computation of the average yields a set of i value pairs (P, Δt)_i which are distinct values of the function P(t). Because each Δt_i is common to all parameters in a series, it is possible to create a direct one-to-one correspondence between parameter values, in effect eliminating the parameter time.

Processing of the first experimental data series samples the normalized latency as a function of CPU availability:

  L/L0 |_CPU = F(A_CPU)    (4)

Processing of the second experimental data series samples the normalized latency as a function of CPU and network availability:

  L/L0 |_composite = G(A_CPU, A_NET)    (5)

The second series also samples CPU availability as a function of network availability:

  A_CPU = Q(A_NET)    (6)

The latter function expresses the variation of A_CPU vs. A_NET for this specific series. To isolate the network connection load effect, we assume that the normalized latency corresponding to the composite load, which is a function of CPU and network availability, can be expressed as the sum of a function of solely the network availability plus a function of solely the CPU availability:

  L/L0 |_composite = G1(A_NET) + G2(A_CPU)    (7)

This assumption is based on the fact that the network connection is controlled by a separate asynchronous subsystem. Since the effect on the latency from pure CPU load is already known from the processing of the first experimental data series, it is possible to sample the normalized latency as a function of the network availability:

  L/L0 |_NET (A_NET) = G1(A_NET)    (8)

This is done by eliminating the G2 portion, approximated as:

  G2(A_CPU) ≈ F(Q(A_NET))    (9)

thus yielding:

  G1(A_NET) = G(A_CPU, A_NET) − F(Q(A_NET))    (10)

We now introduce the terms "correction for CPU load" C_CPU, "correction for network connection load" C_NET, and "total correction for load" Corr_T. Knowing the values of L0 and L/L0 directly leads to finding the value of L.
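The two processing steps above, interval averaging (Eq. 3) and the isolation of the network contribution G1 (Eq. 10), can be sketched as follows. F and Q stand for interpolations fitted to the first and second data series; the function names are ours, not the paper's.

```python
# Illustrative sketch of the correlation processing:
# per-interval averaging of sampled parameters (Eq. 3) and
# isolation of the network-load term G1 (Eq. 10).

def interval_average(samples):
    """Eq. (3): mean of the N_i values sampled within one interval dt_i."""
    return sum(samples) / len(samples)

def isolate_g1(g_composite, f_cpu, q_cpu_of_net, a_net):
    """Eq. (10): G1(A_NET) = G(A_CPU, A_NET) - F(Q(A_NET)).

    g_composite:  normalized latency L/L0 measured under composite load
    f_cpu:        interpolated L/L0 vs. A_CPU from the first series (F)
    q_cpu_of_net: interpolated A_CPU vs. A_NET from the second series (Q)
    a_net:        network availability at which G1 is evaluated
    """
    return g_composite - f_cpu(q_cpu_of_net(a_net))
```

In practice f_cpu and q_cpu_of_net would be curve fits (e.g., piecewise-linear interpolation) over the averaged (P, Δt)_i pairs; here any callables serve to show the subtraction.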
L/L0 represents Corr_T and is indeed the ratio of L over L0 under load conditions. The described process generates the desired corrections:

  C_CPU = L/L0 |_CPU    (11)

the correction for CPU load, also a ratio with values ≥ 1, and:

  C_NET = G1(A_NET)    (12)

the correction for network connection load, now a ratio ≥ 0. The latter correction cannot be used by itself but only in conjunction with C_CPU to form the total correction. This is in accordance with the fact that it is not possible to have 100% pure network load without at least a small portion of CPU load. Thus:

  Corr_T = C_CPU + C_NET    (13)

with C_NET being zero when there is no network activity.

The final step in this procedure is to generate two families of correction curves for each characteristic category of cluster nodes. For the case of pairs formed with nodes of different characteristic categories, the necessary corrections can be derived from the curve families of both node categories. These families of curves give the C_CPU and the additional C_NET for any message size.

Summarizing, the full procedure to obtain an approximation of the network latency L between two cluster nodes A and B, for message size m_s, comprises the following steps:

1. Get the zero-load latency for nodes A and B and message size m_s, L0_{A,B,m_s}, from the no-load latency matrix.
2. Get the current CPU availability for nodes A and B, A_CPU_A and A_CPU_B.
3. Get the current network connection transmitting and receiving availability for nodes A and B, A_NET_A_tx, A_NET_A_rx, A_NET_B_tx, and A_NET_B_rx.
4. Get the values of the CPU correction for nodes A and B, C_CPU_A and C_CPU_B, from the appropriate curves of the model corresponding to each node category.
5. Compute the correction for CPU as the following mean:

   C_CPU = (C_CPU_A + C_CPU_B) / 2    (14)

6. Get the values of the network connection correction for A and B from the appropriate curves of the model, yielding C_NET_A_tx, C_NET_A_rx, C_NET_B_tx, and C_NET_B_rx.
7. Compute the additional network correction as the following mean:

   C_NET = (C_NET_A_tx + C_NET_A_rx + C_NET_B_tx + C_NET_B_rx) / 4    (15)

   or

   C_NET = (C_NET_A + C_NET_B) / 2    (16)

8. The total correction is given as:

   Corr_T = C_CPU + C_NET    (17)

9. Finally, the estimated latency value is:

   L = L0_{A,B,m_s} * Corr_T    (18)

4 The Resulting Models

The Centurion cluster at the University of Virginia [2] is a 256-node cluster with 128 Intel and 128 Alpha processor-based nodes running Linux. The nodes are connected to 24-port fast Ethernet switches, which are in turn connected to a 12-port 1 Gbps central switch. Figures 2 and 3 present correction curve families for the Centurion nodes of Alpha architecture. Note that the message size axis is logarithmic to accommodate the range of useful message sizes. These curves are essentially a fingerprint of the effect of CPU and network load on the message latency for the specific hardware and software of the cluster. An immediate observation for Centurion is that the effect of network connection load diminishes with increasing message size, while the opposite is true for the effect of CPU load.

Figure 4. Orange Grove: correction for CPU load, SPARC nodes

The Orange Grove cluster at Syracuse University† is a 28-node cluster with 8 Alpha, 8 SPARC, and 12 Intel processor-based nodes. It is organized in 4 mixed-architecture basic blocks, which in turn form two larger block pairs. There is only one network link between the two block pairs (the central link) and all network connections are fast Ethernet. The cluster is configured so as to resemble a scaled-down type-3 cluster. Due to the low bisection (a minimal bisection bandwidth of just 100 Mbps), the central link is prone to contention effects. Figure 4 gives an example of what the correction coefficient curve families look like for this cluster.

† We used a re-arranged version of the original Orange Grove cluster.

Figure 5. Central link contention effect, Intel nodes.
Figure 6. Central link contention effect, SPARC nodes.

The coefficient curves of Centurion present a sharp drop of the correction coefficients for message sizes in the region of 1 KB to 8 KB, regardless of architecture. However, for increasing message
sizes, the coefficient values for CPU load correction are consistently increasing and those for network load correction consistently decreasing. In contrast, the coefficients for Orange Grove present a minimum in the same message size region. This observation holds for all three node architectures used in the Orange Grove cluster.

Figures 5 and 6 present the contention effect at the central link as seen from the perspective of Intel and SPARC nodes, respectively. Both curve families present a minimum in the message size range of 1 KB to 8 KB, similarly to the curve families of Figure 4. This fact indicates that if programs use message sizes in the system's "preferred" range, they will encounter minimal contention effects.

Figure 7. Prediction error.

5 Model Evaluation

To evaluate the results of the models generated for Centurion and Orange Grove, we measured the actual pair-wise latency for several cases of node pairs under different conditions of both real and artificial load. We compare these measurements to the predictions of the models. Figure 7 offers a collective view of the results obtained from both clusters, with parameter sweeps for A_CPU, A_NET, and message size, and for a variety of node pairs. The node pairs sample the node combination space with regard to hardware architecture and network topology. The figure displays the mean prediction error and the corresponding 95% confidence intervals for the full range of encountered A_CPU values and for the upper (A_NET = 0.9–1.0) and lower (A_NET = 0.3–0.4) limits of the encountered A_NET values. In essence, the figure indicates the range of prediction error. We did not observe any significant dependence of the prediction error on node category mix and/or message size. The error remains virtually constant at approximately 2% for high values of A_NET. However, there is a strong indication that model accuracy decreases with increasing network load. This is because large loads cause larger fluctuations of the various parameter values (larger standard deviations). The maximum
error encountered during the comparison is approximately 9%, but such error levels correspond to extremely loaded node pairs. For all practical purposes, the mean error is 5% or less.

6 Related Work

The performance of the message passing layer, as part of the runtime system of a cluster, is critical for message passing applications. In our approach, the use of microbenchmarks is a key element in profiling and monitoring system resources. The design of these benchmarks has its basis in the information presented by Liu, Culler, and Yoshikawa [3]. They use microbenchmarks to evaluate MPI performance on three different hardware platforms, each with its own implementation of MPI, and reveal the extent to which hardware architecture and the underlying implementation of MPI affect message passing performance.

There are several research efforts dealing with the problem of monitoring resource availability in large-scale heterogeneous distributed systems. One of the most widely used works in system resource monitoring in heterogeneous distributed systems is the Network Weather Service (NWS) [4]. NWS uses sensors (daemon processes) to monitor the resources of each host participating in a distributed system, and a set of centralized administrative and database components to coordinate measurements and store results. Groups of sensors can be organized in NWS cliques, structures that enforce a conflict-free pair-wise benchmarking protocol. NWS can periodically collect sensor data for all measured parameters to create time series, and has the capability of forecasting parameter values for near-future periods. To deal with the scalability problems NWS encounters when gathering O(N^2) latency measurements, Swany and Wolski [5] propose dividing a system's nodes into categories and selecting representative nodes from each category to be members of NWS cliques, thus creating a hierarchical clique structure. Recent work by Casanova et al. [6] uses NWS traces to model CPU load and network latency/bandwidth while examining
heuristics for scheduling parameter sweep applications in grid environments through simulation.

Remos [7] is a system designed to provide distributed applications with resource information; it supports resource measurements in various environments and for various applications. The system incorporates collectors, modelers, and predictors. The collectors acquire network resource information and forward it to the modelers; the modelers expose the Remos API to applications and do all necessary processing to respond to queries made by applications through the API. The predictors turn a measurement history into a prediction of future behavior. The collectors use either SNMP, to collect information directly from routers and switches, or explicit benchmarking, similarly to NWS; they can operate both on demand and periodically, and rely on aggressive information caching to reduce the O(N^2) overhead.

Topology-d [8] is a service that periodically estimates end-to-end latency and available bandwidth among N networked resources (N^2 paths). Topology-d uses the estimated delay and bandwidth values to compute a fault-tolerant, minimum cost spanning tree that connects the participating sites. The service uses a "damping" effect to take previous measurement history into account and tries to measure the delay and bandwidth as actually "seen" by an application.

Network Tomography [9] uses end-to-end measurements of a network to infer internal behavior. Internal network delays, common portions of routes between different sets of senders and receivers, and other information can be obtained without having to perform measurements at internal points of the network.

The Autopilot system [10] is an infrastructure for dynamic performance tuning of heterogeneous computational grids. Autopilot sensors, inserted in application or library code, can either capture and transmit raw data or compute and periodically transmit application or system characterization metrics. When an application executes, the embedded sensors register with a name
server provided by an Autopilot Manager. Remote clients can then query the manager to locate sensors with specific properties and receive measurement information directly from these sensors. This information can be fed to decision mechanisms to optimize resource management. Sensors, managers, and clients can execute anywhere on a grid.

Finally, Ganglia [11] is a scalable distributed monitoring system for clusters and grids, with support for several operating systems. The system is based on a hierarchical design targeted at federations of clusters. Ganglia can monitor a variety of metrics, system supported or user defined, within clusters, and uses a tree of point-to-point connections amongst representative nodes of the clusters participating in a federation, through which it can collect the partial states and form a global view of the federated system. Using a special tool, Ganglia can store and visualize historical data for time granularities ranging from minutes to years.

The major difference of our approach with regard to the mentioned works is the use of a cluster-specific network latency model, based on experimental latency measurements, to predict the performance of the network at scheduling time. The use of this model makes it possible to predict network latencies in linear time, without having to resort to a multitude of "expensive" node pair-wise latency measurements when latency information is needed within short time constraints.

7 Conclusions

We have presented a methodology for creating a resource availability model of a given heterogeneous cluster. The ultimate goal of this model is to generate, on demand during runtime, a valid view of the available resources without having to resort to exhaustive, time-consuming measurements. Our approach uses an initial O(N^2) round of off-line measurements of internode latency to obtain the suitable parameter values and to generate the model, although careful analysis of the cluster architecture can ameliorate this effect by executing non-interfering microbenchmarks.
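As an appendix-style illustration, the nine-step runtime estimation procedure of Section 3 reduces to a few arithmetic steps once the no-load latency and the correction-curve values have been looked up. This is a minimal sketch under that assumption; the function signature and variable names are hypothetical, not from the paper.

```python
# Minimal sketch of steps 5-9 of the runtime estimation procedure
# (Eqs. 14-18), given values already read from the no-load latency
# matrix (step 1) and the correction-curve families (steps 4 and 6).

def estimate_latency(l0, c_cpu_a, c_cpu_b, c_net_parts):
    """Combine per-node corrections into a latency estimate.

    l0:          no-load latency L0[A, B, m_s]        (step 1)
    c_cpu_a/b:   CPU corrections for nodes A and B    (step 4)
    c_net_parts: tx/rx network corrections, e.g. the four values
                 [A_tx, A_rx, B_tx, B_rx]             (step 6)
    """
    c_cpu = (c_cpu_a + c_cpu_b) / 2.0            # Eq. (14)
    c_net = sum(c_net_parts) / len(c_net_parts)  # Eq. (15) or (16)
    corr_t = c_cpu + c_net                       # Eq. (17)
    return l0 * corr_t                           # Eq. (18)
```

For example, with l0 = 100 us, CPU corrections 1.2 and 1.4, and network corrections [0.1, 0.1, 0.2, 0.2], the total correction is 1.3 + 0.15 = 1.45 and the estimated latency 145 us.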