Collaborative Filtering: Literature










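1. User-based collaborative filtering
User-based collaborative filtering uses users' historical behavior data to find other users whose interests are similar to those of the current user, then recommends content based on what those similar users like.


2. Item-based collaborative filtering
Item-based collaborative filtering analyzes the similarity between items and recommends items that are similar to the ones the user has already shown interest in.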
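The item-based idea can be illustrated with a minimal sketch. It assumes implicit feedback (each user has a set of items they interacted with) and uses Jaccard similarity between the user sets of two items; both choices, as well as the names `history` and `recommend_item_based`, are illustrative assumptions and are not taken from the source.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_item_based(history, user, top_n=2):
    """Rank unseen items by their summed similarity to the user's items."""
    # Invert the history: item -> set of users who interacted with it.
    users_of = defaultdict(set)
    for u, items in history.items():
        for item in items:
            users_of[item].add(u)

    seen = history[user]
    scores = {}
    for candidate in users_of:
        if candidate in seen:
            continue
        scores[candidate] = sum(
            jaccard(users_of[candidate], users_of[s]) for s in seen
        )
    # Highest score first; ties are broken by item name for determinism.
    return sorted(scores, key=lambda it: (-scores[it], it))[:top_n]


# Hypothetical interaction data.
history = {
    "alice": {"A", "B"},
    "bob": {"A", "B", "C"},
    "carol": {"B", "C", "D"},
    "dave": {"D"},
}
print(recommend_item_based(history, "alice"))
```

Here item C shares most of its audience with the items alice already has, so it ranks ahead of D. A user-based variant would compare users instead of items and is otherwise analogous.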


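III. Advantages and Disadvantages of Collaborative Filtering
(1) Advantages
1. Simple to implement: collaborative filtering needs only users' historical behavior data, so it is easy to implement and often performs well.

2. Accurate recommendations: by analyzing users' historical behavior and the similarity between items, it can recommend content the user is likely to be interested in.

3. Good explainability: recommendations can be explained (for example, "users similar to you liked this item"), so users can understand why an item was recommended.

(2) Disadvantages
1. Data sparsity: each user interacts with only a small fraction of the items, so the behavior data is highly sparse, which degrades recommendation quality.

2. Cold start: new users and new items have no historical behavior data, so accurate recommendations are hard to make for them.

3. Scalability: as the numbers of users and items grow, the computational cost of collaborative filtering grows with them, which limits the scalability of the system.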


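1. Fusing multiple data sources: incorporate users' social-network information, item attribute information, and similar data into the recommender system to improve the accuracy and diversity of recommendations.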






















English essay on collaborative filtering algorithms
Title: The Application and Advancements of Collaborative Filtering Algorithms

Collaborative filtering, a cornerstone of recommender systems, has attracted widespread attention for its ability to predict user preferences and provide personalized recommendations. In recent years, with the rapid growth of online platforms and of user-generated data, collaborative filtering algorithms have become important tools for businesses seeking to improve user experience and drive engagement. This essay explores the principles, applications, and advancements of collaborative filtering algorithms and their significance in today's digital landscape.

At its core, collaborative filtering uses collective user behavior to make predictions about the interests of individual users. By analyzing user interactions, such as ratings, purchases, and preferences, collaborative filtering algorithms identify patterns and similarities among users to generate recommendations. There are two main approaches to collaborative filtering: memory-based and model-based.

Memory-based collaborative filtering, also known as neighborhood-based collaborative filtering, calculates similarities between users or items from their historical interactions. One of the most widely used measures in this approach is cosine similarity, the cosine of the angle between two vectors representing user preferences. Having identified users with similar preferences, memory-based collaborative filtering recommends items that those similar users liked or purchased.

Model-based collaborative filtering, on the other hand, builds a mathematical model from the user-item interaction data. Techniques such as matrix factorization and singular value decomposition (SVD) are commonly used to decompose the user-item matrix into latent factors representing user preferences and item characteristics. By learning these latent factors, model-based collaborative filtering can often make reasonably accurate predictions even when the data is sparse.

Collaborative filtering is applied across many industries, including e-commerce, media streaming, and social networking. E-commerce platforms use it to recommend products based on users' browsing and purchasing history, with the aim of increasing sales and customer satisfaction. Similarly, media streaming services use it to suggest movies, TV shows, or music based on users' past viewing or listening behavior, with the aim of improving engagement and retention.

Social networking platforms use collaborative filtering to recommend friends, groups, or content tailored to users' interests. By analyzing the social graph and user interactions, these platforms can foster connections and facilitate content discovery. Collaborative filtering is also combined with content-based filtering in hybrid recommender systems, which draw on multiple approaches to generate more accurate and diverse recommendations.

Despite their effectiveness, collaborative filtering algorithms have limitations. One of the primary challenges is the cold start problem, which occurs when new users or items have little interaction data, making it difficult to generate accurate recommendations. To address this issue, techniques such as demographic filtering, content-based filtering, and hybrid approaches are used to supplement collaborative filtering.

Collaborative filtering algorithms may also suffer from popularity bias: popular items tend to be recommended more often, which reduces the diversity of recommendations. To mitigate this bias, techniques such as diversity-aware recommendation and serendipity enhancement are used to expose users to a variety of items across different categories.

In recent years, collaborative filtering research has advanced through innovations in machine learning, deep learning, and data mining. Deep learning models, such as neural collaborative filtering (NCF) and recurrent neural networks (RNNs), have shown promising results in capturing complex patterns and dependencies in user-item interactions.

Furthermore, integrating contextual information, such as temporal dynamics, location, and social influence, has enabled context-aware recommendations. By considering factors such as time of day, user location, or social connections, collaborative filtering algorithms can adapt recommendations to users' preferences and situational needs.

In conclusion, collaborative filtering algorithms play a crucial role in the era of big data and personalized recommendation. By drawing on the collective behavior of users, collaborative filtering enables businesses to deliver tailored recommendations that improve user experience, drive engagement, and foster loyalty. With ongoing research in machine learning and data science, collaborative filtering is likely to remain central to recommender systems.







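Collaborative filtering [5] is the most commonly used technique in recommender systems. Its basic idea is to recommend items that the target user may be interested in, based on groups of similar users or groups of similar items.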




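II. Collaborative Filtering Recommendation Algorithm
(1) Core content
1. Computing similarity
To compute the similarity between users or between items, collaborative filtering recommendation algorithms mainly use the Pearson Correlation Coefficient (PCC) [9], whose value lies in the range [-1, 1].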
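A minimal sketch of the PCC between two users follows. It assumes ratings are stored as dictionaries and that means are taken over the co-rated items only (some variants use each user's overall mean instead); the function name `pearson` and the sample data are hypothetical.

```python
import math


def pearson(ratings_a, ratings_b):
    """PCC over the items both users rated; 0.0 when it is undefined."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A user who gave identical ratings has no variance to correlate.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings: bob rates like alice but one point lower,
# carol rates in the opposite order.
alice = {"m1": 5, "m2": 3, "m3": 1}
bob = {"m1": 4, "m2": 2, "m3": 0, "m4": 5}
carol = {"m1": 1, "m2": 3, "m3": 5}

print(pearson(alice, bob))
print(pearson(alice, carol))
```

The two calls hit the ends of the [-1, 1] range: bob's ratings move exactly with alice's, carol's move exactly against them. Because PCC subtracts the means, bob's stricter baseline does not reduce his similarity to alice.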















1. Data processing
Data processing is at the core of any recommender system.






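Sample essay on collaborative filtering algorithms
Collaborative filtering is a commonly used personalized recommendation algorithm. Its core idea is to make recommendations based on similarities among users or among items.


Collaborative filtering algorithms fall into two types: user-based collaborative filtering and item-based collaborative filtering.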















One improvement is collaborative filtering based on matrix factorization.










































































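III. Key Techniques and Methods
1. Data preprocessing: this module collects users' historical viewing records and movie attribute information.



2. Collaborative filtering algorithm: the system uses user-based collaborative filtering.


3. Recommendation generation: this module takes the output of the collaborative filtering algorithm and combines it with movie attributes and other relevant factors to produce personalized movie recommendations.


4. User interface: the system provides a user-friendly interface for viewing and interacting with recommendations.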



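IV. System Implementation
1. Data collection and processing: crawlers collect movie information and users' historical viewing records from major movie websites and social media platforms.


2. Collaborative filtering implementation: the similarity between users is computed with cosine similarity.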
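A minimal sketch of that similarity step, assuming each user's viewing record is a dictionary from movie title to rating and that unrated movies count as 0. The helper names `cosine` and `nearest_users` and the sample data are hypothetical and are not the system's actual code.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse rating dicts (missing entries = 0)."""
    dot = sum(u[i] * v[i] for i in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_users(ratings, target, k=2):
    """Return the k users most similar to `target`, most similar first."""
    sims = [
        (cosine(ratings[target], r), name)
        for name, r in ratings.items()
        if name != target
    ]
    sims.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in sims[:k]]


# Hypothetical viewing records.
ratings = {
    "u1": {"Inception": 5, "Avatar": 4},
    "u2": {"Inception": 5, "Avatar": 4, "Titanic": 1},
    "u3": {"Titanic": 5, "Notebook": 4},
    "u4": {"Avatar": 2, "Titanic": 5},
}
print(nearest_users(ratings, "u1"))
```

Users with no movie in common (u1 and u3 here) get similarity 0, which is one concrete way the sparsity problem shows up in neighborhood methods.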





























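1. User-based collaborative filtering
User-based collaborative filtering computes the similarity between users to find a group of users with similar interests, then makes recommendations to the target user based on the target user's own interests and the preferences of those similar users.


2. Item-based collaborative filtering
Item-based collaborative filtering computes the similarity between items and recommends to the target user items that are similar to the ones the user already likes.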
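One common way to turn neighbor preferences into a prediction is a mean-centered, similarity-weighted average. The sketch below shows the user-based case and assumes the similarities have already been computed; the normalization by the sum of absolute weights, the function name `predict_rating`, and the sample data are assumptions for illustration.

```python
def predict_rating(ratings, sims, target, item):
    """Predict target's rating of item from neighbors' mean-centered ratings."""

    def mean(user):
        vals = ratings[user].values()
        return sum(vals) / len(vals)

    num = 0.0
    den = 0.0
    for neighbor, w in sims.items():
        if item in ratings[neighbor]:
            # Use the neighbor's deviation from their own average,
            # so generous and strict raters become comparable.
            num += w * (ratings[neighbor][item] - mean(neighbor))
            den += abs(w)
    base = mean(target)
    # Fall back to the target's own average if no neighbor rated the item.
    return base if den == 0 else base + num / den


# Hypothetical ratings and precomputed similarities to "target".
ratings = {
    "target": {"A": 4, "B": 2},
    "n1": {"A": 5, "B": 3, "C": 4},
    "n2": {"A": 2, "B": 1, "C": 3},
}
sims = {"n1": 0.5, "n2": 0.5}

print(predict_rating(ratings, sims, "target", "C"))
```

The item-based case is symmetric: the weights become similarities between the target item and the items the user has already rated.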


































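II. Overview of Related Technologies
1. Collaborative filtering
Collaborative filtering is a recommendation algorithm based on user behavior. It analyzes users' historical behavior data to find similar users, and then recommends items the target user may be interested in.


2. Deep learning
Deep learning is a branch of machine learning that learns and makes inferences with multi-layer neural networks loosely modeled on the way neural networks in the human brain work.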




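The specific steps are as follows:
1. Data preprocessing: clean and transform the user behavior data and extract useful feature information.

2. Feature extraction: use a deep learning model to extract deep features of users and items, such as user interests and item attributes.

3. Similarity computation: following the idea of collaborative filtering, compute the similarity between users and the similarity between items.


4. Recommendation generation: generate personalized recommendations from users' historical behavior data, the computed similarities, and the features extracted by the deep learning model.


5. Evaluation and optimization: evaluate the accuracy and effectiveness of the recommendation algorithm experimentally, and refine the algorithm based on the results.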















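(2) Algorithm implementation steps
1. Build the user trust network: analyze users' historical behavior data and social-network data to build a trust network among users.


2. Compute user similarity: compute the similarity between users, taking the trust relationships into account.


3. Predict user ratings: predict users' ratings of unrated products from the user similarities and trust relationships.


4. Generate the recommendation list: generate a recommendation list for each user from the predicted ratings.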
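The four steps above can be sketched as follows. The source does not say how similarity and trust are combined, so the linear blend `alpha * similarity + (1 - alpha) * trust`, the assumption that both values are non-negative, and every name below (`blended_weight`, `recommend`, the sample data) are hypothetical choices made only for illustration.

```python
def blended_weight(sim, trust, alpha=0.5):
    """Hypothetical blend of similarity and trust into one neighbor weight."""
    return alpha * sim + (1 - alpha) * trust


def recommend(ratings, sim, trust, target, top_n=1, alpha=0.5):
    """Predict ratings for unrated items and return the top-N item names."""
    rated = set(ratings[target])
    candidates = {i for r in ratings.values() for i in r} - rated
    scores = {}
    for item in candidates:
        num = den = 0.0
        for other, r in ratings.items():
            if other == target or item not in r:
                continue
            w = blended_weight(
                sim.get((target, other), 0.0),
                trust.get((target, other), 0.0),
                alpha,
            )
            num += w * r[item]
            den += w  # weights are assumed non-negative
        if den > 0:
            scores[item] = num / den  # predicted rating
    # Highest predicted rating first; ties broken by item name.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical data: "t" trusts and resembles "x" far more than "y".
ratings = {
    "t": {"A": 5},
    "x": {"A": 5, "B": 5, "C": 1},
    "y": {"A": 1, "B": 1, "C": 5},
}
sim = {("t", "x"): 0.9, ("t", "y"): 0.1}
trust = {("t", "x"): 1.0, ("t", "y"): 0.0}

print(recommend(ratings, sim, trust, "t", top_n=2))
```

In this toy data the trusted, similar neighbor x dominates the prediction, so item B is ranked ahead of item C. In practice the trust values would come from the trust network built in step 1 rather than from a hand-written dictionary.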

















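III. Algorithm Principles
A trust-based collaborative filtering recommendation algorithm mainly consists of the following steps:
1. Build the trust network: construct a trust network from users' historical behavior data and social-network information.


2. Compute trust: analyze users' historical behavior data and social-network information to compute the degree of trust between users.


3. Collaborative filtering: use the computed trust values to perform collaborative filtering on users' interests.


4. Optimize recommendations: adjust the recommendations according to user feedback and real-time data to improve recommendation accuracy and user experience.

IV. Methods and Techniques
Implementing a trust-based collaborative filtering recommendation algorithm requires the following techniques and methods:
1. Data preprocessing: clean, deduplicate, and format users' historical behavior data for subsequent analysis.

2. Trust network construction: use graph theory, machine learning, deep learning, and related techniques to build the trust network from users' historical behavior data and social-network information.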









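When building a trust-based collaborative filtering recommendation algorithm, the following aspects need to be considered:
1. Obtaining trust relationships: these are usually derived from users' social networks, interaction records, rating data, and so on.

2. Computing trust: different algorithms are used to compute the degree of trust between users, depending on the data source and context.

3. Designing the recommendation strategy: combine users' interests with their trust relationships to design a suitable recommendation strategy.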

































Foreign-language text: [1] Introduction to Recommender System
Approaches of Collaborative Filtering: Nearest Neighborhood and Matrix Factorization

"We are leaving the age of information and entering the age of recommendation."

Like many machine learning techniques, a recommender system makes predictions based on users' historical behaviors. Specifically, it predicts user preference for a set of items based on past experience. The two most popular approaches to building a recommender system are Content-based and Collaborative Filtering.

The content-based approach requires a good amount of information about the items' own features, rather than using users' interactions and feedback. For example, it can use movie attributes such as genre, year, director, and actor, or textual content of articles that can be extracted by applying Natural Language Processing. Collaborative Filtering, on the other hand, doesn't need anything except users' historical preferences on a set of items. Because it's based on historical data, the core assumption is that users who have agreed in the past tend to also agree in the future. User preference is usually expressed in two categories. An Explicit Rating is a rating given by a user to an item on a sliding scale, like 5 stars for Titanic. This is the most direct feedback from users to show how much they like an item. An Implicit Rating suggests a user's preference indirectly, such as page views, clicks, purchase records, whether or not they listened to a music track, and so on. In this article, I will take a close look at collaborative filtering, a traditional and powerful tool for recommender systems.

Nearest Neighborhood

The standard method of Collaborative Filtering is known as the Nearest Neighborhood algorithm. There are user-based CF and item-based CF. Let's first look at user-based CF. We have an n × m matrix of ratings, with users u_i, i = 1, ..., n and items p_j, j = 1, ..., m. Now we want to predict the rating r_ij if target user i did not watch/rate an item j.
The process is to calculate the similarities between target user i and all other users, select the top X similar users, and take the weighted average of ratings from these X users with similarities as weights.

$$r_{ij} = \frac{\sum_{k} \mathrm{Similarities}(u_i, u_k)\, r_{kj}}{\text{number of ratings}}$$

Different people may have different baselines when giving ratings: some people tend to give high scores generally, while some are pretty strict even though they are satisfied with items. To avoid this bias, we can subtract each user's average rating of all items when computing the weighted average, and add it back for the target user, as shown below.

$$r_{ij} = \bar{r}_i + \frac{\sum_{k} \mathrm{Similarities}(u_i, u_k)\,(r_{kj} - \bar{r}_k)}{\text{number of ratings}}$$

Two ways to calculate similarity are Pearson Correlation and Cosine Similarity.

Pearson Correlation: $$\mathrm{Sim}(u_i, u_k) = \frac{\sum_{j} (r_{ij} - \bar{r}_i)(r_{kj} - \bar{r}_k)}{\sqrt{\sum_{j} (r_{ij} - \bar{r}_i)^2 \sum_{j} (r_{kj} - \bar{r}_k)^2}}$$

Cosine Similarity: $$\mathrm{Sim}(u_i, u_k) = \frac{r_i \cdot r_k}{|r_i|\,|r_k|} = \frac{\sum_{j} r_{ij} r_{kj}}{\sqrt{\sum_{j} r_{ij}^2 \sum_{j} r_{kj}^2}}$$

Basically, the idea is to find the users most similar to your target user (nearest neighbors) and weight their ratings of an item as the prediction of the rating of this item for the target user.

Without knowing anything about items and users themselves, we think two users are similar when they give the same item similar ratings. Analogously, for item-based CF, we say two items are similar when they received similar ratings from the same user. Then, we make a prediction for a target user on an item by calculating the weighted average of this user's ratings on the X most similar items. One key advantage of item-based CF is stability: the ratings on a given item will not change significantly over time, unlike the tastes of human beings.

There are quite a few limitations of this method. It doesn't handle sparsity well when no one in the neighborhood rated the item whose rating you are trying to predict for the target user.
Also, it is not computationally efficient as the number of users and products grows.

Matrix Factorization

Since sparsity and scalability are the two biggest challenges for the standard CF method, a more advanced method was developed that decomposes the original sparse matrix into low-dimensional matrices with latent factors/features and less sparsity. That is Matrix Factorization.

Besides addressing sparsity and scalability, there is an intuitive explanation of why we need low-dimensional matrices to represent users' preferences. Suppose a user gave good ratings to the movies Avatar, Gravity, and Inception. These are not necessarily 3 separate opinions; they show that this user might be in favor of Sci-Fi movies, and there may be many more Sci-Fi movies that this user would like. Unlike specific movies, latent features are expressed by higher-level attributes, and the Sci-Fi category is one of the latent features in this case. What matrix factorization eventually gives us is how much a user is aligned with a set of latent features, and how much a movie fits into this set of latent features. Its advantage over standard nearest neighborhood is that even though two users haven't rated any of the same movies, it's still possible to find the similarity between them if they share similar underlying tastes, again latent features.

To see how a matrix is factorized, the first thing to understand is Singular Value Decomposition (SVD). Based on linear algebra, any real matrix R can be decomposed into 3 matrices U, Σ, and V. Continuing with the movie example, U is an n × r user-latent feature matrix and V is an m × r movie-latent feature matrix. Σ is an r × r diagonal matrix containing the singular values of the original matrix, simply representing how important a specific feature is for predicting user preference.

$$R = U \Sigma V^{T}, \quad U \in \mathbb{R}^{n \times r}, \quad \Sigma \in \mathbb{R}^{r \times r}, \quad V \in \mathbb{R}^{m \times r}$$

If we sort the singular values in Σ in decreasing order and truncate Σ to the first k dimensions (k singular values), we can reconstruct the matrix as matrix A.
The selection of k should make sure that A captures most of the variance within the original matrix R, so that A is an approximation of R, A ≈ R. The difference between A and R is the error that we want to minimize. This is the same idea as Principal Component Analysis.

When matrix R is dense, U and V can easily be factorized analytically. However, a matrix of movie ratings is super sparse. Although there are some imputation methods to fill in missing values, we will turn to a programming approach that simply lives with those missing values and finds factor matrices U and V. Instead of factorizing R via SVD, we try to find U and V directly, with the goal that when U and V are multiplied back together, the output matrix R' is the closest approximation of R and no longer a sparse matrix. This numerical approximation is usually achieved with Non-Negative Matrix Factorization for recommender systems, since there are no negative values in ratings.

See the formula below. Looking at the predicted rating for a specific user and item, item i is noted as a vector q_i and user u is noted as a vector p_u, such that the dot product of these two vectors is the predicted rating for user u on item i. This value is presented in the matrix R' at row u and column i.

Predicted ratings: $$r'_{ui} = p_u^{T} q_i$$

How do we find the optimal q_i and p_u? As in most machine learning tasks, a loss function is defined to minimize the cost of errors.

$$\min_{q, p} \sum_{(u,i)\ \text{observed}} \left(r_{ui} - p_u^{T} q_i\right)^2 + \lambda \left(\lVert p_u \rVert^2 + \lVert q_i \rVert^2\right)$$

r_ui is the true rating from the original user-item matrix. The optimization process finds the matrix P composed of vectors p_u and the matrix Q composed of vectors q_i that minimize the sum of squared errors between the predicted ratings r'_ui and the true ratings r_ui. Also, L2 regularization has been added to prevent overfitting of the user and item vectors.
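The regularized squared-error objective described above can be minimized in several ways. The sketch below uses plain stochastic gradient descent over the observed ratings only, which is a swapped-in technique and not the NMF or ALS solvers the article discusses; the function names, the hyperparameters (`k`, `lr`, `reg`, `epochs`), and the toy data are all assumptions for illustration.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              epochs=500, seed=0):
    """Learn user factors P and item factors Q by SGD on observed ratings."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on (r - p.q)^2 + reg * (|p|^2 + |q|^2),
                # with the constant factor 2 folded into the learning rate.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def squared_error(ratings, P, Q):
    """Sum of squared errors over the observed ratings."""
    return sum(
        (r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2
        for u, i, r in ratings
    )


# Hypothetical observed (user, item, rating) triples in a 3 x 3 matrix.
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]

before = squared_error(observed, *factorize(observed, 3, 3, epochs=0))
after = squared_error(observed, *factorize(observed, 3, 3))
print(before, after)
```

Missing entries are never touched during training, which is how this approach lives with the sparse matrix; after training, the dot product of any user row of P with any item row of Q gives a predicted rating, including for pairs that were never observed.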
It's also quite common to add a bias term, which usually has 3 major components: the average rating of all items μ, the average rating of item i minus μ (noted as b_i), and the average rating given by user u minus μ (noted as b_u).

$$\min_{p, q, b_u, b_i} \sum_{(u,i)\ \text{observed}} \left(r_{ui} - (p_u^{T} q_i + \mu + b_u + b_i)\right)^2 + \lambda \left(\lVert p_u \rVert^2 + \lVert q_i \rVert^2 + b_u^2 + b_i^2\right)$$

Optimization

A few optimization algorithms have been popular for solving Non-Negative Matrix Factorization. Alternating Least Squares (ALS) is one of them. Since the loss function is non-convex in this case, there is no guarantee of reaching a global minimum, but a good approximation can still be reached by finding local minima. Alternating Least Squares holds the user factor matrix constant and adjusts the item factor matrix by taking derivatives of the loss function and setting them equal to 0, then holds the item factor matrix constant while adjusting the user factor matrix. The process is repeated, switching back and forth between the two matrices, until convergence. The scikit-learn NMF model uses Coordinate Descent as its default solver, which follows a similar alternating scheme. PySpark also offers an ALS implementation that provides more tuning flexibility.

Some Thoughts

Collaborative Filtering provides strong predictive power for recommender systems, and requires the least information at the same time. However, it has a few limitations in some particular situations.

First, the underlying tastes expressed by latent features are not actually interpretable, because they are not tied to any content-related properties or metadata. For the movie example, a latent feature doesn't necessarily have to be a genre like Sci-Fi. It can be how motivational the soundtrack is, how good the plot is, and so on. Collaborative Filtering lacks transparency and explainability at this level of information.

On the other hand, Collaborative Filtering is faced with cold start. When a new item comes in, the model cannot make personalized recommendations for it until it has been rated by a substantial number of users.
Similarly, for items from the tail that didn't get much data, the model tends to give less weight to them and shows popularity bias by recommending more popular items.

It's usually a good idea to use ensemble algorithms to build a more comprehensive machine learning model, such as combining content-based filtering by adding some dimensions of keywords that are explainable, but we should always consider the tradeoff between model/computational complexity and the effectiveness of the performance improvement.

Chinese translation
Introduction to Recommender System
Approaches of Collaborative Filtering: Nearest Neighborhood and Matrix Factorization
"We are leaving the age of information and entering the age of recommendation."

Mach Learn (2008) 72:231–245
DOI 10.1007/s10994-008-5068-4

A collaborative filtering framework based on both local user similarity and global user similarity

Heng Luo · Changyong Niu · Ruimin Shen · Carsten Ullrich

Received: 22 June 2008 / Revised: 22 June 2008 / Accepted: 23 June 2008 / Published online: 8 July 2008
Springer Science+Business Media, LLC 2008

Abstract Collaborative filtering as a classical method of information retrieval has been widely used in helping people to deal with information overload. In this paper, we introduce the concept of local user similarity and global user similarity, based on surprisal-based vector similarity and the application of the concept of maximin distance in graph theory. Surprisal-based vector similarity expresses the relationship between any two users based on the quantities of information (called surprisal) contained in their ratings. Global user similarity defines two users as being similar if they can be connected through their locally similar neighbors. Based on both Local User Similarity and Global User Similarity, we develop a collaborative filtering framework called LS&GS. An empirical study using the MovieLens dataset shows that our proposed framework outperforms other state-of-the-art collaborative filtering algorithms.

Keywords Collaborative filtering · Similarity measure · Information theory

Editors: Walter Daelemans, Bart Goethals, Katharina Morik.
H. Luo · C. Niu · R. Shen · C. Ullrich
Department of Computer Science and Technology, Shanghai Jiao Tong University, 1954 Huashan Road, Shanghai 200030, China
H. Luo e-mail: hengluo@
C. Niu e-mail: cyniu@
R. Shen e-mail: rmshen@
C. Ullrich e-mail: ullrich_c@

1 Introduction

Collaborative filtering algorithms are widely applied on e-commerce web sites, where they predict user preferences of items taking into consideration the opinions (in the form of preference ratings) of other "similar" users. Generally, there are two major classes of collaborative filtering algorithms, memory-based algorithms and model-based algorithms (Breese et al. 1998). Because of their simplicity
and robustness, memory-based algorithms are widely applied in practice, e.g. (Herlocker et al. 1999; Linden et al. 2003; Resnick et al. 1994). To estimate a prediction for a particular user (i.e., an active user), the memory-based algorithms first find users from the database that are most similar to this active user, and then combine those ratings together. The measurement techniques of the similarity between users include the Pearson Correlation Coefficient (Resnick et al. 1994), the Vector Space Similarity (VSS) algorithm (Breese et al. 1998), and the extended generalized vector space model (Soboroff and Nicholas 2000). These algorithms can be considered as user-based algorithms.

However, in practice, systems based on collaborative filtering algorithms often face the problem of having at their disposal only an insufficient amount of preference ratings of their individual users. Therefore, one of the biggest challenges of designing a collaborative filtering system is how to provide accurate recommendations with sparse user profile data.

To estimate an active user's rating of a particular item, traditional user-based methods first find the user's neighbors (the users who are similar to the active user). Then, the active user's rating is predicted by averaging the (weighted) known ratings on the item by his/her neighbors. This kind of method is based on the assumption that similar users have similar rating patterns. Unfortunately, due to the data sparsity problem, there often exists neither a sufficient number of similar neighbors nor a sufficient number of ratings of the particular item.

The measurement of the similarity between users plays a fundamental role in user-based algorithms (Resnick et al. 1994; Wang et al. 2006; Jin et al. 2004). Traditional methods of computing similarity, however, have two important shortcomings. Firstly, usually all items are treated the same when computing the similarity of users. This is addressed by (Jin et al.
2004), which assign different weights to items in order to allow items to contribute in different strength to the user similarity calculation. The second problem is that the similarity of two users cannot be calculated if they have not rated any identical item. In other words, due to the data sparsity problem, the neighbors of the active user cannot be found. To solve this problem, it seems promising to transitively examine whether the neighbors of the two users are similar. That means we should estimate similarities between any two users from a global perspective.

In this paper, we address these two problems by proposing to divide user similarity into two parts, namely local user similarity and global user similarity. Local similarity is determined based on surprisal-based vector similarity (SVS). In SVS, the rating of each item is first modeled as a Laplacian random variable. Then the quantities of information (surprisal) contained in the ratings of a specific user are used to represent his/her preference. The similarity of any two users' surprisal vectors is defined as their local similarity. We will show that some of the ratings of the same item carry more discriminative information than others. Furthermore, we argue that less common ratings for a specific item tend to provide more discriminative information than the most common ratings. Second, the global similarity measures the similarity of two users by further considering the extent to which their neighbors are locally similar (using the local similarity). In this way, the global similarity takes the data sparsity problem into consideration by propagating the similarity measurement. All local similarities of any two users, represented as the weights of edges, are used to construct a user graph. The global similarity can be calculated as the maximin distance of any two nodes in the graph. Under global user similarity, two users become more similar if they can be connected through a series of locally similar neighbors. In
brief, the local similarity attempts to accurately measure the similarity of two users' preferences. The global similarity tries to find more similar users when the data of a user's preference is sparse. On this basis, we propose a collaborative filtering framework that employs both Local User Similarity and Global User Similarity (LS&GS).

The major contributions of this paper are as follows.
(1) We propose a novel method (SVS) to compute local user similarity.
(2) We apply Maximin distance to capture global relationships of users to address the problem of data sparsity.
(3) A collaborative filtering framework (LS&GS) is proposed based on local user similarity and global user similarity.

The remainder of our paper is organized as follows: Sect. 2 will introduce the necessary background and related work. In Sect. 3, we will present the definition of the local user similarity and the global user similarity. Section 4 introduces the proposed collaborative filtering framework. In Sect. 5, the experimental results are provided, followed by the conclusions in Sect. 6.

2 Notations and related work

There are two major classes of collaborative filtering algorithms: memory-based and model-based approaches (Breese et al. 1998). Memory-based algorithms make recommendations based on the entire user profile database. Model-based algorithms, in contrast, use a compact model, which usually was previously learned from the user profile database, to produce recommendations.

In this section, we describe the most relevant existing approaches of memory-based algorithms and briefly introduce the model-based algorithms. First, we describe the notations that are used throughout this paper.

Given a recommendation system consisting of M users and N items, there is an M × N user-item matrix R. Each entry r_{m,n} = x represents the rating that user m gives to item n, where x ∈ {1, 2, ..., r_max}. The default r_{m,n} value, meaning that the rating is unknown, is 0. The user-item matrix can be decomposed into row vectors:

$$R = [u_1, \ldots, u_M]^{T}, \quad u_m = [r_{m,1}, \ldots, r_{m,N}]^{T}, \quad m = 1, \ldots, M.$$

The row vector u_m represents the ratings of user m for all of the N items. Alternatively, the matrix can also be represented by its column vectors:

$$R = [i_1, \ldots, i_N]^{T}, \quad i_n = [r_{1,n}, \ldots, r_{M,n}]^{T}, \quad n = 1, \ldots, N.$$

The column vector i_n represents the ratings of item n by all of the M users.

2.1 Memory-based approaches

Memory-based algorithms were applied successfully in various real-life applications (Herlocker et al. 1999; Linden et al. 2003). The major types of memory-based approaches are user-based approaches (Breese et al. 1998) and item-based approaches (Linden et al. 2003; Sarwar et al. 2001). The former approaches form a heuristic implementation of the "Word of Mouth" phenomenon (Shardanand and Maes 1995). The latter attempts to improve the scalability of collaborative filtering.

User-based collaborative filtering predicts an active user's interest in a particular item based on rating information from similar user profiles (Breese et al. 1998; Herlocker et al. 1999; Resnick et al. 1994). Each user profile corresponds to a row vector stored in the user-item matrix. In detail, user-based approaches first calculate all similarities of any two row vectors. For predicting a user's rating of a particular item, a set of top-N similar users can be identified. The ratings of those top-N users for the item are combined in a weighted average to form the prediction. Consequently, the predicted rating r̂_{a,y} of test item y by test user a is computed as

$$\hat{r}_{a,y} = \frac{\sum_{k=1}^{K} w_{a,u_k}\, r_{u_k,y}}{\sum_{k=1}^{K} |w_{a,u_k}|}$$

where w_{a,u_k} denotes the similarity between the test user and his neighbors u_k.

Item-based approaches use the similarity between items instead of users. First, the similarity of items (column vectors in the user-item matrix) can be calculated. Then the unknown ratings can be predicted by averaging the ratings of other similar items rated by this active user. That is

$$\hat{r}_{a,y} = \frac{\sum_{k=1}^{K} w_{y,i_k}\, r_{a,i_k}}{\sum_{k=1}^{K} |w_{y,i_k}|},$$

where w_{y,i_k} indicates the similarity between the test item and the most similar items i_k.

Similarity computation methods, such as the Pearson Correlation
Coefficient (PCC) algorithm (Resnick et al. 1994) and the Vector Space Similarity (VS) algorithm (Breese et al. 1998), are applied in user-based and item-based methods.

The PCC method defines the similarity between two users w_{u_p,u_q} as

$$w_{u_p,u_q} = \frac{\sum_{\{i \mid r_{p,i}, r_{q,i} \neq 0\}} (r_{p,i} - \bar{r}_p)(r_{q,i} - \bar{r}_q)}{\sqrt{\sum_{\{i \mid r_{p,i}, r_{q,i} \neq 0\}} (r_{p,i} - \bar{r}_p)^2 \cdot \sum_{\{i \mid r_{p,i}, r_{q,i} \neq 0\}} (r_{q,i} - \bar{r}_q)^2}},$$

where r̄_p denotes the mean of user p's ratings. The VS method defines the similarity as

$$w_{u_p,u_q} = \frac{\sum_{\{i \mid r_{p,i}, r_{q,i} \neq 0\}} r_{p,i}\, r_{q,i}}{\sqrt{\sum_{\{i \mid r_{p,i} \neq 0\}} r_{p,i}^2} \cdot \sqrt{\sum_{\{i \mid r_{q,i} \neq 0\}} r_{q,i}^2}}.$$

2.2 Model-based approaches

The model-based algorithms present good scalability once they have built the model. However, the overhead introduced for building and updating the model should be counted in when evaluating this kind of algorithm. Various popular model-based algorithms exist, such as the aspect model (AM) (Hofmann and Puzicha 1999), the Personality Diagnosis model (PD) (Pennock et al. 2000), and the User Rating Profile model (URP) (Marlin 2004a).

The aspect model (Hofmann and Puzicha 1999) is a probabilistic latent-space model, which models individual preferences as a convex combination of preference factors. The latent class variable is associated with each observation pair of a user and an item. The aspect model assumes that users and items are independent from each other given the latent class variable. However, the aspect model cannot perform inference on novel user profiles (Marlin 2004b). In other words, in order to make predictions for novel users, AM has to be retrained based on the new training set, which should include the ratings of novel users.

The Personality Diagnosis approach (Pennock et al. 2000) considers each user in the user-item matrix as an individual model. To predict the unknown rating of an item by an active user, PD first calculates the likelihood for the active user to be in the "model" of each training user and then uses the aggregate average of ratings for the item by the training users as the estimator.

The User Rating Profile model (Marlin 2004a) is a generative, latent
variable model which represents each user as a mixture of user attitudes, and the mixing proportions are distributed according to a Dirichlet random variable. URP is different from AM in that it makes rating prediction for novel users possible.

3 Local user similarity and global user similarity

The key to many memory-based approaches is to estimate the similarity between two users (Resnick et al. 1994; Jin et al. 2004). In this section, we will first introduce our method called surprisal-based vector space similarity to compute local user similarity. Then, addressing the data sparsity problem, global user similarity will be proposed. Global user similarity makes two users more similar if they can be connected through their locally similar neighbors.

3.1 Local user similarity (surprisal-based vector space similarity)

The Pearson Correlation Coefficient (PCC) algorithm is widely applied in collaborative filtering algorithms to compute user similarity (Breese et al. 1998; Linden et al. 2003; Resnick et al. 1994; Wang et al. 2006; Sarwar et al. 2001).

Breese et al. (1998) proposed that items with similar ratings should have less impact in determining user similarity than those with different ratings. They suggested using the Inverse User Frequency as the weights of items. Herlocker et al. (1999) adopted variance weighting to improve PCC. The results turned out to be slightly worse than with no weighting (Herlocker et al. 1999).

The ratings of a specific item are usually centralized around an average attitude. In the PCC algorithm, if two users give an item the same rating, these two ratings will make the two users more similar. We argue that we need to additionally consider the difference between the rating and the average attitude. If the rating is close to the average attitude, the rating only represents that these two users act like most other people. Based on the rating we cannot conclude that the preferences of these two users are similar or dissimilar. On the other hand, if the rating is totally
different from the average attitude, the rating will provide more discriminative information to determine whether their preferences are similar. Intuitively, a rarely given rating for an item is extremely useful for distinguishing the user who gives the unexpected rating from other users. For example, the movie "Godfather" is highly favored by many people. The fact that a user likes the movie tells us almost nothing about his/her preference. In contrast, if a user dislikes the movie and gives it a very low rating (i.e., the kind of rating that is rare), we can easily distinguish him/her from others and learn something about his/her preference (e.g., that he/she may dislike mafia movies).

Although most users' ratings of a specific item are centralized around an average attitude, there still exist some users who give much higher (or lower) ratings than the average attitude. In other words, the distribution of the ratings has fat tails. To implement the intuition above, we model the ratings of each item $i$ as Laplacian random variables $\mathrm{Laplace}(\mu_i, b_i)$ rather than Gaussian random variables. The probability density function of the Laplacian random variable is

$$f(r \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|r - \mu|}{b}\right) = \frac{1}{2b} \begin{cases} \exp\left(-\frac{\mu - r}{b}\right) & \text{if } r < \mu, \\ \exp\left(-\frac{r - \mu}{b}\right) & \text{if } r \geq \mu. \end{cases}$$

Here, $\mu$ is a location parameter and $b > 0$ is a scale parameter. Given $M$ ratings, i.e. independent and identically distributed samples $r_{1,i}, r_{2,i}, \ldots, r_{M,i}$, the maximum likelihood estimators of $\mu_i$ and $b_i$ are expressed as (Norton 1984)

$$\hat{\mu}_i = \frac{1}{M} \sum_{p=1}^{M} r_{p,i}, \qquad \hat{b}_i = \frac{1}{M} \sum_{p=1}^{M} |r_{p,i} - \hat{\mu}_i|.$$

We propose a method for computing local user similarity based on the users' surprisal vectors, rather than on the users' ratings. User $p$'s surprisal vector $S_p$ is defined as follows:

$$S_p = [s_{p,1}, \ldots, s_{p,N}]^T = [\mathrm{sgn}(r_{p,1} - \hat{\mu}_1) \cdot I(r_{p,1}), \ldots, \mathrm{sgn}(r_{p,N} - \hat{\mu}_N) \cdot I(r_{p,N})]^T, \quad p = 1, \ldots, M,$$

where $\mathrm{sgn}(r_{p,i} - \hat{\mu}_i)$ indicates whether the attitude of user $p$ about item $i$ is positive or negative in comparison with the average attitude about the item, and $I(r_{p,i})$ is the quantity of information (surprisal) of the rating $r_{p,i}$. $I(r_{p,i})$ is defined as

$$I(r_{p,i}) = -\ln\left(f(r = r_{p,i} \mid \hat{\mu}_i, \hat{b}_i)\right) = \ln(2\hat{b}_i) + \frac{|r_{p,i} - \hat{\mu}_i|}{\hat{b}_i}.$$

Given the users' surprisal vectors, we can adopt the Vector Space Similarity (VS) algorithm to calculate the local user similarity. We call this method surprisal-based vector similarity (SVS), which is defined as

$$\mathrm{sim}_L(u_p, u_q) = \frac{\sum_{\{i \mid r_{p,i},\, r_{q,i} \neq 0\}} s_{p,i}\, s_{q,i}}{\sqrt{\sum_{\{i \mid r_{p,i},\, r_{q,i} \neq 0\}} s_{p,i}^2} \cdot \sqrt{\sum_{\{i \mid r_{p,i},\, r_{q,i} \neq 0\}} s_{q,i}^2}}.$$

Ma et al. (2007) proposed adding a correlation significance weighting factor that devalues similarity weights based on a small number of co-rated items,

$$\mathrm{sim}'_L(u_p, u_q) = \frac{\min(|I_{u_p} \cap I_{u_q}|, \gamma)}{\gamma}\, \mathrm{sim}_L(u_p, u_q),$$

where $|I_{u_p} \cap I_{u_q}|$ is the number of items which user $u_p$ and user $u_q$ have rated in common. If the number of co-rated items is smaller than $\gamma$, the similarity of these users is devalued. This change avoids overestimating the similarities of users who have rated a few items identically but may not have similar overall preferences. We adopt this method to compute a local user similarity called surprisal-based vector similarity with significance weighting (SVSS).

In this paper, we aim to emphasize that less common ratings for a specific item tend to provide more discriminative information than the most common ones. With regard to the choice of the distribution for modeling ratings, some sophisticated variations of the Laplacian distribution are available (Kotz et al. 2001).

3.2 Global user similarity

Under this similarity, we can find more neighbors of an active user even when he/she has few immediate neighbors under local user similarity. To attain this, we first construct a user graph using the local similarity as the weight of the edges. Then, we use the maximin distance of two users in the graph as the measure of the global similarity between them.

3.2.1 User graph

We construct a user graph that describes the relationships between users, as follows.

Definition 1 (User graph) A user graph is an undirected weighted graph $G = (U, E)$, where (1) $U$ is the node set (each user is regarded as a node of the graph $G$); (2) $E$ is the edge
set. Associated with each edge $e_{pq} \in E$, $w_{pq}$ is a weight subject to $w_{pq} > 0$, $w_{pq} = w_{qp}$.

In this paper, we employ local user similarity as the weights of edges,

$$w_{pq} = \begin{cases} \mathrm{sim}_L(u_p, u_q) & \text{if } \mathrm{sim}_L(u_p, u_q) > 0, \\ 0 & \text{otherwise.} \end{cases}$$

3.2.2 Maximin distance on user graph

Given a user graph $G = (U, E)$, a path from node $u_p$ to $u_q$ ($u_p, u_q \in U$) is a sequence of links, $P_{pq} = (u_p, \ldots, u_i, \ldots, u_q)$, $u_p, u_i, u_q \in U$. If there are $K$ paths between nodes $u_p$ and $u_q$, these paths are denoted $P^1_{pq}, P^2_{pq}, \ldots, P^K_{pq}$. The minimal hop distance of these nodes along a path $P^j_{pq}$ is defined as follows:

$$\mathrm{minimalhop}_j(u_p, u_q) = \min_{(u_i, u_{i+1}) \subset P^j_{pq}} w_{i,i+1}, \quad \forall u_i, u_{i+1} \in P^j_{pq},\ 1 \leq j \leq K.$$

The maximal value of the two nodes' minimal hop distance over all paths is called the maximin distance of the two nodes,

$$\mathrm{maximinhop}(u_p, u_q) = \max_{k=1,\ldots,K} \mathrm{minimalhop}_k(u_p, u_q) = \max_{k=1,\ldots,K} \left( \min_{(u_i, u_{i+1}) \subset P^k_{pq}} w_{i,i+1} \right), \quad \forall u_i, u_{i+1} \in P^k_{pq}.$$

The corresponding path is called the maximin path. The global similarity of two users is defined as the maximin distance between them:

$$\mathrm{sim}_G(u_p, u_q) = \mathrm{maximinhop}(u_p, u_q).$$

For any two users $u_p$ and $u_q$, if $\mathrm{sim}_G(u_p, u_q) \neq 0$, there are $d$ users forming a sequence $S = \{(u_p, u_1), \ldots, (u_{d-1}, u_d), (u_d, u_q)\}$ such that $\forall (u_i, u_j) \in S$, $\mathrm{sim}_L(u_i, u_j) \geq \mathrm{sim}_G(u_p, u_q)$. This can be interpreted as meaning that user $u_p$ finds a similar user $u_q$ through $u_1, \ldots, u_d$, where each consecutive pair in the sequence is similar. From this we can derive the following propositions.

Proposition 1 $\forall u_p, u_q \in U$, $\mathrm{sim}_G(u_p, u_q) \geq 0$.

Proof Since $w_{pq} \geq 0$ for all $u_p, u_q \in U$,

$$\mathrm{sim}_G(u_p, u_q) = \max_{k=1,\ldots,K} \left( \min_{(u_i, u_{i+1}) \subset P^k_{pq}} w_{i,i+1} \right) \geq 0.$$

Proposition 2 $\forall u_p, u_q \in U$, $\mathrm{sim}_G(u_p, u_q) \geq \mathrm{sim}_L(u_p, u_q)$.

Proof Suppose $w_{pq} = \mathrm{sim}_L(u_p, u_q) > 0$. Then there is at least one path from $u_p$ to $u_q$, namely the direct link $(u_p, u_q)$. The minimal hop distance of this path is $\mathrm{minimalhop}_i(u_p, u_q) = w_{pq} = \mathrm{sim}_L(u_p, u_q)$ for some $1 \leq i \leq K$, so

$$\mathrm{sim}_G(u_p, u_q) = \mathrm{maximinhop}(u_p, u_q) = \max_{k=1,\ldots,K} \mathrm{minimalhop}_k(u_p, u_q) \geq w_{pq} = \mathrm{sim}_L(u_p, u_q).$$

If instead $w_{pq} = 0$, then
$\mathrm{sim}_L(u_p, u_q) \leq 0$. From Proposition 1, $\mathrm{sim}_G(u_p, u_q) \geq 0 \geq \mathrm{sim}_L(u_p, u_q)$.

That is, the global user similarity is non-negative and not less than the local user similarity. In addition, if the global user similarity between $u_p$ and $u_q$ is larger than their local similarity, there exists a path between them along which every consecutive pair of nodes has a larger local user similarity than $\mathrm{sim}_L(u_p, u_q)$. In other words, two users become more similar because they can be connected through some locally more similar neighbors.

The Floyd-Warshall algorithm can be adopted to compute all-pairs maximin distances (Aho and Hopcroft 1974; Cormen et al. 1992). The complexity of this algorithm is $O(N^3)$. In practice, an efficient algorithm (Kim and Choi 2007) based on message passing could be used to query the global similarity between a specific user $u^*$ and all other users, with a time complexity of $O(N^2)$.

Previous work (Fouss et al. 2007; Gori and Pucci 2007) has investigated global similarity measures for collaborative filtering. In (Fouss et al. 2007) and (Gori and Pucci 2007), collaborative filtering has been modeled as a bipartite graph, where nodes are users and items.
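The all-pairs maximin computation of Sect. 3.2.2 can be illustrated with a small sketch. It is only a Floyd-Warshall-style variant written for this illustration, not the authors' implementation: the function name `maximin_similarity`, the list-of-lists weight matrix and the toy four-user graph are all assumptions, and a weight of 0 is taken to mean "no edge".

```python
def maximin_similarity(weights):
    """All-pairs maximin values on an undirected weighted user graph.

    weights[p][q] is assumed to hold the local similarity w_pq
    (0 meaning "no edge"). The result g[p][q] is the largest
    bottleneck (smallest edge weight) over all paths from p to q.
    """
    n = len(weights)
    g = [row[:] for row in weights]  # work on a copy
    for k in range(n):               # allow user k as an intermediate node
        for p in range(n):
            for q in range(n):
                if p == q:
                    continue
                through_k = min(g[p][k], g[k][q])
                if through_k > g[p][q]:
                    g[p][q] = through_k
    return g


# Hypothetical graph: users 0 and 2 are only weakly similar directly (0.2),
# but strongly connected through user 1 (0.9 and 0.8).
w = [
    [0.0, 0.9, 0.2, 0.0],
    [0.9, 0.0, 0.8, 0.0],
    [0.2, 0.8, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
]
sim_g = maximin_similarity(w)
```

In this toy graph the global similarity of users 0 and 2 rises from 0.2 to 0.8, which matches Proposition 2: the maximin value is never below the direct edge weight. The triple loop also makes the cubic cost of the all-pairs computation visible.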
The algorithms of Fouss et al. (2007) and Gori and Pucci (2007) are random-walk based scoring algorithms, which can be used to rank items according to the active user's preferences rather than to predict his/her explicit ratings on items. In contrast, our method aims to quantify the active user's preferences; as a result, it provides more information to recommendation systems than methods that only rank items based on the active user's preferences.

4 The collaborative filtering framework

Taking both local and global user similarity into account, we propose the following collaborative filtering framework. To predict the rating of an active user $u_a$ on a particular item, we first find his/her $k$ local nearest neighbors ($nn^k_L(u_a)$) under local similarity and his/her $k$ global nearest neighbors ($nn^k_G(u_a)$) under global similarity. Then we employ both $nn^k_L(u_a)$ and $nn^k_G(u_a)$ to predict the user's rating

$$\hat{r}_{a,i} = (1 - \alpha) \frac{\sum_{u_k \in nn^k_L(u_a)} \mathrm{sim}_L(u_k, u_a)\, r_{k,i}}{\sum_{u_k \in nn^k_L(u_a)} \mathrm{sim}_L(u_k, u_a)} + \alpha \frac{\sum_{u_k \in nn^k_G(u_a)} \mathrm{sim}_G(u_k, u_a)\, r_{k,i}}{\sum_{u_k \in nn^k_G(u_a)} \mathrm{sim}_G(u_k, u_a)}. \quad (1)$$

The parameter $\alpha$ determines the extent to which the prediction relies on local user similarity and global user similarity. With $\alpha = 0$, the prediction depends completely on local user similarity; with $\alpha = 1$, it depends completely on global user similarity. $\alpha$ can be determined experimentally by using cross-validation.

5 Experiments

We conducted several experiments to examine the performance of the proposed collaborative filtering framework (LU&GU), and address the following questions in particular:

(1) How does our approach of computing the local user similarity compare with traditional methods? For this question, we employ PCC (Pearson Correlation Coefficient) (Resnick et al. 1994), PCCS (Pearson Correlation Coefficient with significance weighting) (Ma et al. 2007), SVS (surprisal-based vector similarity) and SVSS (surprisal-based vector similarity with significance weighting) as different methods to compute user similarity. Then we use these similarities in the traditional
user-based collaborative filtering (Resnick et al. 1994) and compare the performance.

(2) How does our collaborative filtering framework compare with other algorithms? For this question, we compare our method (LS&GS) with the user-based algorithm (Resnick et al. 1994), the item-based algorithm (Sarwar et al. 2001), the similarity fusion algorithm (SF) (Wang et al. 2006) and the effective missing data prediction algorithm (EMDP) (Ma et al. 2007).

(3) How does the parameter $\alpha$ affect the accuracy of prediction? Parameter $\alpha$ balances how much the prediction takes into account local similarity and global similarity. We vary the value of $\alpha$ from 0 to 1 to observe the differences in performance.

5.1 Experimental setup

We experimented with a popular database, the MovieLens dataset by the GroupLens Research group at the University of Minnesota. The MovieLens data set contains 100,000 ratings (on a 1–5 scale) from 943 users on 1682 movies (items), where each user has rated at least 20 movies. To compare algorithms more thoroughly, we conducted the experiments under several configurations. We randomly extracted a subset of 500 users, set the training size to 300 (200, 100) users in the subset, and used the remaining 200 (300, 400) users as the active users. The respective sets were named MovieLens300, MovieLens200 and MovieLens100. As for the ratings from the active users, we varied the number of ratings provided by the active users among 5, 10 and 20, naming the settings Given5, Given10 and Given20, respectively. This results in 9 configurations in total, which we call M300G20, M300G10, M300G5, M200G20, M200G10, M200G5, M100G20, M100G10 and M100G5.
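As a rough illustration of one such configuration, the sketch below splits a ratings dictionary into training users and active users and reveals only a fixed number of ratings per active user. The function name `given_n_split`, the nested-dictionary layout and the treatment of the unrevealed ratings as prediction targets are assumptions made for this example; the exact sampling procedure used in the experiments is not specified here.

```python
import random


def given_n_split(ratings, n_train_users, n_given, seed=0):
    """Split users into training and active users (hypothetical helper).

    ratings: dict mapping user -> dict mapping item -> rating.
    Returns (train, given, held_out):
      train    - full profiles of the training users
      given    - n_given revealed ratings per active user
      held_out - remaining ratings of each active user, assumed here
                 to serve as the prediction targets
    """
    rng = random.Random(seed)
    users = sorted(ratings)
    rng.shuffle(users)
    train_users = users[:n_train_users]
    active_users = users[n_train_users:]

    train = {u: dict(ratings[u]) for u in train_users}
    given, held_out = {}, {}
    for u in active_users:
        items = sorted(ratings[u])
        rng.shuffle(items)
        given[u] = {i: ratings[u][i] for i in items[:n_given]}
        held_out[u] = {i: ratings[u][i] for i in items[n_given:]}
    return train, given, held_out


# Toy data: 5 users who each rated the same 6 items (values are arbitrary).
toy = {u: {i: 1 + (u + i) % 5 for i in range(6)} for u in range(5)}
train, given, held_out = given_n_split(toy, n_train_users=3, n_given=2)
```

With the numbers from the text, a configuration such as M300G5 would correspond to `n_train_users=300` and `n_given=5` applied to the 500-user subset.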
Each configuration represents a different level of sparsity in the training data and in the ratings available for the active users. These protocols are widely adopted (Wang et al. 2006; Ma et al. 2007; Xue et al. 2005). Furthermore, we also adopted the "All-but-one" protocol (Breese et al. 1998), in which we extracted a single randomly selected rating for each user in the whole data set and tried to predict its value given all the other ratings of that user. This protocol is also widely adopted (Marlin 2004a, 2004b; DeCoste 2006).

In order to examine the performance of our approach and to compare it with experiments reported in the literature, e.g. (Resnick et al. 1994; Wang et al. 2006; Sarwar et al. 2001; Ma et al. 2007), we adopted the mean absolute error (MAE) (Sarwar et al. 2001). The MAE is computed by first summing the absolute errors of the $N$ corresponding rating-prediction pairs and then averaging the sum. Formally,

$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} |r_i - \hat{r}_i|}{N}.$$

A smaller value of MAE indicates better accuracy.

5.2 Surprisal-based vector similarity

In order to examine the performance of SVSS and SVS, we compared our methods of computing the user similarity with the traditional methods PCC and PCCS. We used these methods in the traditional user-based collaborative filtering and compared their performance. The parameter $\gamma$ of the significance weighting (used in SVSS and PCCS) was set to 20. We compared SVSS and SVS with the other methods in all experimental configurations.
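How a prediction is scored in these comparisons can be sketched as follows. The code combines the blended prediction of Eq. (1) with the MAE defined in Sect. 5.1. All function names are hypothetical, neighbors are passed in as precomputed (similarity, ratings) pairs, and two choices are assumptions of this sketch rather than statements from the text: only neighbors who rated the item contribute to a weighted average, and if one neighborhood has no usable rating the other is used alone.

```python
def weighted_average(neighbors, item):
    """Similarity-weighted mean of the neighbors' ratings for `item`.

    neighbors: list of (similarity, ratings_dict) pairs.
    Assumption: neighbors who did not rate the item are skipped.
    """
    rated = [(sim, r[item]) for sim, r in neighbors if item in r]
    total = sum(sim for sim, _ in rated)
    if total <= 0:
        return None
    return sum(sim * rating for sim, rating in rated) / total


def predict(local_nn, global_nn, item, alpha):
    """Blend local and global neighborhoods as in Eq. (1)."""
    local = weighted_average(local_nn, item)
    glob = weighted_average(global_nn, item)
    if local is None:  # assumed fallback, not part of Eq. (1)
        return glob
    if glob is None:
        return local
    return (1 - alpha) * local + alpha * glob


def mean_absolute_error(actual, predicted):
    """MAE over corresponding rating-prediction pairs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical neighborhoods of one active user for movie "m".
local_nn = [(0.5, {"m": 4}), (0.5, {"m": 2})]
global_nn = [(1.0, {"m": 5})]
blended = predict(local_nn, global_nn, "m", alpha=0.5)
```

Setting `alpha=0` in this sketch uses only the local neighborhood and `alpha=1` only the global one, mirroring the two extremes described for Eq. (1).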
The number of nearest neighbors in user-based collaborative filtering was set to 35 in all configurations. The results are presented in Table 1 and Table 2. We can see that:

(1) SVSS and SVS outperform the other methods in all configurations.
(2) The performance of SVSS and SVS improves with the number of items rated by the users.
(3) Significance weighting improves SVS much more than PCC, except in the All-but-one protocol. The reason is that SVS captures the contribution of each rating to the similarity value more accurately than PCC, and using significance weighting amplifies this effect.

Next, in order to examine the sensitivity to the neighborhood size, we performed an experiment in which we varied the number of nearest neighbors used and computed the MAE for each variation. In this article, we report only the results for the configurations M100G5 and M300G20; however, the other configurations yield similar results. The results are shown in Fig. 1.

We can observe that the size of the neighborhood does affect the performance. Both SVS and SVSS improve the accuracy of prediction as the neighborhood size increases from 5 to 15. For greater values, the curve flattens. Again, SVS and SVSS outperform the other methods.
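For readers who want to experiment with the similarity measures compared in this section, the following minimal sketch computes SVS and SVSS for two users from the definitions in Sect. 3.1. It is an illustration under stated assumptions, not the authors' code: the function names and the dictionary-based data layout are hypothetical, every item is assumed to have a scale estimate greater than zero (an item whose ratings are all identical would need special handling), and pairs with no co-rated items or a zero norm are assigned similarity 0.

```python
import math


def laplace_params(item_ratings):
    """Estimate (mu, b) for one item as the mean and mean absolute deviation."""
    mu = sum(item_ratings) / len(item_ratings)
    b = sum(abs(r - mu) for r in item_ratings) / len(item_ratings)
    return mu, b


def surprisal(r, mu, b):
    """Signed surprisal sgn(r - mu) * (ln(2b) + |r - mu| / b); requires b > 0."""
    info = math.log(2 * b) + abs(r - mu) / b
    sign = (r > mu) - (r < mu)
    return sign * info


def svss(ratings_p, ratings_q, params, gamma=20):
    """Surprisal-based vector similarity with significance weighting.

    ratings_p, ratings_q: dict item -> rating for the two users.
    params: dict item -> (mu, b) estimated from all ratings of that item.
    A gamma no larger than the number of co-rated items leaves the plain
    SVS value unchanged.
    """
    common = [i for i in ratings_p if i in ratings_q]
    sp = [surprisal(ratings_p[i], *params[i]) for i in common]
    sq = [surprisal(ratings_q[i], *params[i]) for i in common]
    norm = math.sqrt(sum(x * x for x in sp)) * math.sqrt(sum(y * y for y in sq))
    if norm == 0:
        return 0.0  # assumed convention for an undefined cosine
    svs = sum(x * y for x, y in zip(sp, sq)) / norm
    return min(len(common), gamma) / gamma * svs


# Hypothetical item statistics and two users with opposite tastes.
params = {"a": (3.0, 1.0), "b": (3.0, 1.0)}
user_p = {"a": 5, "b": 1}
user_q = {"a": 1, "b": 5}
similarity = svss(user_p, user_q, params, gamma=2)
```

With only two co-rated items and the value $\gamma = 20$ used in the experiments, the same pair would be scaled by 2/20, which is the devaluation of thinly supported similarities that significance weighting is meant to provide.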
