Research on a Parallel ALS Collaborative Filtering Algorithm Based on Spark

Computer & Digital Engineering, Vol. 45 No. 11, 2017 (Issue 337 overall), p. 2197

HOU Jingru  WU Sheng  LI Yingna
(School of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500)

Abstract  The ALS (alternating least squares) collaborative filtering recommendation algorithm makes recommendations through matrix decomposition. It computes over a large volume of user rating data and stores the many feature matrices produced during the computation. Hadoop HA (high availability) addresses the single point of failure of the NameNode in the HDFS distributed file system. Spark is a new in-memory distributed big data computing framework with excellent computing performance. This paper builds an HA Hadoop big data platform based on QJM (Quorum Journal Manager), studies the ALS collaborative filtering algorithm on the Spark computing framework, and implements a parallelized ALS collaborative filtering algorithm on Spark. Comparison experiments on the Netflix data set against an ALS collaborative filtering algorithm based on Hadoop MapReduce show that the Spark-based ALS collaborative filtering algorithm achieves markedly higher parallel computing efficiency and is better suited to processing massive data.
Key Words  ALS, collaborative filtering, matrix decomposition, high availability, Spark
Class Number  TP301
DOI: 10.3969/j.issn.1672-9722.2017.11.024

1 Introduction

The world has now entered the era of big data [1]. With emerging business forms such as new retail, the number of Internet users is growing rapidly and Internet technology is developing quickly. Under a user-centered model of information production, information on the Internet is growing explosively: data volumes keep increasing and data types keep multiplying, and people face a serious "information overload" problem [2]. Two classes of technology currently address this problem. The first is information retrieval, represented by search engines; the second is information filtering, represented by recommender systems [3]. The difference is that a search engine depends on the user describing the desired information accurately, whereas a recommender system starts from the user's historical behavior and data, builds a data model to mine the user's needs and interests, and on that basis filters the information of interest to the user out of a massive pool. A recommender system therefore matters most when the user's needs are unclear. Among recommender system techniques, collaborative filtering is simple and efficient and has been widely accepted and applied in industry. It also has shortcomings, however, such as scalability, data sparsity, and cold start, and these problems often cause the recommendation