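Foreign-Language Literature on Data Visualization Analysis
English-Language Article on Movie Data Visualization with Python
![English-Language Article on Movie Data Visualization with Python](https://img.taocdn.com/s3/m/151c79130166f5335a8102d276a20029bc646351.png)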
Introduction
In recent years, advances in technology and the popularity of movie streaming platforms have greatly increased the amount of movie data available. To make sense of this data, it must be analyzed and visualized effectively. Python, with its extensive libraries and packages, is a powerful tool for movie data visualization. This article explores several Python libraries and techniques for visualizing movie data.

Libraries for Movie Data Visualization
Python offers several libraries designed for data visualization. Some of the most commonly used are listed below.
1. Matplotlib
Matplotlib is a widely used plotting library for Python. It provides functions for creating static, animated, and interactive visualizations, including bar charts, line graphs, scatter plots, and histograms.
2. Seaborn
Seaborn is built on top of Matplotlib and provides a high-level interface for creating attractive and informative statistical graphics. It adds features such as automatic color palette selection, easy control of plot aesthetics, and integration with Pandas data frames.
3. Plotly
Plotly specializes in interactive visualizations. It provides a rich set of tools for creating and sharing interactive plots, dashboards, and data applications, and it supports a wide range of plot types, including 3D plots, maps, and animations.
4. Bokeh
Bokeh also focuses on interactivity and is designed specifically for interactive visualizations on the web. It supports both static and dynamic plots and can handle large and streaming datasets efficiently.

Techniques for Movie Data Visualization
With the available libraries in mind, the following techniques can be used to visualize movie data in Python.
1. Ratings Distribution
One of the most basic insights in movie data is the distribution of ratings. Bar charts or histograms show the overall distribution, and the same charts can be used to compare ratings across genres or time periods.
2. Box Office Performance
Box office performance is another key aspect of movie data. Bar charts or line graphs of revenue over time or across regions help identify trends and patterns in box office performance.
3. Genre Popularity
Movies are usually grouped into genres such as action, comedy, and drama. Pie charts or bar charts of genre popularity give insight into audience preferences and can help filmmakers make informed decisions.
4. Movie Recommendations
By analyzing movie ratings and user preferences, we can build a recommendation system that suggests similar movies to users. These recommendations can be visualized with network graphs or chord diagrams that show the relationships between movies based on genre, actors, or directors.

Conclusion
Python provides a wide array of libraries and techniques for visualizing movie data. With libraries such as Matplotlib, Seaborn, Plotly, and Bokeh, we can create informative and visually appealing visualizations. Visualizing movie data gives insight into ratings distribution, box office performance, and genre popularity, and can support movie recommendation systems. These techniques help filmmakers, movie analysts, and enthusiasts make data-driven decisions and understand the movie industry better.
Overall, Python's ease of use and extensive libraries make it an excellent choice for movie data visualization, and a practical way to uncover patterns and trends in movie data.
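The ratings-distribution and genre-popularity techniques described in the article can be sketched roughly as follows. This is a minimal illustration, not the article's own code: the `MOVIES` records are made up, and the helper names `rating_histogram`, `genre_counts`, and `plot_overview` are hypothetical. Matplotlib is assumed to be installed and is imported only inside the plotting function, so the counting helpers run without it.

```python
from collections import Counter

# Made-up sample records for illustration only; not real movie data.
MOVIES = [
    {"title": "Film A", "rating": 7.9, "genres": ["Drama"]},
    {"title": "Film B", "rating": 6.4, "genres": ["Action", "Comedy"]},
    {"title": "Film C", "rating": 8.2, "genres": ["Drama", "Action"]},
    {"title": "Film D", "rating": 5.1, "genres": ["Comedy"]},
    {"title": "Film E", "rating": 7.3, "genres": ["Drama"]},
]


def rating_histogram(ratings, bin_width=1.0, max_rating=10.0):
    """Count ratings per bin [0, w), [w, 2w), ...; max_rating falls in the last bin."""
    n_bins = int(round(max_rating / bin_width))
    counts = [0] * n_bins
    for rating in ratings:
        if not 0 <= rating <= max_rating:
            raise ValueError(f"rating out of range: {rating}")
        index = min(int(rating // bin_width), n_bins - 1)
        counts[index] += 1
    return counts


def genre_counts(movies):
    """Count movies per genre (a movie may list several), most common first."""
    counter = Counter()
    for movie in movies:
        counter.update(movie["genres"])
    return counter.most_common()


def plot_overview(movies):
    """Draw the ratings distribution and genre popularity side by side."""
    import matplotlib.pyplot as plt  # lazy import: only needed for plotting

    counts = rating_histogram([m["rating"] for m in movies])
    genres = genre_counts(movies)

    fig, (ax_ratings, ax_genres) = plt.subplots(1, 2, figsize=(10, 4))
    # One bar per one-point rating bin, left edge aligned to the bin start.
    ax_ratings.bar(list(range(len(counts))), counts, width=1.0,
                   align="edge", edgecolor="black")
    ax_ratings.set_xlabel("Rating")
    ax_ratings.set_ylabel("Number of movies")
    ax_ratings.set_title("Ratings distribution")

    ax_genres.bar([name for name, _ in genres], [n for _, n in genres])
    ax_genres.set_ylabel("Number of movies")
    ax_genres.set_title("Genre popularity")
    fig.tight_layout()
    return fig
```

Calling `plot_overview(MOVIES)` returns a Matplotlib figure that can be shown with `plt.show()` or saved with `fig.savefig(...)`. Keeping the counting separate from the drawing means the same numbers could be handed to Seaborn, Plotly, or Bokeh instead.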
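Data Visualization Papers
![Data Visualization Papers](https://img.taocdn.com/s3/m/ae65fb17bc64783e0912a21614791711cd797957.png)
The following are several works on data visualization:
1. "The Value of Visualization in Scientific Computing" by Kenneth I. Joy et al.
This paper examines the value and applications of data visualization in scientific computing.
It proposes several ways of applying visualization and analyzes the strengths and limitations of visualization in scientific computing.
2. "A Taxonomy of Tools for Data Visualization" by Martin Wattenberg et al.
This paper proposes a classification scheme for data visualization tools, covering the different types of tools, their characteristics, and their use cases.
It also discusses future directions for data visualization tools.
3. "Interactive Data Visualization for the Web" by Scott Murray.
This book introduces methods and techniques for interactive data visualization on the web.
It explains in detail the steps and techniques for building interactive visualizations with JavaScript and the D3.js library.
4. "Visual Analysis of Large and Heterogeneous Networks" by Tamara Munzner.
This paper discusses visual analysis methods for large-scale and heterogeneous network data.
It introduces visualization techniques for different types of network data and proposes methods and tools for addressing the challenges involved.
5. "The Grammar of Graphics" by Leland Wilkinson et al.
This book proposes a general descriptive language for data visualization, used to describe and generate many kinds of charts.
It presents the basic elements and rules of this language and provides practical cases and application examples.
These works cover different aspects and application scenarios of data visualization and can help readers understand and study the field in depth.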
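Visual Analysis of International Cattle Science Research over the Past 30 Years Based on the Web of Science and CNKI Databases
![Visual Analysis of International Cattle Science Research over the Past 30 Years Based on the Web of Science and CNKI Databases](https://img.taocdn.com/s3/m/7b3f67a76429647d27284b73f242336c1eb93030.png)
Special Review. China Cattle Science (中国牛业科学), 2020, 46(6): 24-34
Bi Yi 1,*, He Yiwen 2,*, He Libang 1, Zhang Qi 1, Wei Qingxia 1, Tang Qi 1, Lei Chuzhao 1, Chen Hong 1,**, Lan Xianyong 1,**
(1. College of Animal Science and Technology, Northwest A&F University, Yangling, Shaanxi 712100; 2. Library, Northwest A&F University, Yangling, Shaanxi 712100)
Received: 2020-08-07. Revised: 2020-08-15.
Funding: National Natural Science Foundation of China (General Program) (No. 31872331) and the National Beef Cattle and Yak Industry Technology System (CARS-37).
About the authors: * Bi Yi (b. 1999), female, current student, animal science major; He Yiwen (b. 1990), female, assistant librarian, master's degree, mainly engaged in research on journal evaluation and subject services. He Yiwen and Bi Yi contributed equally. ** Corresponding authors: Chen Hong (b. 1955), male, professor, PhD, mainly engaged in research on animal genetics and breeding; Lan Xianyong (b. 1979), male, professor, PhD, mainly engaged in research on animal genetics and breeding.
Abstract: This paper analyzes the output of cattle research papers indexed in the Web of Science Core Collection and in China's CNKI database during 1992-2019 in order to survey cattle science research worldwide.
Using bibliometric methods, a title search was run with "牛" (cattle) as the search term, retrieving 75,507 SCI papers and 29,915 Chinese-language papers. With GraphPad Prism 8.4.2, the study analyzes changes in the number of papers over the past 30 years in the global cattle science field, together with countries, institutions, authors, journals, research directions, popular disciplines, and keywords.
The statistical analysis shows that the United States has a clear advantage in cattle science research, and most high-quality journals and papers originate there.
Cattle science research in China entered a stage of rapid growth after 2010, and over the past 10 years China has ranked second worldwide in the number of papers published, with Northwest A&F University and the Chinese Academy of Agricultural Sciences making prominent contributions.
In addition, veterinary science, reproductive biology, and biotechnology and applied microbiology are the main research directions in cattle science worldwide, and China's cattle research topics are in line with international ones.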
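The kind of tally behind such a bibliometric summary (papers per year, and a country ranking for a chosen period) can be sketched as below. This is only an illustration under stated assumptions: the `RECORDS` list is invented placeholder data, not the paper's dataset, and the function names `papers_per_year` and `country_ranking` are hypothetical. A real analysis would load records exported from Web of Science or CNKI.

```python
from collections import Counter

# Invented placeholder records; a real study would load exported database records.
RECORDS = [
    {"year": 2008, "country": "USA"},
    {"year": 2012, "country": "China"},
    {"year": 2015, "country": "USA"},
    {"year": 2015, "country": "China"},
    {"year": 2018, "country": "Brazil"},
    {"year": 2019, "country": "China"},
]


def papers_per_year(records):
    """Return {year: number of papers}, ordered by year."""
    counts = Counter(record["year"] for record in records)
    return dict(sorted(counts.items()))


def country_ranking(records, start_year, end_year):
    """Rank countries by paper count within [start_year, end_year], inclusive."""
    counts = Counter(
        record["country"]
        for record in records
        if start_year <= record["year"] <= end_year
    )
    return counts.most_common()
```

For example, `country_ranking(RECORDS, 2010, 2019)` restricts the ranking to one decade, which is the sort of comparison the abstract makes when it reports a country's position over the past 10 years.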
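Research Progress on Tea Polyphenols Based on CiteSpace Visual Analysis
![Research Progress on Tea Polyphenols Based on CiteSpace Visual Analysis](https://img.taocdn.com/s3/m/69c28e09f6ec4afe04a1b0717fd5360cba1a8d1d.png)
Abstract: Drawing on the WOS and CNKI databases, this paper collects the relevant literature in the field published during 2012-2022 and uses the scientometric functions of the visualization software CiteSpace to draw knowledge maps of the core authors, publishing institutions, research hotspots, and other aspects of tea polyphenol research. Based on Chinese and international material on tea polyphenols, it gives a comprehensive analysis of the current state, frontiers, and hotspots of tea polyphenol research in China and abroad, with the aim of making it more efficient to track the research frontier, obtaining the relevant material more completely, and providing a reference for the research, production, and application of tea polyphenols.
… documents [7-12]. The number of papers in this field shows an overall upward trend, which means that research on tea polyphenols is receiving more and more attention. The chart is a line plot of the number of publications in academic journals in CNKI and WOS over the last 10 years; the larger the number, the more deeply the topic has been studied, and the more attention the term "tea polyphenols" has received. Tea polyphenols were discovered in 1986, but limits in technology and equipment prevented large-scale research, and it was not until 1992 that the academic community began to pay attention to the tea polyphenols in tea. As shown in Figures 2 and 3, research on tea polyphenols since 2012 can be divided roughly into two stages. The first is a stage of rapid development (2012-2016): in response to the requirements of food preservation and testing, tea polyphenols were found to have a freshness-preserving function, so research output rose quickly. The second is a stage of steady development (2017-2022), in which publication output held at around 85 papers, indicating that related academic research continued throughout these years. Foreign-language literature over the past ten years has also kept up steady and fairly rapid development, with a stable upward trend and roughly constant attention. Research on tea polyphenols has never been interrupted, mainly because more and more of their health functions have been discovered [13].
2.2 Spatial distribution characteristics
2.2.1 Author distribution
This paper, starting from authors and core …
[Figure: processing workflow]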
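Foreign-language references on Python
![python外文参考文献](https://img.taocdn.com/s3/m/3ff629d980c758f5f61fb7360b4c2e3f57272507.png)
Python is a high-level programming language backed by a large and active community. It is widely used in scientific computing, data analysis, machine learning, artificial intelligence, and many other fields.
To help readers get to know Python better, the following lists several references and their main content.
1. Recommended reading list for Python developers. This reference is a list of books recommended for Python developers.
It includes many classic titles, such as 《Python编程快速入门》, 《Python核心编程》, 《流畅的Python》, 《Python高效编程》, and 《精通Python设计模式》.
It can help Python developers understand how Python is applied in practice and work more fluently day to day.
2. 《Python 数据分析》 (Python Data Analysis). Python has become a mainstay of data science and analytics.
This reference introduces Python data science libraries and tools such as pandas, NumPy, SciPy, matplotlib, and scikit-learn.
These tools help developers with data preprocessing, cleaning, and visualization, as well as machine learning and deep learning work.
3. 《Python 机器学习基础教程》 (Introductory Python Machine Learning Tutorial). This reference covers the basic concepts and techniques of machine learning in Python.
It introduces machine learning libraries and tools such as Scikit-learn and TensorFlow, and includes practical case studies.
It gives developers the knowledge and skills to tackle machine learning problems such as classification, clustering, prediction, and recommendation.
4. 《Python 设计模式》 (Python Design Patterns). This reference collects solutions to a range of software design problems, covering object-oriented design, API design, concurrent programming, and I/O programming.
It introduces common design patterns such as singleton, factory, builder, adapter, and decorator, which help developers reuse and extend code to meet the needs of complex applications.
5. 《深度学习笔记》 (Deep Learning Notes). Deep learning is one of the most popular artificial intelligence techniques, and Python has become one of the main languages for implementing deep learning algorithms.
This reference introduces the fundamentals and implementation techniques of deep learning algorithms, including neural networks, convolutional neural networks, and recurrent neural networks.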
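Visual analysis of the literature based on the Web of Science database
![基于Web of Science 数据库的文献可视化分析](https://img.taocdn.com/s3/m/dd27846e24c52cc58bd63186bceb19e8b8f6ec30.png)
Science and Technology & Innovation (科技与创新), 2017, No. 20. Article ID: 2095-6835(2017)20-0007-03
Visual analysis of the literature based on the Web of Science database
Xu Lixia (徐立霞) (Qufu Normal University Library, Jining, Shandong 273165)
Abstract: Using the Web of Science database from Clarivate Analytics (formerly the Intellectual Property and Science business of Thomson Reuters), this paper performs a visual analysis of 7,622 publications on air pollution control indexed in the database for 2006 to 2016. It analyzes publication year, country/region, research area, publishing institution, and publishing journal and, on that basis, reviews recent research in the field and its outlook.
Keywords: Web of Science; visual analysis; air pollution control; bibliometrics
CLC number: G354. Document code: A. DOI: 10.15913/ki.kjycx.2017.20.007
In recent years, as global industrialization has advanced rapidly, the impact of human activity on the environment has become increasingly pronounced. Air pollutants such as nitrogen oxides, sulfur oxides, particulate matter, ozone, and volatile organic compounds (VOCs) strongly affect the atmosphere, and the steady deterioration of air quality has drawn close attention from countries around the world [1].
Frequent acid rain across China and the photochemical smog episodes in Los Angeles and London all reflect worsening global environmental pollution.
Uncontrolled industrial exhaust, together with emissions from household stoves, heating boilers, and vehicles, has become a major source of air pollution.
With rising living standards and accelerating urbanization, society is moving into the "automobile age", and the number of cars worldwide is growing at a striking pace.
Motorcycles, general-purpose gasoline and diesel engines, and other fossil-fuel-driven transport and power equipment have also developed rapidly.
Although the growth in vehicle numbers has helped drive the global economy to some extent, the large volume of exhaust emissions has placed a heavy burden on global pollution control [2].
In addition, indoor air pollutants, typified by volatile organic compounds (VOCs), seriously degrade indoor air quality.
VOCs come mainly from fuel combustion, transportation, building and decoration materials, furniture, household appliances, cleaning agents, and emissions from the human body itself. They are harmful to the environment and to human health in two main ways: (1) some VOCs can seep into the soil and contaminate land and water sources; (2) human exposure to these compounds can cause breathing difficulty and dizziness, damage to the nervous system and the respiratory center, and in severe cases cancer.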
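References on Python-based enterprise data visualization
![基于python的企业数据可视化参考文献](https://img.taocdn.com/s3/m/e4124f7286c24028915f804d2b160b4e767f81b1.png)
Python-based enterprise data visualization turns enterprise data into graphical displays. It helps a business understand and analyze its data and, in turn, make better-informed decisions.
The references below cover this topic and provide detailed information and practical application cases.
1. "Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython" by Wes McKinney. One of the classic textbooks on data analysis with Python, with many examples and practical techniques relevant to enterprise data visualization.
2. "Python Data Science Handbook" by Jake VanderPlas. A comprehensive guide to Python data science tools and techniques, including data visualization with libraries such as Matplotlib and Seaborn.
3. "Python Business Intelligence Cookbook" by Robert Dempsey. Offers many practical examples of data analysis and visualization in Python, aimed in particular at business scenarios.
4. "Data Visualization with Python and JavaScript" by Kyran Dale. Explains how to build interactive data visualizations with Python and JavaScript, including enterprise examples and best practices.
5. "Python Data Visualization Cookbook" by Igor Milovanović. Provides a large set of Python data visualization examples and techniques suited to enterprise data analysis and decision support.
6. "Mastering Python for Finance" by James Ma Weiming. Covers financial data analysis and visualization with Python, including methods for visualizing corporate financial data.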
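Data visualization and table interpretation techniques in SCI papers
![SCI论文中数据的可视化和表解读技巧](https://img.taocdn.com/s3/m/e02a5b47178884868762caaedd3383c4bb4cb423.png)
In scientific research, data visualization and table interpretation play a critical role in SCI papers.
Presenting and interpreting data accurately and effectively helps readers understand the results, which in turn advances the field.
This article introduces visualization and table interpretation techniques commonly used in SCI papers, to help researchers improve the quality and readability of their manuscripts.
I. Data visualization techniques
Data visualization presents abstract data to readers as charts and similar forms, conveying information directly so that results are easier to understand.
Commonly used techniques in SCI papers include the following:
1. Line chart: suited to showing a trend over time or another variable, or to comparing how much groups differ.
When drawing a line chart, make sure the X-axis and Y-axis scales are clear, and mark the data points to improve readability.
2. Bar chart: suited to comparing differences between categories or groups, for example how study subjects perform under different conditions.
Bars should be of moderate width and clearly labeled; avoid overly busy color schemes.
3. Scatter plot: commonly used to show the relationship between two continuous variables, for example how one variable changes as the other increases.
When drawing a scatter plot, give the data points suitable markers and colors to distinguish categories.
4. Pie chart: suited to showing the share of each category in the whole.
When drawing a pie chart, use a clear, easily understood legend, and make sure the sector angles and label font sizes are appropriate.
II. Table interpretation techniques
Besides charts, tables are another common way to present data in SCI papers.
Good table practice helps readers understand the data a table contains.
Some recommendations:
1. Table title and number: every table in an SCI paper should have a clear title and number so that readers can locate it quickly and understand what it contains.
2. Header: the header should summarize the main content of the table concisely and give units and measures (for example time units, percentages, and abbreviations of the indicators).
3. Row and column labels: these should be unambiguous so that readers understand what each indicator means.
4. Precise numbers and decimal places: figures in a table should be as precise as the data warrant, and significant decimal places should not be dropped.
Use the same number of decimal places throughout so that values are easy to compare.
5. Highlighting key data: important values can be set in bold, italics, or a different color to draw the reader's attention and emphasize their importance.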
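The recommendation above about using a uniform number of decimal places can be illustrated with a short sketch. This is a minimal example rather than a prescribed workflow; the function name `format_table`, the column names, and the sample values are all hypothetical and chosen only for illustration.

```python
def format_table(headers, rows, decimals=2):
    """Render rows as aligned plain text, with every float shown
    to the same number of decimal places."""
    # Convert each cell to text; floats get a fixed number of decimals.
    text_rows = [
        [f"{v:.{decimals}f}" if isinstance(v, float) else str(v) for v in row]
        for row in rows
    ]
    # Column width = widest cell in that column, header included.
    widths = [max(len(cell) for cell in col) for col in zip(headers, *text_rows)]
    lines = ["  ".join(h.ljust(w) for h, w in zip(headers, widths))]
    for row in text_rows:
        lines.append("  ".join(c.ljust(w) for c, w in zip(row, widths)))
    return "\n".join(lines)


# Hypothetical sample data (not taken from any paper).
sample = [("Control", 12.5, 0.3), ("Treatment", 15.25, 0.128)]
print(format_table(["Group", "Mean", "SD"], sample))
```

With `decimals=2`, a value stored as 12.5 is shown as 12.50, so all entries in a column line up and can be compared at a glance.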
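Foreign-language references on data analysis, with translation
![数据分析外文文献+翻译](https://img.taocdn.com/s3/m/7bafed7666ec102de2bd960590c69ec3d4bbdb66.png)
Reference 1: "The Application of Data Analysis in Business Decision-making". This reference discusses the importance and application of data analysis in business decisions.
It finds that data analysis yields accurate business intelligence and helps companies understand market trends and consumer needs.
By analyzing large volumes of data, companies can uncover hidden patterns and associations and so develop more competitive product and service strategies.
Data analysis also provides decision support, helping companies make sound decisions under uncertainty.
Data analysis has therefore become one of the key factors in the success of modern businesses.
Reference 2: "The Application of Machine Learning in Data Analysis". This reference discusses the application of machine learning in data analysis.
It finds that machine learning helps companies analyze large volumes of data more efficiently and extract valuable information from them.
Machine learning algorithms can learn and improve automatically, helping companies find patterns and trends in their data.
By applying machine learning, companies can forecast market demand more accurately, optimize business processes, and make more strategic decisions.
As a result, machine learning in data analysis is attracting growing attention and adoption among businesses.
Reference 3: "The Application of Data Visualization in Data Analysis". This reference discusses the importance and application of data visualization in data analysis.
It finds that data visualization presents complex relationships and trends in data more intuitively.
Visualization helps companies understand their data and find the patterns and regularities in it.
It also supports interaction with data and shared decision-making, improving the efficiency and accuracy of decisions.
Data visualization therefore plays a very important role in data analysis.
Translation
Reference 1 title: The Application of Data Analysis in Business Decision-making
Reference 2 title: The Application of Machine Learning in Data Analysis
Reference 3 title: The Application of Data Visualization in Data Analysis
Translated abstract: These references examine the application of data analysis in business decision-making and the role of machine learning and data visualization in data analysis.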
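Foreign-language references for a graduation-project data visualization system
![毕业设计数据可视化系统参考外文文献](https://img.taocdn.com/s3/m/38a7b5b6710abb68a98271fe910ef12d2af9a9a1.png)
A data visualization system built as a graduation project is an integrative piece of work that touches several fields, so it calls for foreign-language references from several angles.
The following resources may be relevant:
1. Data Visualisation: A Handbook for Data Driven Design. Author: Andy Kirk. This handbook covers the foundations and techniques of data visualization, including data cleaning, data transformation, and visual representation.
2. The Visual Display of Quantitative Information. Author: Edward R. Tufte. A classic book on data visualization that explains in detail how to represent and present quantitative data with charts, graphics, tables, and other visual elements.
3. Data Visualization: A Practical Introduction. Author: Kieran Healy. A comprehensive guide to data visualization, covering everything from data cleaning and preparation to visual representation and interpretation.
4. Information Visualization: Perception for Design. Author: Colin Ware. This book introduces the basic concepts and techniques of information visualization, including cognition, perception, and visualization.
It also offers practical design techniques and tools.
5. Visualizing Data: Exploring and Explaining Data Through Tables, Charts, Maps, and more. Author: Andy Kirk. This book presents a range of data visualization methods and techniques, including various charts, maps, and graphics.
It also emphasizes how visualizations explain and communicate data.
In addition, academic journals and conference proceedings dedicated to data visualization are worth consulting, for example IEEE Transactions on Visualization and Computer Graphics and the Proceedings of the IEEE Symposium on Information Visualization.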
基于CiteSpace 的大数据文献可视化分析
![基于CiteSpace 的大数据文献可视化分析](https://img.taocdn.com/s3/m/fa8bb04468eae009581b6bd97f1922791688beda.png)
空间”。该软件是由陈超美教授开发的一款分析和可
过多分析),大数据文献数量持续增多,增长率逐渐放
视共被引网络的 Java 应用程序,在科学计量学、数据和
缓。值得注意的是,2013 年被称为我国“大数据元年”,
信息可视化背景下逐渐发展起来,通过分析科学文献,
从文献分析来看,虽然我国信息通信技术在近几年得
专
《信息通信技术与政策》2018 年 12 月第 12 期
题
基于 CiteSpace 的大数据文献可视化分析
孙文沣
邱艳娟
高 岩
北京邮电大学经济管理学院硕士研究生
中国信息通信研究院产业与规划研究所高级工程师
中国信息通信研究院产业与规划研究所高级工程师
摘要:随着信息化、网络化时代的到来,大数据的发展受到越来越多的关注。本文选取了 2012—
(2016—2020 年)》
(工信部规[2016]412 号)。2017 年 2
从图谱中挖掘相关信息,分析大数据研究领域内的热
□Information and Communications Technology and Policy No.12
式定义了大数据的概念;12 月,工业和信息化信部把海
共中央政治局第二次集体学习时强调,实施国家大数
量数据存储、数据挖掘、图像视频智能分析等信息处理
据战略,加快建设数字中国,将大数据提升到前所未有
技术作为 4 项关键技术创新工程之一。2012 年 7 月,联
的战略高度。
合国在纽约发布了大数据白皮书《大数据促发展:挑战
即可,分别选择 Institution/Country/Keyword 等需要分
用于文献可视化分析的数据清洗方法研究
![用于文献可视化分析的数据清洗方法研究](https://img.taocdn.com/s3/m/cacd037d302b3169a45177232f60ddccda38e66c.png)
用于文献可视化分析的数据清洗方法研究方小利,刘㊀霞(武汉大学图书馆,430072)摘㊀要:可视化分析是文献计量分析中较为重要的一种㊂在进行可视化分析时,数据清洗工作至关重要,但目前的可视化分析软件一般不具备数据清洗功能,即使具备,也无法实现批量化的快速清洗㊂文章以石墨烯领域的WOS 文献题录数据可视化分析为对象,探索了利用OpenRefine 的聚类功能,对题录中的重要信息进行聚类,形成软件可识别的规范术语文件,进行可视化的方法,验证了该方法用于文献情报挖掘中可视化分析的优越性㊂研究结果表明,利用OpenRefine 聚类功能可以高效地对文献题录重要信息进行处理,机构合作网络中,重复节点减少了9%;关键词共现网络中,词频最大可增加742次,明显减少了重复节点,提高了可视化分析的准确性和情报挖掘的效率㊂关键词:大数据;CiteSpace;VOSviewer;OpenRefine;可视化;情报挖掘;数据清洗引用本文格式:方小利,刘霞.用于文献可视化分析的数据清洗方法研究[J].大学图书情报学刊,2021(6):56-60.Research of Data Cleaning Method for VisualizationFANG Xiao-li,LIU Xia(Wuhan University Library,Wuhan㊀430072,China)Abstract :Visualization analysis is an important method among the bibliometric analysis.During the process ofvisualization analysis,data cleaning is very important,but the current visualization software generally can t be used to dothis work,and even if some software can,it is relatively simple,needing to repeat one by one.The paper explores amodified strategy of information mining with the improvement of data visualization analysis which playsa significant role ininformation mining as an breakthrough point.Taking the visualization analysis of bibliography in graphene area from WOSas an example,the paper explores the method of using the clustering function of OpenRefine to deal with the importantinformation of the bibliography,forming a standardized glossary to meet the needs of visualizing software,and then carrying out the visualization.The superiority of this method in the visualization analysis is verified.The results show thatwith the help of the clustering function of OpenRefine,people can quickly process the important information of literature in bulk,the number of duplicate nodes is reduced by 9%in the organization cooperation network,and the keywords frequency can be increased by 742times in max,which significantly reduces the number of duplicate nodes in thekeyword co-occurrence network.This method can improve the accuracy of visualization analysis and the efficiency ofinformation 
mining.Key words :big data;CiteSpace;VOSviewer;OpenRefine;visualization;information mining;data cleaning1㊀引言文献计量学是采用数学与统计学等方法,对文献各个指标的数量特征进行分析,得到关于文献的年代㊁国家/地区㊁研究机构㊁作者㊁主题㊁文献来源㊁被引频次等分布或发展趋势的定量分析方法[1-2]㊂目前,该种方法已在许多领域获得应用,成为科学研究㊁决策咨询㊁大学图书馆学科服务等领域重要的数据分析方法㊂但是,传统的文献计量方法仅针对文献的各个指标进行分析,只展示了文献大数据的宏观态势,且分析的结果多以数据图表呈现,在信息呈现的层次和视觉效果方面有待提升㊂因此,研究者们基于文献计量学原理开发了大量的文献计量可视化分析工具,利用文献之间的引证关系及某一指标的突变,快速分析学科或者研究领域的结构和态势,并将分析发现的重基金项目: 武大通识3.0 项目(数据素养与数据利用)(2019年9月-2022年6月)(武大本函[2018]158号)2021年11月第39卷第6期㊀㊀㊀㊀㊀㊀㊀㊀大学图书情报学刊Journal of Academic Library and Information Science㊀㊀㊀㊀㊀㊀㊀㊀Nov ,2021Vol.39No.6要信息以直观精美的图谱方式呈现出来㊂常见的可视化分析工具可以分为三类,除了WOS㊁ESI㊁Incites㊁Scopus㊁CSSCI㊁CNKI等具有索引功能的数据库型的计量分析工具[3-6]和Python㊁R语言㊁SAS㊁Matlab㊁SPSS等[7]程序语言或统计分析工具,还有各种可视化分析软件,如CiteSpace㊁Histcite㊁Pajek㊁Netdraw㊁Ucinet㊁VOSviewer等[7]㊂前两类工具虽然具有分析功能,但仅限于宏观层面的统计分析[8]或者需要经过复杂的参数调试,对于大多分析者来说,并不是可视化分析工具的最佳选择,而可视化软件不仅能够从宏观层面上对文献进行分析,还能够以精美的图谱呈现文献之间的关联和突变特性等微观层面的信息,因而广受青睐㊂在可视化软件中,CiteSpace㊁VOSviewer使用最为广泛㊂CiteSpace和VOSviewer是基于Java开发的文献计量分析软件,利用这两个软件,可以对WOS㊁Pubmed㊁Elsevier等外文数据库中下载的大量文献题录数据进行关键词㊁主题词㊁机构㊁国别等合作网络分析㊁聚类分析,帮助研究者快速从大量的文献信息中了解研究领域结构㊁定位重点文献㊁重要研究团队等[9-10]㊂但是,由于不同作者在撰写论文时,对相同内容的撰写可能存在不同的表述形式,特别是外文机构名称或专业术语,从而导致利用CiteSpace或VOSviewer软件进行文献可视化分析时,在同一可视化网络中,经常出现同一专业术语或机构名称的多种书写形式的节点同时存在,降低了可视化分析的准确性,对后续分析结论也产生了影响,因此,有必要对可视化分析的数据进行清洗,优化情报挖掘的方法,从而帮助分析者得出正确的结论㊂虽然这两个软件本身自带了文本归并功能,但只适合少量文献的数据清洗,对于大样本的数据,需要多次归并且很容易遗漏㊂目前,对包括文献题录数据在内的数据进行清洗的工具有很多,除了需要付费使用的Data Stage (Ardent)㊁Data Transformation Service(Microsoft)㊁Warehouse Administrator(SAS)等工具,还有一些免费的工具,如Openfine㊁DataWrangler等,其中以OpenRefine最为常见㊂OpenRefine是谷歌公司基于Java开发的一款类似于Excel的开源网络应用,以数据的列和字段为对象,实现对文本㊁JSON㊁CSV㊁XLS㊁XML等格式的数据字段进行快速合并㊁分裂㊁聚类㊁批量编辑等㊂因此,本文以CiteSpace和VOSviewer软件为例,以WOS数据库2020-2021年主题为 graphene 的文献为对象,利用了开源软件OpenRefine的文本聚类功能(cluster and edit),介绍了一种能够快速大批量地对导入到可视化分析软件文献题录中的专业术语或机构名称进行归并的方法,以期通过该方法为其他可视化分析过程中的数据清洗提供借鉴,从而帮助数据分析者提高可视化分析的效率和准确性㊂2㊀数据采集及清洗流程文章以WOS数据库题录数据为对象,研究了利用OpenRefine的聚类功能快速完成数据清洗的方法㊂题录数据来源于WOS数据库,检索主题为 graphene ,时间跨度为2020-2021年,检索时间为2020年10月13日,文献类型为 article 
,共10494条㊂利用OpenRefine对该题录数据中的机构㊁关键词进行聚类后,利用Excel一次形成软件规定的规范术语表㊂具体流程如图1所示㊂其中,在使用CiteSpace进行可视化时,根据使用的版本不同,数据有两种归并方式,除图中所述式外,对于有些版本的CiteSpace,主要采用 关键词1#关键词2 的方式,两种方式中,第一个出现的关键词即为网络中最终显示的关键词,它是所有与其同义㊁近义㊁书写形式㊁单复数等不同书写形式的代表㊂本文使用的3.9R7版的CiteSpace,采用后一种方式归并㊂3㊀数据处理效果及讨论3.1㊀文献题录中机构名称的处理及效果因从WOS导出的文献题录数据中,机构信息与作者地址字段(C1)对应,因此,首先要从C1中提取机构名称信息㊂本文采用OpenRefine提取机构信息,最终提取的所有机构名称在C1字段中共出现26918次(含重复出现),经透视表统计,最终得到了5752个机构名称㊂本部分主要是对5752个机构名称中书写形式不一样的机构名称利用聚类方式进行同义词合并,从而达到每个地址的名称书写方式唯一㊂经聚类合并后,机构数降为5224个,机构重复率减少了9.2%㊂为了证明聚类法对可视化结果的影响,本文将未经机构归并处理的原始数据导入VOSviewer,统计得出了机构发文数排名前18位的列表1,将聚类法形成的文件处理成带有标签列(label)和替换列(replace by)的可被VOSviewer识别的规范术语文件(VOSviewer thesaurus file),并将其应用到文件导入路径中,得出了机构发文数排名前18位的列表2㊂两个列表的机构发文数对比如图2所示㊂总第188期大学图书情报学刊2021年第6期图1㊀数据清洗流程图2㊀归并处理前后部分数据对比图㊀㊀从图2可以看出,与归并处理前相比,处理后的数据中,四川大学(sichuan univ)和中国科学技术大学(univ sci &technol china)的发文量分别增加了1篇㊂图2虽然未完全展示出所有机构发文统计数量在归并前后的对比变化,但是与归并前相比,统计显示的总机构数减少了528个,说明通过聚类可以将一些拼写不规范的机构名称进行统一命名处理,从而帮助提升后续分析结果的准确性㊂为了进一步证明OpenRefine 聚类法用于机构归并的效果,本文利用WOS 数据库自带的地址清洗功能(机构扩展)对机构发文数进行了统计并导出,共5332个机构名称,将该5332个机构名称数据导入OpenRfine 进行了聚类,部分聚类结果如图3㊂由图3可知,WOS 数据库的机构扩展功能虽然对机构名称进行规范化,但仍有遗漏,在导出的机构扩展地址中,存在因书写形式不同而导致同一机构重复出现㊂并且,由于WOS 数据库的机构扩展功能并未提供与扩展前作者们书写的机构名称的原始形式对比列表,使得在可视化软件进行分析时,使用WOS 机构扩展地址并不能直接替换原有地址进行机构合作网络分析和聚类分析㊂所以,与WOS 数据库的机构扩展功能和可视化软件自带的数据处理功能相比,OpenRefine 聚类功能不仅能够帮助分析者快速完成文献的清理归并,而且方式灵活,能够根据不同计量分方小利,刘㊀霞:用于文献可视化分析的数据清洗方法研究析软件的术语要求转换成相应的格式,优化了可视化分析的结果㊂图3㊀WOS 机构扩展部分聚类3.2㊀文献关键词处理及效果本文对该10494条文献中DE(作者关键词)和ID(系统关键词)进行了合并去重,得到18006组关键词㊂用OpenRefine 进行分裂后计数,得到27419个关键词,利用OpenRefine 对该27419个关键词进行聚类归并㊂虽然由于算法原因不能确定该27419个关键词最终归并成多少个,但是利用OpenRefine 能够最大显示聚类个数的算法 nearest neighbor ,选择默认参数进行两次聚类后,能够读取的关键词为15639个,按照该算法形成了5178个聚类㊂该种方法主要解决了单复数形式㊁连字符 - ,错误书写㊁大小写混用㊁缩写全称混用等引起的不同词组形式㊂聚类前后的词组示例如表1㊂表1㊀聚类举例及解决的问题关键词书写形式聚类标签问题类型Graphene oxide ,graphene oxide ,GRAPHENE OXIDE ,Graphene Oxide ,GRAPHENE-OXIDE ,Graphene-oxide ,Grapheneoxide ,Graphene-oxide ,graphene-oxideGraphene oxide大小写形式㊁连字符㊁词组书写不规范core-shell structure ,CORE-SHELL STRUCTURE ,CORE-SHELL STRUCTURES ,Core 
-shellstructures ,Core/shell structure ,Core@shell structures ,core-shell structures ,Core-shell structurecore -shell structure连字符㊁单复数NITROGEN -DOPED GRAPHENE ,nitrogen-doped graphene ,Nitrogen-doped graphene ,Nitrogen doped graphene ,Nitrogen-doped Graphene ,nitrogen doped graphene ,Nitrogen-doped graphene (N-G ),nitrogen-doped graphene (NG )NITROGEN-DOPEDGRAPHENE 简写缩写混用Absorbent ,adsorbentAdsorbent 同义词,形式相近Graphene ,graphemeGraphene错误写法㊀㊀在进行归并处理时,以聚类标签作为代表标签出现在可视化网络中,其它词组出现的次数均叠加在相应的代表标签上,从而进行归并㊂本文利用CiteSpace(CiteSpace version:3.9R7)对归并前后的系统关键词(Keywords Plus)进行了可视化分析,如图4㊂图4㊀归并前后的Keywords Plus 共现网络㊀㊀图4中,为了清楚显示关键词,设置了节点大小为3,关键词阈值为50(Keywords Plus 出现的次数最少为50次),文本大小为5㊂从图4a)可以看出,词组单复数被当成了不同节点同时出现在网络中,如图中圈出的节点㊂但归并后,如图4b)所示,这些由于单复数形式造成了意思相同的节点消失了㊂排名前十的关键词处理前后词频对比如表2㊂由表2可看出,归并前词频排名第8位的关键词总第188期大学图书情报学刊2021年第6期composites 在与关键词 composite 归并后,排名变成了第2位;归并前词频排名第6位的关键词 nanocomposites 在与关键词 nanocomposite 归并后,表2㊀归并前后词频排名前十位的关键词列表归并前归并后关键词词频关键词词频graphene1961graphene1961 performance1242composite1342 graphene oxide1184nanocomposite1316oxide1101performance1242 nanoparticles1045graphene oxide1184 nanocomposites759oxide1101 nanosheets690nanoparticles1045 composites684nanosheets690 fabrication681fabrication681 composite669carbon620排名变成了第3位㊂按照上文设定的阈值导出排名前100位的关键词及词频列表,发现词频变化较大的为 nanocomposite ,归并增加了742次, composite 的词频增加了673次, electrode (表中未列出)的词频增加了203次,等等㊂这说明通过聚类归并的方法能够快速解决关键词的单复数㊁连字符㊁书写错误等问题,对提高可视化分析的准确性有明显的意义㊂4㊀总结与展望本文在考虑了大数据时代数据发展速度快和数据处理速度要求高的前提下,提出了一种基于OpenRefine聚类法对文献重要信息进行优化的方法,研究了该方法对可视化网络的影响㊂研究表明, 
OpenRefine聚类法有助于快速有效地实现文献机构和关键词的归并,提高可视化网络的准确性和效率,也为大数据时代情报挖掘方法的优化提供了便捷途径㊂尽管如此,该方法还可进一步改进㊂4.1㊀形成结合聚类归并一体化的应用程序虽然本文提出一种相比于传统手工数据归并方法更为快速的方法,但在对数据进行聚类过程中,会出现两个问题:一是Openfine算法不能对所有关键字(关键词或者机构)进行聚类,而是有一定阈值,可能会对可视化结果产生影响;二是在关键字数量较多时,可能需要2-3次聚类,增加了数据处理的工作量㊂因此,后续研究可以针对聚类的阈值设置以及重复工作程序进行集成,形成一款专业的数据清洗工具,甚至与可视化软件结合,提高可视化分析的效率㊂4.2㊀对非结构化的聚类归并本文主要针对文献题录数据进行处理后进行可视化分析,但实际有许多可视化分析是针对自媒体网络数据,而大多数自媒体数据都是非结构化的,在进行可视化分析前,数据清洗工作要比文献题录数据复杂得多,仅数据中某些内容的缺失㊁数据一致性㊁数据归属等脏数据问题就要需要大量的工作,因此,研究如何快速针对自媒体数据进行聚类归并也是后续研究工作需要重点关注的㊂参考文献:[1]王崇德,庞学金.文献计量学术语(一)[J].情报理论与实践,1998(1):3-5.[2]吕晓赞.文献计量学视角下跨学科研究的知识生产模式研究 以大数据研究为例[D].杭州:浙江大学,2020.[3]王璇.基于文献计量工具分析学科发展状况[J].民营科技,2015(9):251.[4]刘雪立.一个新的引文分析工具 InCites数据库及其文献计量学指标的应用[J].中国科技期刊研究, 2013,24(2):277-281.[5]赵婷婷.科学研究成果评价与文献计量工具研究[J].科技创新导报,2008(5):175.[6]杨静,李玉斌.我国社会互动的研究热点与演进趋势 基于CNKI和CiteSpace的文献计量与知识图谱分析[J].软件,2020,41(2):267-272.[7]肖明.国内图书情报知识图谱实证研究[M].北京:中国经济出版社,2017.[8]王知津,王璇,马婧.索引作为文献计量学分析工具的科学性与局限性[C].//中国索引学会.2011年中国索引学会年会论文集,2011:24-34.[9]Chen C.Mapping Scientifi c Frontiers[M].London:UK: Springer-Verlag,2013:1-376.[10]Download VOSviewer1.6.16for Microsoft Windowssystems[EB/OL].(2020-11-25)[2020-12-23].https:///download.作者简介:方小利,女,博士,馆员㊂收稿日期:2021-03-15(责任编辑:王靖雯)方小利,刘㊀霞:用于文献可视化分析的数据清洗方法研究。
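The cleaning workflow described in the paper above (cluster variant spellings, pick one label per cluster, then export a thesaurus with `label` and `replace by` columns for VOSviewer) can be sketched in a few lines of Python. This is a simplified key-collision fingerprint written for illustration, not OpenRefine's actual implementation, and the paper itself reports using the "nearest neighbor" method for keywords. The function names and the rule "first variant seen becomes the label" are assumptions. As written, the sketch merges case, hyphen, and punctuation variants only; it does not merge singular/plural forms or misspellings.

```python
import re
from collections import defaultdict


def fingerprint(term):
    """Simplified key-collision key: lowercase, punctuation to spaces,
    then unique tokens in sorted order."""
    cleaned = re.sub(r"[^\w\s]", " ", term.lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster_terms(terms):
    """Group terms whose fingerprints collide; keep groups with >1 spelling."""
    groups = defaultdict(list)
    for term in terms:
        if term not in groups[fingerprint(term)]:
            groups[fingerprint(term)].append(term)
    return [variants for variants in groups.values() if len(variants) > 1]


def thesaurus_rows(clusters):
    """Map each variant to the first spelling seen in its cluster (assumed rule)."""
    return [(v, variants[0]) for variants in clusters for v in variants[1:]]


def to_thesaurus_text(rows):
    """Tab-separated text with the two column names mentioned in the paper."""
    return "\n".join(["label\treplace by"] + [f"{v}\t{label}" for v, label in rows])


# Variants taken from Table 1 of the paper, plus one unrelated keyword.
keywords = ["Graphene oxide", "graphene oxide", "GRAPHENE-OXIDE",
            "Graphene-oxide", "core-shell structure", "Core/shell structure",
            "carbon"]
print(to_thesaurus_text(thesaurus_rows(cluster_terms(keywords))))
```

Keeping the mapping as an explicit two-column file means the original records are never overwritten, so a questionable merge can be reviewed and removed before the file is passed to the visualization tool.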
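English-language literature on movie data visualization with Python
![python 电影数据可视化英文文献](https://img.taocdn.com/s3/m/4e268229b6360b4c2e3f5727a5e9856a57122658.png)
The following English-language references concern movie data visualization with Python:
1. Y. J. Kim, "Movie revenue prediction and genre classification using machine learning," Expert Systems with Applications, vol. 70, pp. 200-209, 2017. This paper describes how to classify movies with machine learning algorithms and how to use visualization tools in predicting movie revenue.
2. J. H. Seo, D. H. Lee, and J. W. Lee, "Analysis of box office success prediction using machine learning and visualization," Journal of Supercomputing, vol. 75, no. 6, pp. 2874-2891, 2019. This paper describes how to predict box-office success with machine learning algorithms and visualization tools, and examines how different variables affect the predictions.
3. C. Hu and C. Zhang, "Data-mining and visualization for movie box-office prediction," in Proceedings of the 2018 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 203-210, 2018. This paper proposes an approach based on data mining and visualization for predicting and analyzing movie box-office results.
4. S. Kim and Y. J. Kim, "Movie success prediction using machine learning and visualization," Multimedia Tools and Applications, vol. 78, no. 24, pp. 35037-35053, 2019. This paper describes how to predict a movie's success with machine learning algorithms and visualization tools, and provides a Web-based platform for visualizing and analyzing movie data.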
基于CiteSpace_的图书馆评估研究可视化分析
![基于CiteSpace_的图书馆评估研究可视化分析](https://img.taocdn.com/s3/m/5f37065b77c66137ee06eff9aef8941ea66e4b48.png)
第7期2024年4月江苏科技信息Jiangsu Science and Technology InformationNo.7April,2024作者简介:陈嘉娜(1997 ),女,硕士研究生;研究方向:图书馆评估㊂基于CiteSpace 的图书馆评估研究可视化分析陈嘉娜(湘潭大学,湖南湘潭411100)摘要:为了深度剖析国内近10年图书馆评估的研究现状和前沿趋势,了解国内相关研究的发展阶段和特点,文章通过文献计量工具CiteSpace 对中国知网中2012 2022年的图书馆评估相关文献进行作者㊁机构㊁关键词3方面的可视化计量分析,梳理了我国对该主题的研究概况㊁研究热点和发展趋势,并提出了对未来研究的展望:增强对阅读推广的实证研究㊁进一步对成效评估进行深入研究㊁加强对图书馆空间评估的研究㊂关键词:图书馆评价;文献计量;CiteSpace ;可视化分析中图分类号:G251.5㊀㊀文献标志码:A 0㊀引言㊀㊀图书馆评估是用定性和定量的方法对图书馆实现其目标和满足读者需求程度所进行的评价与测度,目的在于改进图书馆工作,开展优质服务,以最小的成本消耗获取最大的服务效果[1]㊂2021年,文化和旅游部㊁国家发展改革委及财政部发布了‘关于推动公共文化服务高质量发展的意见“(以下简称‘意见“),提到依托行业组织,加强公共图书馆评估定级工作[2],足见国家对于图书馆评估工作的重视㊂目前,国内对于图书馆评估的研究成果颇多,也有国内学者针对这些研究成果进行了梳理㊂例如,崔倩[3]对2001 2012年图书馆馆藏评价方法的研究文献进行调研,在总结研究现状的基础上指出研究中存在的问题并提出建议;伍玉伟等[1]通过对2009 2018年高校图书馆评估相关论文进行统计分析,总结了高校图书馆评估的研究现状和未来趋势;石剑兰等[4]对2010 2020年间我国关于图书馆阅读推广评估的文献进行统计分析,阐述了国内图书馆阅读推广活动评估的研究现状㊂这些研究对图书馆评估都有所贡献,但仍存在部分局限性:研究内容较分散,较少从图书馆评估这一宏观主题展开;较少采用科学计量工具进行客观分析,以主观解读为主㊂因此,本研究采用文献计量软件CiteSpace 6.2.R3[5]对2012 2022年间图书馆评估的相关研究进行可视化,梳理其研究热点和发展趋势,以期为国内图书馆评估工作的开展提供借鉴㊂1㊀研究设计1.1㊀研究方法㊀㊀CiteSpace 软件由美国德雷克塞尔大学陈超美博士开发,被广泛应用于科学计量领域,其基于文献计量和知识图谱的可视化分析有助于分析图书馆评估的发展趋势和当下的研究热点㊂为了探析近10年国内图书馆评估研究整体发展的演进路径,本文主要从作者㊁机构及关键词等层面进行数据分析,梳理我国图书馆评估领域的研究概况,通过关键词聚类㊁时间线图㊁突变词等归纳总结国内图书馆评估研究的发展特征㊁研究热点,以期为学者进行相关理论研究提供借鉴㊂1.2㊀数据来源㊀㊀为了保证数据的科学性和可信度,样本文献来源于中国知网(CNKI)学术期刊检索,来源类别限定为中国社会科学引文数据库(CSSCI)和北大核心期刊㊂通过检索式 图书馆评价 OR 图书馆评估 进行主题检索,时间范围限定为2012 2022年㊂检索日期为2023年5月28日,检索结果共944条㊂经过人工筛选,剔除关联性不高的文献㊁新闻等,得到相关文献917篇㊂2㊀数据处理与分析2.1㊀文献机构分布㊀㊀将917篇文献的题录数据导入软件之后,设置节点类型为机构,运行得到机构合作图谱㊂国内研究图书馆评估的机构目前已基本形成几大合作网络,第一合作网络以武汉大学为中心,国家图书馆㊁北京大学等为成员,第二合作网络以南开大学为中心,天津图书馆㊁山东省图书馆等为成员,第三合作网络则包含中国科学院及军事科学院㊂从发文量上看,南开大学位居首位,共43篇,其次是南京大学信息管理学院,共24篇,武汉大学再次之,共22篇,但武汉大学和其他机构的合作更为密切㊂2.2㊀文献作者分布㊀㊀根据普莱斯定律,核心作者的计算公式为Mʈ0.749ˑN max(其中N 
max指的是对应年限中论文发表数量最多作者的论文数量),计算结果约为4,即发表4篇文章及以上的可视为核心作者,统计的核心作者共12个,排名靠前的有柯平㊁杨九龙㊁吴正荆㊁齐向华㊁田景梅㊁李新运㊂柯平所在的合作网络发文共计53篇,柯平发文28篇,超过总发文量的50%,形成了以柯平为核心的作者群㊂2.3㊀关键词聚类分析㊀㊀关键词是文献的高度概括,在一定程度上体现了研究主题㊂因此,对关键词进行聚类分析,有助于更好地把握研究现状㊂将节点类型修改为关键词,进行关键词分析,共得出8个类别(见表1)㊂表1㊀关键词聚类类别序号类别名称标识词0图书馆质量评价;大数据;高校图书馆;模型1美国馆藏评价;评价方法;馆藏资源;比较研究2服务质量评价体系;综合评价;信息素养;图书馆评价3绩效评价评估指标;绩效评估;图书馆评估;评价指标4图书馆学数字资源;网站评价;信息服务;评价主体5指标体系评价模型;过程系统;知识产权;风险评估6阅读推广实证研究;效益评价;评估定级;大专院校7评估评价;微服务;空间;绩效2.4㊀研究热点和发展趋势分析2.4.1㊀研究热点主题分析㊀㊀根据关键词聚类结果,将2012 2022年间的研究热点主题内容归纳为以下4个方面㊂(1)对不同类别图书馆主体评估的研究㊂不同图书馆的评估体系㊁评估指标㊁评估方法等都存在一定差异,因此,许多学者会针对不同类别图书馆的特点,对其评估体系㊁模型等进行深入研究㊂例如,唐晓玲等[6]基于QFD和Kano模型对数字图书馆服务质量作出量化评估;包颉[7]基于粗糙集和SECE模型构建了高校图书馆可持续发展评价体系㊂(2)对图书馆评估指标体系的研究㊂随着时代发展,旧的图书馆评估指标体系往往不适用于新型图书馆或新的社会环境,因此,虽然对于图书馆评估指标体系的研究早在20世纪60年代便出现,但相关研究仍在持续当中,相信未来也会是图书馆评估领域的研究热点㊂国内学者主要针对图书馆馆藏资源㊁服务质量㊁绩效评估等方面构建了评估指标体系㊂(3)对图书馆评价方法的研究㊂图书馆评价方法经历了从定性评价到定量评价再到定性与定量相结合的历程㊂定性方法的优点是简单易操作,但结果主观性较强;定量方法得出的结果较为客观合理,但一味追求量化指标而忽略其他因素,也会导致评价结果太过僵硬死板,因此,现在大多学者采用定性结合定量的方式进行图书馆评价㊂(4)对国内外图书馆评估的比较研究㊂例如美国㊁澳大利亚等国家在图书馆评估领域领先于我国,对这些国家的图书馆评估现状进行调研分析可以为我国图书馆评估工作提供借鉴㊂申彦舒[8]通过对美国高校图书馆价值评估研究进行调研,提出其为我国带来的启示;刘娟等[9]对加拿大公共图书馆绩效评估体系进行了分析,结合我国公共图书馆现状提出了适用我国的启示㊂2.4.2㊀发展趋势分析㊀㊀运用突变性检测功能,可以分析短期内产生大幅度变化的关键词,并展示其出现和结束的时间,有助于掌握某一阶段某类主题研究文献的爆发式增长和学界的关注点㊂运用时间线功能,可以呈现不同研究热点在不同时段内的发展情况㊂其结果分别如图1㊁图2所示,结合两个图谱可以对图书馆评估2012 2022年的发展趋势进行梳理㊂从图2可以看出,2012 2015年间图书馆评估研究较为多元化,研究热度较高㊂其中,2013 2015年的研究集中在评价模型,评估客体以公共图书馆和高校图书馆为主㊂在评价模型方面,以用户满意度评价模型为主,其中Servqual模型和LibQual+模型作为两个发展较为成熟的模型,常被学者作为基础模型进行扩展㊂2015 2020年间,由于图书馆评估体系趋于成熟,相关研究成果较为饱和,因此,整体研究热度略有㊀㊀图1㊀突变词图谱图2㊀时间线图下滑,除了评价模型㊁评价体系等基础研究主题,学者开始更多地结合社会热点㊁政策环境进行研究㊂2016年开始, 大数据 这一关键词出现,热度持续至2019年㊂研究内容集中在探究大数据时代下图书馆数字资源的质量评价,不过和大数据相关的研究主题主要在于图书馆服务方式的转变等,较少有将大数据和图书馆评估结合进行研究的文献,相关研究在2019年后的热度也有所消退㊂2017 
2019年,陆续出现阅读推广㊁评估定级㊁科研评价㊁空间评估和成效评估等关键词,这一时期突变词的大量出现和国家政策的动向不无关系㊂首先是2017年第六次开展全国县级以上公共图书馆评估定级工作,促使图书馆评估相关研究大量出现;然后是同年中共十九大报告中提到的完善公共文化服务体系,图书馆作为公共文化服务的重要一部分,研究图书馆的评估方式㊁评估标准㊁评估效果等也有助于完善公共文化服务体系,因此,同样助力了此阶段相关研究文献的涌现㊂此外,2018年1月1日起正式实施的‘中华人民共和国公共图书馆法“体现了国家对公共图书馆事业发展的高度重视,意味着图书馆事业步入新时代,研究热度也随之高涨㊂2020年开始,图书馆评估再次步入新阶段,阅读推广㊁科研评价㊁空间评估㊁成效评估等仍是研究热点主题,相信国内对于图书馆评估的相关研究会一直持续㊂更多地参考国外发展经验,结合我国实际,构建出更适合不同类别图书馆㊁更符合当下社会环境的评估体系㊁评估指标㊁评估模型等,为我国公共文化服务事业的发展添砖加瓦㊂3 讨论㊀㊀基于以上分析结果,结合当前技术㊁政策环境等,笔者对未来我国图书馆评估领域的研究趋势作出展望,主要分为以下3方面㊂一是阅读推广的实证研究㊂阅读推广一直是图书馆评估领域的热点主题,但目前研究集中于评估标准㊁评估体系的构建,缺少通过实证研究验证阅读推广效果的实践过程㊂仅仅停留在理论层面的研究无法切实提升现实中的阅读推广效果,因此,建议扩大阅读推广评估的研究范围,可以从实证研究方面入手㊂二是对于成效评估的研究㊂成效评估作为国外图书馆界的研究热点,已证明其在图书馆评估发展的重要地位,但国内相关研究仍较为匮乏,大多停留在对国外成效评估的研究,缺乏结合我国实际对图书馆进行成效评估的研究,且国内图书馆目前主要采用绩效评估方式,未来对成效评估的研究还有待加强㊂三是对图书馆空间评估的研究㊂空间评估也是国外研究的热点之一,随着信息技术的发展,数字图书馆㊁移动图书馆等线上图书馆逐渐成为读者的首要选择,线下实体图书馆作为 第三空间 的趋势不可避免㊂然而,国内对于图书馆的空间再造研究较少,且多为对国外创客空间的研究,因此,应加强相关研究㊂参考文献[1]伍玉伟,胡霄,洪芳林.近十年(2009 2018年)我国高校图书馆评估研究述评[J].河北科技图苑,2020(1):14-20.[2]国务院.文化和旅游部国家发展改革委财政部关于推动公共文化服务高质量发展的意见[EB/OL]. 
(2021-03-08)[2023-06-02].https:/// zhengce/zhengceku/2021-03/23/content_5595153.htm.[3]崔倩.近十年国内图书馆馆藏评价方法研究述评[J].图书馆杂志,2012(4):11-14.[4]石剑兰,黄洁晶.我国图书馆阅读推广活动评估研究综述(2010 2020年)[J].图书馆理论与实践, 2021(6):112-117.[5]CHEN C M.A glimpse of the first eight months of the COVID-19literature on microsoft academic graph: themes,citation contexts,and uncertainties[J].Frontiers in Research Metrics and Analytics,2020(5):607286.[6]唐晓玲,何燕.基于QFD和Kano模型的数字图书馆质量评估研究[J].情报理论与实践,2013(6): 89-92.[7]包颉.大数据下基于粗糙集的高校图书馆可持续发展评价研究[J].情报理论与实践,2016(4): 103-107.[8]申彦舒.美国高校图书馆价值评估研究及其启示[J].图书馆工作与研究,2013(1):13-16.[9]刘娟,余红.加拿大公共图书馆绩效评估体系及启示[J].图书馆,2013(5):74-78.(编辑㊀何㊀琳)Visual analysis of library evaluation research based on CiteSpaceChen JianaXiangtan University Xiangtan411100 ChinaAbstract In order to deeply analyze the research status and frontier trend of library evaluation in China in the past10 years and understand the development stage and characteristics of domestic research CiteSpace was used to conduct visual econometric analysis of cooperative network citation and keywords of library evaluation related literature in CNKI from2012to2022 which sorts out the research overview research hotspots and development trends of this topic in China.Based on these the prospects for future research have been put forward strengthen empirical research on reading promotion further conduct in-depth research on effectiveness evaluation and strengthen research on library space evaluation.Key words library evaluation bibliometrics CiteSpace visual analytics。
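The core-author threshold used in section 2.2 of the paper above can be checked numerically. The formula is garbled in the extracted text, so this sketch assumes the usual square-root form of Price's law, M ≈ 0.749 × √N_max, which reproduces the threshold of about 4 that the paper reports for N_max = 28 (the paper count it gives for the most prolific author). The function names and every other author count below are hypothetical.

```python
import math


def core_author_threshold(n_max):
    """Price's law threshold, assuming the form M = 0.749 * sqrt(N_max)."""
    return 0.749 * math.sqrt(n_max)


def core_authors(paper_counts):
    """Return authors whose paper count reaches the rounded-up threshold."""
    n_max = max(paper_counts.values())
    minimum = math.ceil(core_author_threshold(n_max))
    return sorted(a for a, n in paper_counts.items() if n >= minimum)


# N_max = 28 comes from the paper; the remaining counts are made up.
counts = {"author_a": 28, "author_b": 6, "author_c": 4, "author_d": 3}
print(round(core_author_threshold(28), 2))
print(core_authors(counts))
```

Rounding the threshold up gives the "four papers or more" cutoff that the paper applies when it counts 12 core authors.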
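Visual analysis of the research literature (Citespace): presentation slides
![科研文献的可视化分析(Citespace)PPT课件](https://img.taocdn.com/s3/m/5e1f7257b6360b4c2e3f5727a5e9856a57122663.png)
Data cleaning is an important step in data preparation. It involves removing irrelevant records and handling missing values, outliers, and the like. Tools such as Excel can be used for data cleaning.
Parameter settings and visualization output
Parameter settings
In Citespace, the visualization is controlled by adjusting parameters. Common parameters include time slicing, thresholds, node types, and links.
Visualization output
Citespace presents research literature data visually. Common outputs include cluster views, timeline views, and network views. Choose the output that best suits the data to be shown.
Starting the software
After installation, double-click the Citespace icon on the desktop to start the software.
Data preparation
Data sources
Research literature data come mainly from academic databases such as Web of Science and CNKI, and can also be obtained in other ways.
Data formats
Citespace supports several data formats, such as the CNKI TXT format and the EndNote ENW format. Before import, the data must be converted into a format Citespace supports.
Future directions for Citespace
Cross-database integration
Citespace may in future integrate more types of databases, including Chinese-language databases and those in less widely used languages, to broaden its data sources.
Algorithm optimization
As technology advances, Citespace's algorithms may be further optimized to handle large-scale data and complex network structures more efficiently.
Intelligent analysis
Citespace may introduce more intelligent analysis features, such as automatic identification of key nodes and automatic recommendation of research topics.
core themes, research fronts, and knowledge flows. By comparison, the visualization features of reference management software are relatively weak and rarely provide deep insight.
Citespace compared with scientometric software
Summary: depth of analysis
Details: Citespace provides traditional scientometric indicators, such as publication counts and author collaboration networks, and also reveals knowledge structures and how they evolve through visualization. This gives Citespace greater analytical depth than traditional scientometric software.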
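Data visualization analysis of Business English graduation theses: the case of the 2020 undergraduate theses
![商务英语专业毕业论文数据可视化分析——以2020届本科毕业论文为例](https://img.taocdn.com/s3/m/2f7ac2cb453610661fd9f43b.png)
Data visualization analysis of Business English graduation theses: the case of the 2020 undergraduate theses
Author: Xie Zhiming (谢志明). Source: 《校园英语》, 2020, No. 33.
[Abstract] The undergraduate thesis serves both to "train" and to "assess" students.
Starting from that position, this study uses the Voyant software to examine the titles and full texts of the 2020 graduation theses. It carries out a quantitative study of topic selection, form, and ability, and presents the data visually, showing clearly both what the theses achieved and where they fall short.
[Keywords] Business English major; graduation thesis; quantitative research; data visualization
[About the author] Xie Zhiming (born 1973), male, from Ningdu, Jiangxi; associate professor at the School of Foreign Languages, Jingdezhen University. Research interests: English translation and teaching, ceramic culture.
Two years ago the Ministry of Education issued the 《普通高等学校本科专业类教学质量国家标准》 (National Standards for Teaching Quality of Undergraduate Programs in Regular Higher Education Institutions). Within it, the 《外国语言文学类本科专业教学质量国家标准》 (the standard for foreign language and literature programs, hereafter the "National Standard") is the basis for admitting, building, and evaluating undergraduate foreign-language programs. It clearly defines the purpose of such programs, setting rules while leaving room for flexibility.
As for the graduation thesis, the National Standard states explicitly that it is an indispensable part of the curriculum structure.
The 《中华人民共和国学位条例》 (Regulations of the People's Republic of China on Academic Degrees, hereafter the "Regulations"), as revised in 2004, provide that undergraduate graduates of regular higher education institutions may be awarded a bachelor's degree only if their results are good and they have reached a certain academic level.
The specific requirements for students are: (1) a sound grasp of the basic theory, specialized knowledge, and basic skills of the discipline; (2) preliminary ability to undertake scientific research or to take on specialized technical work.
Analyzing and comparing what the National Standard and the Regulations require of theses in English programs, and in Business English in particular, and examining the current state of Business English undergraduate theses on the basis of available data, helps to understand the present situation, identify shortcomings, and adjust the direction of future efforts.
I. Positioning and requirements of the graduation thesis
The new National Standard specifies the components of the curriculum as five key course series: general education courses, core major courses, specialization courses, practical teaching, and the graduation thesis.
The graduation thesis is no longer an optional step detached from the organic curriculum system. It is an indispensable part of that system.
The thesis is also meant to "train and test students' ability to apply the theory they have learned to study and solve problems, and their capacity for innovation", which strengthens its dual function of "training" and "assessment".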
关于大数据的学术英文文献
![关于大数据的学术英文文献](https://img.taocdn.com/s3/m/02e17d398f9951e79b89680203d8ce2f006665b0.png)
Big Data: Challenges and Opportunities in the Digital Age

Introduction

In the contemporary digital era, the advent of big data has revolutionized various aspects of human society. Big data refers to vast and complex datasets generated at an unprecedented rate from diverse sources, including social media platforms, sensor networks, and scientific research. While big data holds immense potential for transformative insights, it also poses significant challenges and opportunities that require thoughtful consideration. This article aims to elucidate the key challenges and opportunities associated with big data, providing a comprehensive overview of its impact and future implications.

Challenges of Big Data

1. Data Volume and Variety: Big data datasets are characterized by their enormous size and heterogeneity. Dealing with such immense volumes and diverse types of data requires specialized infrastructure, computational capabilities, and data management techniques.
2. Data Velocity: The continuous influx of data from various sources necessitates real-time analysis and decision-making. The rapid pace at which data is generated poses challenges for data processing, storage, and efficient access.
3. Data Veracity: The credibility and accuracy of big data can be a concern due to the potential for noise, biases, and inconsistencies in data sources. Ensuring data quality and reliability is crucial for meaningful analysis and decision-making.
4. Data Privacy and Security: The vast amounts of data collected and processed raise concerns about privacy and security. Sensitive data must be protected from unauthorized access, misuse, or breaches. Balancing data utility with privacy considerations is a key challenge.
5. Skills Gap: The analysis and interpretation of big data require specialized skills and expertise in data science, statistics, and machine learning. There is a growing need for skilled professionals who can effectively harness big data for valuable insights.

Opportunities of Big Data

1. Improved Decision-Making: Big data analytics enables organizations to make informed decisions based on comprehensive data-driven insights. Data analysis can reveal patterns, trends, and correlations that would be difficult to identify manually.
2. Personalized Experiences: Big data allows companies to tailor products, services, and marketing strategies to individual customer needs. By understanding customer preferences and behaviors through data analysis, businesses can provide personalized experiences that enhance satisfaction and loyalty.
3. Scientific Discovery and Innovation: Big data enables advancements in various scientific fields, including medicine, genomics, and climate modeling. The vast datasets facilitate the identification of complex relationships, patterns, and anomalies that can lead to breakthroughs and new discoveries.
4. Economic Growth and Productivity: Big data-driven insights can improve operational efficiency, optimize supply chains, and create new economic opportunities. By leveraging data to streamline processes, reduce costs, and identify growth areas, businesses can enhance their competitiveness and contribute to economic development.
5. Societal Benefits: Big data has the potential to address societal challenges such as crime prevention, disease control, and disaster management. Data analysis can empower governments and organizations to make evidence-based decisions that benefit society.

Conclusion

Big data presents both challenges and opportunities in the digital age. The challenges of data volume, velocity, veracity, privacy, and skills gap must be addressed to harness the full potential of big data. However, the opportunities for improved decision-making, personalized experiences, scientific discoveries, economic growth, and societal benefits are significant. By investing in infrastructure, developing expertise, and establishing robust data governance frameworks, organizations and individuals can effectively navigate the challenges and realize the transformative power of big data. As the digital landscape continues to evolve, big data will undoubtedly play an increasingly important role in shaping the future of human society and technological advancement.
ThemeRiver: Visualizing Theme Changes over Time Susan Havre, Beth Hetzler, and Lucy NowellBattelle Pacific Northwest DivisionRichland, Washington 99352 USA1+509+375-6948{susan.havre | beth.hetzler | lucy.nowell}@AbstractThemeRiver™ is a prototype system that visualizes thematic variations over time within a large collection of documents. The “river” flows from left to right through time, changing width to depict changes in thematic strength of temporally associated documents. Colored “currents” flowing within the river narrow or widen to indicate decreases or increases in the strength of an individual topic or a group of topics in the associated documents. The river is shown within the context of a timeline and a corresponding textual presentation of external events.Keywords: visualization metaphors, trend analysis, timeline1.IntroductionIn exploratory information visualization, one goal is to present information so that users can easily discern patterns. Patterns reveal trends, relationships, anoma-lies, and structure in the data, and may help usersFigure 1: ThemeRiver™ uses a river metaphor to represent theme changes over time.confirm knowledge or hypotheses. Perhaps more impor-tantly, they also raise unexpected questions leading users to new insights. The challenge is to create visuali-zations that enable users to find patterns quickly and easily. ThemeRiver, shown in Figure 1, is a prototype system designed to reveal temporal patterns in text collections.Information visualization systems such as Envision [13], BEAD[1], LyberWorld [ 3, 4] and SPIRE [18] represent each document or group of documents with a glyph or icon, portraying various document attributes. Various methods have been explored for showing change over time in document-centric visualizations. See Section 3 below.However, a user may be less interested in documents themselves than in theme changes within the whole col-lection over time. 
For example, how did Shakespeare’s themes change during various periods of his life or in relation to contemporary events? Such information is difficult, if not impossible, to glean from most visualizations. A visualization that focuses on themes, rather than documents, could be more useful for such exploration.
ThemeRiver provides users with a macro-view of thematic changes in a corpus of documents over a serial dimension. It is designed to facilitate the identification of trends, patterns, and unexpected occurrence or non-occurrence of themes or topics. In our prototype, we use time as the serial dimension. We provide contextual information through a timeline and markers for co-occurring events of interest. Figure 1 shows a sample ThemeRiver visualization. This paper describes the design of ThemeRiver, walks through a sample information exploration session, and discusses results of formative usability testing.
2. Design
Our major design goal was to provide a visualization of theme change over time. Consider using a histogram to visualize these changes. In a histogram (such as the one shown in Figure 2), each bar represents a time slice, and color variations and size within the bar represent the relative strength of themes specific to that slice. However, understanding the histogram requires users to work at integrating the themes across time because the bars are anchored to a baseline and the position of a particular theme within the bars may vary considerably.
Like a histogram, ThemeRiver uses variations in width to represent variations in strength or degree of representation. However, it connects the strength values in adjacent time slices with smooth and continuous curves. The horizontal flow of the river represents the flow of time. Colored currents that run horizontally within the river represent themes.
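The stacking of currents into a river can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `river_bands`, the sample strengths, and the choice to center the stack on a horizontal axis at y = 0 are all assumptions made for illustration. For each time slice it stacks the theme widths and returns the lower and upper edge of every current.

```python
from typing import Dict, List, Tuple


def river_bands(
    strengths: Dict[str, List[float]],
) -> Dict[str, List[Tuple[float, float]]]:
    """Return (lower, upper) edges of each theme's current in every time slice.

    Themes are stacked in dictionary order. Centering the stack on y = 0 is
    a baseline choice made for this sketch, not a documented detail.
    """
    themes = list(strengths)
    n_slices = len(next(iter(strengths.values())))
    bands: Dict[str, List[Tuple[float, float]]] = {theme: [] for theme in themes}
    for t in range(n_slices):
        total = sum(strengths[theme][t] for theme in themes)
        lower = -total / 2.0  # bottom edge of the river in this slice
        for theme in themes:
            upper = lower + strengths[theme][t]
            bands[theme].append((lower, upper))
            lower = upper  # the next current sits directly on top
    return bands


# Hypothetical strengths for three themes over four time slices.
example = {
    "cane": [1.0, 2.0, 0.0, 1.0],
    "oil": [2.0, 2.0, 1.0, 3.0],
    "soviet": [1.0, 4.0, 1.0, 2.0],
}
bands = river_bands(example)
```

In the third slice the hypothetical "cane" theme has strength zero, so its band collapses to zero width; the neighboring currents close up around it, and the band opens again when the theme recurs.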
Each vertical section of the river corresponds to an ordered time slice.
The width of each current changes to reflect the thematic strength for each time slice. For example, in Figure 1 the theme “soviet” increases in relative strength in June 1960 as indicated by the widening of the upper bright orange current. “Soviet” loses relative strength in July and August; thus the same current narrows in the next two time slices. “Soviet” then increases significantly in relative strength in September; the current widens proportionately.
Currents maintain their integrity as a single entity over time. If a theme ceases to occur in the documents for a period of time and then recurs, the current likewise disappears and then reappears. Consistent color and relative position to other themes make theme currents easy to recognize. In Figure 1, the lower purple band depicts the changes in relative strength of the theme “cane.” The “cane” current grows and shrinks over time; “cane” occurs most strongly in March 1961.
We believe that ThemeRiver’s continuous curves have much to do with its usability. The Gestalt School of Psychology [8], founded in 1919 in Germany, theorized that with perception, “the whole is greater than the sum of the parts.” Simply put, during the perception process humans do not organize individual, low-level, sensed elements, but sense more complete “packages” that represent objects or patterns. In his recent book [6], Hoffman presents a compelling discussion of how our perceptual processes identify curves and silhouettes, recognize parts, and group them together into objects. Numerous aspects of the image influence our ability to perceive these parts and objects, including similarity, continuity, symmetry, proximity, and closure.
For example, it is easier to perceive objects that are bounded by continuous curves than those that contain abrupt changes [17].
The vertical proximity of the river currents makes it easy for users to judge the relative width of currents and thus the relative strength of the themes. Similarly, symmetry around the horizontal axis of the river, a current, or group of currents makes it easier for users to perceive flow patterns and changes. Widths of currents combine to show cumulative widening and narrowing, representing changing strength for the selected set of themes as a whole.
Values for theme strength can be calculated in various ways. For example, they might represent the number of documents containing the word. Because the river loses its continuity and structure if there are too few or too many themes, we created several theme subsets for exploration.
We have implemented a proof-of-principle prototype and used it to explore data from multiple sources. Figure 1 portrays data from a collection of speeches, interviews, articles, and other text associated with Fidel Castro. The visualization includes the river, a timeline below the river, and markers for related historical events along the top. With ThemeRiver, users may
• display topic and event labels
• display time and event grid lines
• display the raw data points
• choose among drawing algorithms for the currents and river.
Users may also display the associated time or theme name by simply moving the mouse across the image. In addition, users may pan and zoom to see other time periods or parts of the river and to see more detail or broader context. In this sample data set, we found several interesting correspondences between themes and events, such as the expansion of the “oil” theme just before Castro confiscated American oil refineries (see Figure 1).
3. Related Work
Many systems include features for viewing time. One common method is to show discrete time slices.
For example, in the Spatial Paradigm for Information Retrieval and Exploration (SPIRE) Galaxy visualization [18], users may choose to progressively step through time, showing only the icons for documents originating within each specified time period. Another common approach is to show time as an attribute of documents, as done in Virginia Tech’s Envision system, which lets users map various metadata values, including date, to x-axis, y-axis, or color, shape, or size graphical encodings [13].
More similar to ThemeRiver in intent are systems that focus directly on time. The LifeLines system, developed jointly by the University of Maryland and IBM, has been used to visualize medical records and juvenile criminal records [14, 15]. The visualization displays time along the x-axis and uses the y-axis to categorize events. Bars depict duration for a given event, and graphical attributes such as color show event attributes. TmViewer uses a similar approach, adding the ability to show parent-child relationships with lines between related time bars [10]. The DIVA system [12] uses animation to show how particular measured values change in relation to the temporal flow of a video. To help groups collaborating to create a document or other artifact, the Timewarp system developed at Xerox PARC [2] lets users view and edit multiple timelines of the changing state of that artifact. The metaphor used is similar to a state diagram, with lines connecting state nodes and branches. Additional work on timelines includes Karam’s [7] and Kullberg’s [9].
We know of no other systems that use the river metaphor to depict the passage of time. However, Tufte [16] presents a similar idea in an artist’s illustration showing trends in music. In that illustration, width represents sales and proximity indicates influence of preceding styles.
Our work differs in several aspects, such as the use of color, the inclusion of contextual events, and the ability to generate the visualization automatically from a potentially very large collection of documents.
4. Usability Evaluation
Early in ThemeRiver’s development, we carried out a simple formative usability evaluation with two users. Questions we wanted to answer with this evaluation included
• Do users understand the metaphor?
• Can they identify themes that are more often discussed?
• Does the visualization help them raise new questions about the data?
• Do they interpret details of the visualization in ways we had not expected?
• How does their interpretation of the visualization differ from that of a histogram showing the same data?
The data were the Castro collection described above, focusing on the years 1960-1963. We represented the same data both in ThemeRiver and in a histogram that we created using a spreadsheet. (See Figure 2.) We made the content of the histogram as similar as possible to ThemeRiver’s. For example, the histogram depicted thematic content by months, using the same values that drive ThemeRiver. The month timeline was shown along the bottom and we added an event line to the histogram like the one in ThemeRiver.
Usability evaluation began with a brief explanation of the purpose of the session, followed by an introduction to the data. Both participants viewed the data in both visualizations; one participant started first with the histogram and one with ThemeRiver. We asked each participant questions about what they observed in each display.
Examples of specific questions include
• In July ’62, what are the three most discussed themes?
• Where is a new theme introduced?
Examples of more general questions include
• What looks interesting here – what do you want to explore?
• How would you like to change or manipulate the view?
We captured verbal protocol during this discussion.
At the end, we asked participants to complete a short questionnaire, with feedback about the visualization and possible enhancements.
From the verbal protocol and from user behavior, we observed that the users had no difficulty in understanding the metaphor. They were able to identify themes that were strongly represented and able to understand the relationship between the width of the current and theme strength. The visualization also triggered questions about the reasons behind certain theme strengths and patterns. For exploratory visualizations, this is a good result; we believe that a visualization should help the user identify questions of interest to explore.
Questionnaire responses showed that users found ThemeRiver easy to understand. They also found ThemeRiver useful, particularly for identifying macro trends. They told us that it was less useful for identifying minor trends because the curves tend to de-emphasize very small values. We asked about the value of the river metaphor, and users rated it highly as well. They observed that the connectedness of the river helped them follow a trend more easily over time than in the histogram; this result is compatible with the perception principles described by Ware [17].
Figure 2: Like ThemeRiver™ in Figure 1, this histogram uses the Castro collection data and depicts changes in thematic content over time.
Users liked some features of the histogram and recommended adding them to ThemeRiver. One such feature is the ability to see numeric values that drive the histogram and river currents. One user expressed more trust in the histogram, because she “knew” that the bars were exactly the data values, whereas she was not sure exactly what the data values were in ThemeRiver. Her point is a valid one, especially because the curved lines of ThemeRiver do require that we interpolate between data points to produce the curves.
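To make the user's concern concrete, the following Python sketch interpolates between data points with a smoothstep ease. The paper does not say which curve algorithm the prototype uses, so the smoothstep formula, the function name `smooth_curve`, and the sample widths are assumptions. The curve passes exactly through each measured value; everything in between is produced by the interpolation rather than by the data.

```python
from typing import List


def smooth_curve(values: List[float], steps: int = 4) -> List[float]:
    """Interpolate between consecutive data points with a smoothstep ease.

    The result contains every original value unchanged; the points between
    them are invented by the interpolation.
    """
    if len(values) < 2:
        return list(values)
    curve: List[float] = []
    for y0, y1 in zip(values, values[1:]):
        for i in range(steps):
            u = i / steps
            ease = u * u * (3.0 - 2.0 * u)  # smoothstep: 0 at u=0, 1 at u=1
            curve.append(y0 + (y1 - y0) * ease)
    curve.append(values[-1])
    return curve


widths = [1.0, 3.0, 2.0]  # hypothetical current widths in three time slices
curve = smooth_curve(widths, steps=4)
```

With `steps=4`, indices 0, 4, and 8 of `curve` hold the original widths, and the remaining entries are interpolated.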
We have added the capability for users to see the exact data points on demand.
Although users liked the abstraction to the whole collection and thus away from individual documents, both users suggested adding features to access documents if desired. They wanted the ability to see the total number of documents during any time period and to get the text of each document on demand. They wanted to select a current and see the documents that contributed to it.
Users also wanted the ability to reorder the theme currents. Options they discussed included user-defined ordering and ordering by correlation, so that themes appearing together in the documents would be nearby in the river.
5. Interactions and Sample Usage
Based on usability evaluation results, we added a number of features to combine the best of both the river metaphor and histogram capabilities. This section presents a sample usage scenario, illustrating the capabilities of the current version.
We used ThemeRiver to explore the 1990 Associated Press (AP) newswire data from the TREC5 distribution disks, a set of over 100,000 documents (see Figure 3). To explore the selected themes in this collection, a user might begin with a high-level survey of the visualization by panning along the course of the river.
The user might look for wider currents that signal heavy use of a topic, such as the one for “baghdad” in Figure 3. Changes in the color distribution of the river signal changes in themes. We see such a change in August 1990, when the “kuwait” current, which had vanished in late July, suddenly appears and rapidly widens. The user could also look for narrow currents in the river that signal relatively light use of particular themes.
In an earlier paper, Hetzler et al. [5] explored the AP data set with a variety of our visual analysis tools, focusing on large theme changes surrounding the Iraqi invasion of Kuwait on August 2. ThemeRiver also reflects these large theme changes.
Near the right side of Figure 3, we see several currents that expand dramatically at the time of the invasion, which is shown on the event line above the river. Labels have been turned on for currents representing the themes “kuwait,” “iraq,” “saddam,” and “baghdad.” ThemeRiver reveals some additional detail not noted in the earlier study. The theme “oil,” which is persistent across the image, also expands noticeably at this time. The themes of “kuwait,” “iraq,” and “saddam” show up in small bursts before the invasion but are not persistent. News stories corresponding with these bursts covered the verbal conflicts leading up to the invasion. This distinction between persistent and bursty themes is one advantage that ThemeRiver provides over document-centric visualizations.
Figure 3: AP data from July - August 1990. A wide current in the river indicates heavy use of a topic, while changes in color distribution correlate to changes in themes.
During late June and throughout July 1990, the themes appear relatively consistent. A user interested in the more prominent themes might turn on theme labels as shown in Figure 3 to discover that the main themes represent “bush” (President Bush), “germany” (the reunification discussions), and “communist.” Some smaller variations in theme are also apparent, such as the widening of the “nato” band, related to the NATO decision to redefine their military strategy.
Figure 4 shows the ThemeRiver from earlier in the summer of 1990. In late May, a large change in theme strength is shown, this time not matching any previously identified events. Some of the larger currents here are “gorbachev,” “bush,” and “summit.” This might suggest that Bush and Gorbachev both attended a summit.
Viewing the pertinent news documents from that time, we found that a four-day summit meeting took place in Washington among several world leaders, including Bush and Gorbachev.
Figure 4: ThemeRiver™ of AP data from June - July 1990 identifies very different events from those revealed immediately afterwards (Figure 3).
Some more subtle changes can also be seen in Figure 4. For example, a small current near the middle of the river expands slightly near the beginning of June and again near the end of the month. This is the current for “earthquake.” The wider areas correspond with the quakes in Peru and Iran respectively.
In each of the figures shown so far, there are portions of the river that are extremely narrow overall. In fact, for the AP rivers (Figures 3 and 4), the river seems to narrow quite frequently. On closer inspection, we see that the narrow spots correspond with Sundays. Because the river contains only a subset of the themes in the collection, we do not know at this point whether the news is generally lighter on Sunday or whether other topics dominate on that day. This uncertainty is one of the points that came up early in user testing. In response, we have added a feature allowing the user to show a histogram representing the total number of documents in a given time slot, along with the portion represented by the themes in the river (see Figure 5). With this histogram, it is apparent that in general fewer news stories are released on Sunday than on other days of the week.
Sometimes users may want to compare theme changes in one set of documents to those in another set; alternatively, they may wish to partition a collection based on metadata and compare the themes in the two partitions as separate rivers. Figure 6 shows two parallel rivers: the lower river shows AP news stories from Washington, D.C. and the upper river shows the news stories from New York. Some differences in major themes are immediately apparent.
The Washington themes emphasize Bush, the Senate, and the Supreme Court. The New York stories show a major growth in the themes “apartheid” and “mandela”; this corresponds with the visit of Nelson Mandela to the US. He arrived first in New York, where he spent several days before proceeding to Washington.
Figure 5: The addition of a histogram to ThemeRiver™ reveals that news is light on Sundays, not that themes shift.
Figure 6: Parallel rivers let users compare AP data from Washington, D.C. and New York from the same time period.
6. Discussion and Design Challenges
Ideally, a visual metaphor facilitates discovery by presenting data in an intuitive, easy way that is consistent with the user’s perceptual and cognitive abilities. Lakoff and Johnson [11] argue that metaphors are wired into our understanding of particular concepts, using evidence from common linguistic expressions. One example they cite is the many English expressions that imply that Anglo-Americans understand time in terms of motion relative to ourselves. Some expressions characterize time as moving (e.g., “the time will come,” “don’t let the opportunity pass”), while others imply that people are the ones moving through time (e.g., “as we go through the years”). From formative usability evaluation and anecdotal feedback, we have observed that the river metaphor is intuitive and easy to understand. We believe the river metaphor of theme currents changing over time gets part of its strength from this cultural understanding.
Focusing on themes rather than documents changes issues of scalability. ThemeRiver visualizations have little dependence on the number of documents represented. For example, if theme strength is determined by the number of documents containing each theme word, a single pass through the collection is needed to calculate the values, which may be displayed similarly regardless of collection size.
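The single-pass calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the prototype's code: the function name `theme_strengths`, the `(time_slice, text)` input format, the whitespace tokenization, and the miniature corpus are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def theme_strengths(
    documents: Iterable[Tuple[str, str]],
    themes: List[str],
) -> Dict[str, Dict[str, int]]:
    """Count, per time slice, the documents that contain each theme word.

    `documents` yields (time_slice, text) pairs and is read exactly once.
    """
    counts = {theme: defaultdict(int) for theme in themes}
    for time_slice, text in documents:
        # A set ensures a document counts once per theme, however often
        # the word is repeated. Splitting on whitespace is a simplification.
        words = set(text.lower().split())
        for theme in themes:
            if theme in words:
                counts[theme][time_slice] += 1
    return {theme: dict(per_slice) for theme, per_slice in counts.items()}


# Hypothetical miniature corpus of (month, text) pairs.
corpus = [
    ("1990-07", "oil prices rise"),
    ("1990-08", "kuwait oil fields"),
    ("1990-08", "kuwait invasion kuwait"),
]
strengths = theme_strengths(corpus, ["oil", "kuwait"])
```

The size of the result depends on the number of themes and time slices, not on the number of documents, which is why the display is largely independent of collection size.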
On the other hand, the number of currents that can be reasonably included in a single river is limited. Options for addressing this issue include grouping through color families, as suggested in Figure 7, or using each current to represent a set of themes rather than a single theme.
Color choices pose an interesting design challenge. Color perception depends on local contrast. However, because themes come and go, it is impossible to predict which colors will be adjacent at any given time. Moreover, we want to show a relatively large number of themes in the river and still achieve acceptable discriminability. Currently we are exploring a solution suggested during formative usability evaluation: sorting themes into related groups and displaying each group with a color family. Figure 7 shows a portion of our color legend with such an ordering, which emphasizes changes in related themes and may make it easier to understand relationships among them.
A key cognitive advantage of the river metaphor over a simple histogram lies in the curving continuous lines that define the boundaries between topic currents. But it is also important that the visualization not mislead users. Because dates are not continuous data, we must approximate the true boundaries by interpolating between discrete data points. As long as the resolution of the data is sufficient, ThemeRiver provides an overview that meets our criteria for intuitiveness, ease of use, and integrity. If the user zooms in farther than the data resolution supports, the “truthfulness” approximated by the interpolated lines is questionable.
While the resolution of data forces a lower limit on the level of zoom, we can deal with the problem of “too much” resolution by combining time slices. That is, as the user zooms out, we can increase the amount of time per time slice and combine theme weights.
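A minimal Python sketch of combining time slices is shown here. The text does not specify how weights are combined, so summing them is an assumption of this sketch, as are the function name `combine_slices` and the sample daily weights.

```python
from typing import List


def combine_slices(weights: List[float], factor: int) -> List[float]:
    """Merge every `factor` consecutive time slices by summing their weights.

    A trailing group with fewer than `factor` slices is kept as a shorter
    final slice.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return [sum(weights[i:i + factor]) for i in range(0, len(weights), factor)]


daily = [2.0, 1.0, 0.0, 3.0, 4.0, 1.0, 2.0]  # hypothetical daily theme weights
coarse = combine_slices(daily, 3)             # three days per time slice
```

Summation keeps the total weight unchanged while reducing the number of points to draw; averaging would be an equally plausible choice if slice widths should stay comparable across zoom levels.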
In this way, we can maintain a suitable level of truthfulness without slowing the rendering speed to a crawl by trying to draw more detail than necessary.
With interactive visualizations, calculation and drawing speeds are important. For the current features of ThemeRiver, it is sufficient to calculate the drawing points on startup and then recalculate only after a configuration change. Nevertheless, a fast, efficient algorithm is needed. We are investigating curved-line algorithms and ways to speed up both the calculations and the rendering.
Figure 7: Tracking related themes is simplified by assigning them to the same color family. This ensures related themes appear together and are identifiable as a group.
7. Conclusions
ThemeRiver is a demonstration prototype, developed to test the value of the metaphor. We are continuing to add interaction capabilities to it. We also need to develop ways to build the event timeline automatically and to provide more flexibility in selecting and ordering the theme currents. From formative usability evaluation, we learned that users want to know more about the context of the river and want to access the documents that contribute to it at a particular point in time.
We conclude that ThemeRiver is potentially valuable for information analysts and plan to develop it into a full production system.
8. Acknowledgments
We gratefully acknowledge the contributions of our colleagues at Battelle to the development and testing of the ThemeRiver visualization. Special thanks for contributions to this paper go to Grant Nakamura, Alan Willse, Sharon Eaton, Wanda Mar, and Dan Donohoo. Battelle Memorial Institute’s Information Synthesis Platform funded this research.
9. References
1. D. Brodbeck, M. Chalmers, A. Lunzer, and P. Cotture, “Domesticating Bead: Adapting an Information Visualization System to a Financial Institution,” Proceedings of InfoViz ’97, IEEE Computer Society, Los Alamitos, CA, 1997, pp. 73-80.
2. K.W. Edwards and E.D. Mynatt, “Timewarp: Techniques for Autonomous Collaboration,” Proceedings of CHI ’97, Association for Computing Machinery, Inc., 1997, pp. 218-225.
3. M. Hemmje, “LyberWorld: a 3D Graphical User Interface for Fulltext Retrieval,” Conference Companion on Human Factors in Computing Systems, 1995, pp. 417-418.
4. M. Hemmje, C. Kunkel, and A. Willett, “LyberWorld - a Visualization User Interface Supporting Fulltext Retrieval,” Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval, 1994, pp. 249-259.
5. B. Hetzler, P. Whitney, L. Martucci, and J. Thomas, “Multi-faceted Insight Through Interoperable Visual Information Analysis Paradigms,” Proceedings of IEEE Symposium on Information Visualization, InfoVis ’98, 1998, pp. 137-144.
6. D.D. Hoffman, Visual Intelligence: How We Create What We See, W.W. Norton & Company, Inc., New York, 1998.
7. G.M. Karam, “Visualization Using Timelines,” Proceedings of the 1994 International Symposium on Software Testing and Analysis, 1994, pp. 125-137.
8. K. Koffka, Principles of Gestalt Psychology, Harcourt-Brace, New York, 1935.
9. R.L. Kullberg, “Dynamic Timelines: Visualizing the History of Photography,” Proceedings of CHI ’96, 1996, pp. 386-397.
10. V. Kumar and R. Furuta, “Visualization of Relationships,” Proceedings of Hypertext 99, ACM Press, Darmstadt, Germany, 1999.
11. G. Lakoff and M. Johnson, Metaphors We Live By, University of Chicago Press, Chicago, 1983.
12. W. Mackay and M. Beaudouin-Lafon, “Diva: Exploratory Data Analysis with Multimedia Streams,” Proceedings of CHI ’98, 1998, pp. 416-423.
13. L.T. Nowell, R.K. France, D. Hix, L.S. Heath, and E.A. Fox, “Visualizing Search Results: Some Alternatives to Query-Document Similarity,” Proceedings of SIGIR ’96, ACM Press, Zurich, 1996, pp. 67-75.
14. C. Plaisant, D. Heller, J. Li, B. Shneiderman, R.J. Mushinlin, and J. Karat, “Visualizing Medical Records with LifeLines,” CHI ’98 Summary, 1998, pp. 28-29.
15. C. Plaisant, B. Milash, A. Rose, S. Widoff, and B. Shneiderman, “Lifelines: Visualizing Personal Histories,” Proceedings of CHI ’96, Association for Computing Machinery, Inc., 1996, pp. 221-227.
16. E.R. Tufte, Visual Explanations: Images and Quantities, Evidence and Narrative, Graphics Press, Cheshire, CT, 1997, pp. 90-91.
17. C. Ware, Information Visualization: Perception for Design, Academic Press, San Diego, 2000.
18. J.A. Wise, J.J. Thomas, K. Pennock, D. Lantrip, M. Pottier, A. Schur, and V. Crow, “Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents,” S.K. Card, J.D. Mackinlay, and B. Shneiderman (editors), Readings in Information Visualization: Using Vision to Think, Morgan Kaufmann, San Francisco, 1999, pp. 442-45.