Copyright 1997. Association for the Advancement of Computing in Education (AACE). Distributed via the Web by permission of AACE.
Shared User Behavior on the World Wide Web
Ghaleb Abdulla, Edward A. Fox and Marc Abrams
Network Research Group, Computer Science Department, Virginia Tech
Blacksburg, VA 24061-0106
{abdulla, fox, abrams}@
Abstract: Studying accesses to Web servers from different user communities helps identify similarities and differences in user access patterns. In this paper we identify invariants that hold across a collection of ten traces representing traffic seen by proxy servers. The traces were collected from university, high school, governmental, industry, and online service provider environments, with request rates that range from a few accesses to thousands of accesses per hour. In most of the workloads, a small portion of the clients is responsible for most of the accesses. In addition, most of the accesses go to a small set of servers. A longitudinal study of the collected data shows that the identified invariants do not change over a one-year period. However, the percentage of script-generated documents is increasing.
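The two concentration invariants in the abstract (few clients issue most accesses; most accesses go to few servers) can be quantified by asking what share of all requests comes from the most active fraction of clients or servers. The sketch below is one way to compute that share; the trace format (a list of hypothetical `(client, server)` pairs), the function name `top_share`, and the sample data are illustrative assumptions, not the authors' analysis tooling.

```python
from collections import Counter
from typing import Iterable


def top_share(keys: Iterable[str], top_fraction: float) -> float:
    """Return the fraction of all accesses made by the most active
    `top_fraction` of distinct keys (clients or servers)."""
    counts = Counter(keys)
    if not counts:
        return 0.0
    total = sum(counts.values())
    # Size of the "top" group of distinct keys; always at least one.
    k = max(1, round(len(counts) * top_fraction))
    top = sum(c for _, c in counts.most_common(k))
    return top / total


# Hypothetical proxy-log records: (client, server) pairs.
trace = [
    ("c1", "s1"), ("c1", "s1"), ("c1", "s2"), ("c1", "s1"),
    ("c2", "s1"), ("c1", "s3"), ("c3", "s1"), ("c1", "s2"),
    ("c4", "s4"), ("c1", "s1"),
]

clients = [c for c, _ in trace]
servers = [s for _, s in trace]
print(f"top 25% of clients: {top_share(clients, 0.25):.0%} of accesses")  # 70%
print(f"top 25% of servers: {top_share(servers, 0.25):.0%} of accesses")  # 60%
```

Applying the same function to both the client and the server columns of each trace gives directly comparable numbers across workloads of very different sizes.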
Introduction
In recent years the World Wide Web (WWW or Web) has grown rapidly as a tool for disseminating many kinds of information resources. The Web is frequently used to deploy educational and commercial material. Educators use the Web to post course notes, syllabi, homework assignments, and even exams and quizzes. Companies use it for advertising and publicity, and to sell products.
The dynamics of Web traffic are not well understood. Web traffic differs from other types of network traffic in several ways, and these differences stem from the HTTP protocol and from Web users' behavior. With respect to HTTP, clicking a hyperlink in an HTML page generates traffic and, as a result, a new HTML page or an image is displayed. HTML pages contain formatted text and graphics, and links in them sometimes lead to other types of media, such as video or audio. In contrast, traditional network traffic carries formatted or unformatted text and rarely involves graphics, video, or audio. With respect to users, the low level of expertise required to navigate with a Web browser has produced a large and diverse user population. It is therefore reasonable to assume that Web users behave differently from those who use other network resources. The status of Web servers and network connections, and how quickly they respond, also affects users' future accesses.
In this paper we examine ten traces that were collected from university, high school, governmental, industry, and online service provider environments, with request rates that range from a few accesses to thousands of accesses per hour. We analyze the traces to understand how users interact with the Web and to explore whether users with different backgrounds behave differently when using the Web. We look for invariants that hold across the traces.
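Request rates such as "a few accesses to thousands of accesses per hour" are obtained by bucketing a trace's request timestamps into clock hours. The sketch below assumes a hypothetical trace represented as a list of naive `datetime` objects; the function name `hourly_counts` and the sample timestamps are illustrative, not taken from the paper. Empty hours between the first and last request are kept as zeros, so that quiet periods lower the mean rate instead of being silently dropped.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_counts(timestamps: list[datetime]) -> list[int]:
    """Requests per clock hour, from the first to the last hour of the
    trace, including hours in which no request was logged."""
    if not timestamps:
        return []
    # Truncate each timestamp to the start of its hour and count.
    hours = Counter(t.replace(minute=0, second=0, microsecond=0) for t in timestamps)
    start, end = min(hours), max(hours)
    counts = []
    h = start
    while h <= end:
        counts.append(hours.get(h, 0))  # 0 for hours with no requests
        h += timedelta(hours=1)
    return counts


# Hypothetical request timestamps from a small proxy trace.
ts = [
    datetime(1997, 3, 1, 9, 5), datetime(1997, 3, 1, 9, 40),
    datetime(1997, 3, 1, 9, 59), datetime(1997, 3, 1, 11, 15),
]
counts = hourly_counts(ts)
print(counts)                      # [3, 0, 1]
print(sum(counts) / len(counts))   # mean requests per hour over the trace
```

Reporting the minimum, maximum, and mean of these hourly counts per trace is one simple way to place workloads of very different sizes on a common scale.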
We examined the collected traces to determine whether accesses from educational institutions resemble accesses from industry, government, or home. We study accesses made by a group of users who either share the same workplace (and are therefore potential users of a proxy server, if one is available) or use a proxy server. A proxy is a server that can act as both a cache and a gateway: it can send requests for Web documents as well as serve Web documents from its cache. For security reasons, a company might not connect individual PCs directly to the Internet, yet those PCs can still be given Web access through a gateway or a proxy. To a group of clients a proxy looks like a Web server, and to a Web server it looks like a client. The browsers on the client side can be configured to point to the proxy so that any access from the client goes