Computational Biology: Code-Red Worms Dataset

Code-Red Worms Dataset
Data summary:
The first incarnation of the Code-Red worm (CRv1) began to infect hosts running unpatched versions of Microsoft's IIS webserver on July 12th, 2001. This first version of the worm used a static seed for its random number generator. Then, around 10:00 UTC on the morning of July 19th, 2001, a random-seed variant of the Code-Red worm (CRv2) appeared and spread. This second version shared almost all of its code with the first version, but spread much more rapidly. Next, on August 4th, a new worm began to infect machines by exploiting the same vulnerability in Microsoft's IIS webserver as the original Code-Red worm. Although the new worm had no relationship to the first one beyond exploiting the same vulnerability, it contained in its source code the string "CodeRedII" and was thus named CodeRedII. Finally, on September 18, 2001, the Nimda worm began to spread via backdoors left by CodeRedII, as well as via email, open network shares, and compromised web sites.
This dataset contains information useful for studying the spread of the Code-Red version 2 and CodeRedII worms. The dataset consists of a publicly available set of files containing summarized information that does not individually identify infected computers.
Keywords:
worm, hosts, seed, appeared and spread, spread, dataset
Data format:
TEXT
Data usage:
Information Processing
Classification
Detailed description:
Code-Red Worms Dataset
The Dataset on the Code-Red Worms
The Code-Red Dataset includes:
Publicly Available:
Code-Red July: the first Code-Red version 2 outbreak (July 19-20, 2001)
distribution of start and end times of hosts performing port 80 TCP SYN scanning
distribution of durations of time code-redv2-infected computers were observed to be scanning
country distribution of code-redv2-infected computers
a file containing a table with the following eight tab-separated fields for each observed IP address: start time, end time, top-level domain, country, latitude, longitude, AS number, and AS name
Code-Red August: the second Code-Red version 2 outbreak and beginning of the spread of the CodeRedII worm (August 1-20, 2001)
distribution of start and end times of hosts performing port 80 TCP SYN scanning
distribution of durations of time code-redv2-infected computers were observed to be scanning
country distribution of code-redv2-infected computers
a file containing a table with the following seven tab-separated fields for each observed IP address: start time, end time, top-level domain, country, latitude, longitude, AS number
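As a minimal sketch of how the per-address tables described above might be read, the following Python assumes a tab-separated file with no header row and the fields in the listed order (eight for July, seven for August, which lacks the AS name). The field labels, the `parse_rows` and `country_distribution` helpers, and the sample rows are hypothetical. The time format and exact layout of the files are not specified here, so all values are kept as strings.

```python
import csv
import io
from collections import Counter

# Field order as listed in the dataset description; the names themselves
# are illustrative labels, not identifiers taken from the files.
FIELDS = ["start_time", "end_time", "tld", "country",
          "latitude", "longitude", "as_number", "as_name"]


def parse_rows(stream):
    """Yield one dict per tab-separated row.

    Rows with seven fields (the August layout, no AS name) get
    as_name=None. Values stay as strings because the on-disk formats
    are an assumption here.
    """
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines, if any
        if len(row) not in (7, 8):
            raise ValueError(f"expected 7 or 8 fields, got {len(row)}")
        padded = row + [None] * (len(FIELDS) - len(row))
        yield dict(zip(FIELDS, padded))


def country_distribution(rows):
    """Count rows per value of the country field."""
    return Counter(r["country"] for r in rows)


# Made-up sample rows, for illustration only.
SAMPLE = (
    "995536800\t995540400\tcom\tUS\t37.0\t-122.0\t64500\tEXAMPLE-AS\n"
    "995537000\t995537900\tnet\tKR\t37.5\t127.0\t64501\n"
)
rows = list(parse_rows(io.StringIO(SAMPLE)))
dist = country_distribution(rows)
```

Accepting both row widths in one parser lets the same code read the July and August tables. A stricter reader could instead take the expected field count as a parameter.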
Data source:
Code-Red July:
The data source for this dataset includes packet headers collected from a /8 network at UCSD (the UCSD Network Telescope), timestamp/IP address pairs for TCP SYN packets received by two /16 networks at Lawrence Berkeley Laboratory (LBL), and sampled netflow from a router upstream of the /8 network at UCSD. These three data sources are used to maximize coverage of the expansion of the worm. Between midnight and 16:30 UTC, a passive network monitor recorded headers of all packets destined for the /8 research network. After 16:30 UTC, a filter installed on a campus router to reduce congestion caused by the worm blocked all external traffic to this network. Because this filter was put into place upstream of the monitor, we were unable to capture IP packet headers after 16:30 UTC. However, a second UCSD data set consisting of sampled netflow output from the filtering router was available at the UCSD site throughout the 24-hour period. Vern Paxson provided probe information collected by Bro on the LBL networks between 10:00 UTC on July 19, 2001 and 7:00 on July 20, 2001. We have merged these three sources to produce the Code-Red July dataset.
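The merge procedure itself is not described here. One plausible way to combine per-source observations is to keep, for each address, the earliest first-seen time and the latest last-seen time across all sources. The sketch below illustrates only that idea. The `merge_sources` function, the tuple layout, and the sample values are assumptions, not the method used to build the dataset.

```python
def merge_sources(*sources):
    """Combine (address, first_seen, last_seen) records from several sources.

    For every address, keep the minimum first_seen and the maximum
    last_seen observed in any source. Times are assumed to be comparable
    numbers (for example, seconds since the epoch, UTC).
    """
    merged = {}
    for records in sources:
        for addr, first, last in records:
            if addr in merged:
                old_first, old_last = merged[addr]
                merged[addr] = (min(old_first, first), max(old_last, last))
            else:
                merged[addr] = (first, last)
    return merged


# Hypothetical records using documentation-only addresses (RFC 5737).
telescope = [("192.0.2.1", 100, 400), ("192.0.2.2", 150, 300)]
lbl = [("192.0.2.1", 90, 350), ("198.51.100.7", 500, 900)]
netflow = [("192.0.2.2", 200, 1200)]

merged = merge_sources(telescope, lbl, netflow)
```

In this toy input, the netflow record extends the last-seen time of `192.0.2.2` past the point where the telescope record ends. That mirrors how a source still available after 16:30 UTC could fill the gap left by the filtered monitor.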
Code-Red August:
The data source for this dataset includes only packet headers collected by a passive monitor on a /8 network at UCSD (the UCSD Network Telescope). Beginning August 4th, this data contains a mix of hosts infected by Code-Red version 2 and CodeRedII. It is not possible to determine which worm caused a host to send TCP SYN packets to port 80.
Caveats that apply to this dataset:
The .ida vulnerability utilized by the Code-Red worms was exploited via TCP connections to port 80. Because the UCSD Network Telescope did not respond to connection attempts, no connection was ever established and no worm payload was received, so worm probes cannot be separated from other connection attempts. This dataset therefore does not consist solely of worm traffic: all TCP SYN packets to port 80 received are included in these summaries, including non-worm traffic.
The DHCP Effect significantly impacts this dataset, particularly after the first 24 hours of each cycle of worm spread. Changing IP addresses on dynamically addressed machines cause an order of magnitude difference between the number of IP addresses active in any two-hour period and the number of IP addresses active in a week. This dataset does not include IP addresses, so keep in mind that each start/end time or duration does not necessarily uniquely identify an infected computer. It identifies only a newly active IP address, with no information about whether that IP address represents a computer previously known to be infected.
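To make the comparison above concrete, the sketch below counts how many (start, end) records overlap a given time window, so that the count for a two-hour window can be set against the count for a full week. It assumes numeric timestamps in seconds. The `active_in_window` helper and the sample intervals are hypothetical. As the caveat notes, these are counts of active addresses, not of distinct infected computers.

```python
HOUR = 3600
WEEK = 7 * 24 * HOUR


def active_in_window(intervals, window_start, window_end):
    """Count intervals overlapping the half-open window [window_start, window_end).

    Each interval is a (start, end) pair with start <= end, treated as
    closed on both ends.
    """
    return sum(1 for start, end in intervals
               if start < window_end and end >= window_start)


# Made-up example: one machine that reappears under four different
# addresses during a week would contribute four separate records.
intervals = [
    (0, 1 * HOUR),
    (3 * HOUR, 5 * HOUR),
    (30 * HOUR, 31 * HOUR),
    (100 * HOUR, 102 * HOUR),
]

two_hour_count = active_in_window(intervals, 0, 2 * HOUR)
week_count = active_in_window(intervals, 0, WEEK)
```

Here the single hypothetical machine counts once in the first two hours but four times over the week, which is the kind of inflation the DHCP Effect introduces.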
Data Use Restrictions
Acceptable Use Policy for the public access files of the Dataset on the Code-Red Worms
Code-Red worm data, including every file in the Dataset on the Code-Red Worms, will not be redistributed.
I will not attempt to connect to, probe, or in any other way initiate contact with a machine or machine administrator identified via the Code-Red worm data.
In so far as possible, privacy of end users (hosts) and networks monitored by the network telescope will be respected by the researchers. Any publications will anonymize, aggregate or summarize IP addresses, network names, and domain names, as appropriate when the disclosure of such information may present a security risk to those organizations or the general Internet.
At the end of the research, or semi-annually (whichever is sooner), a summary of the research and any findings/conclusions will be reported to CAIDA. If any research is described on the WWW, a URL will be provided. This information is primarily used in reports to our funding agencies.
All users who publish a document (including web pages and papers) using data from this dataset must provide CAIDA with a copy of the publication and must cite:
The CAIDA Dataset on the Code-Red Worms - July and August 2001, David Moore, Colleen Shannon, and kc claffy
/data/passive/codered_worms_dataset.xml.
Users are encouraged, but not required, to include the following attribution in the acknowledgments section of their document:
Support for the CAIDA Dataset on the Code-Red Worms was provided by Cisco Systems, the US Department of Homeland Security, the National Science Foundation, DARPA, and CAIDA Members.
All users who create a publicly available presentation using data from this dataset must provide CAIDA with a copy of the presentation and must use the full name of the dataset ("The CAIDA Dataset on the Code-Red Worms") in the presentation. Users are further encouraged, but not required, to include the URL for the dataset (/data/passive/codered_worms_dataset.xml) in their presentation.
Code-Red Dataset Access
Publicly Available Code-Red Data
References
This dataset is cataloged in DatCat with handle /collection/1-001P-M=CAIDA+Code-Red+Worm+Dataset.
For more information on the Code-Red-related worms (Code-Redv1, Code-Redv2, CodeRedII), see:
.ida vulnerability
eEye: Microsoft Internet Information Services Remote Buffer Overflow (SYSTEM Level Access)
Code-Red Worms
Code-Red version 1
Code-Red version 2
CodeRedII
Code-Red version 2 Spread Analysis
The Spread of the Code-Red Worm (CRv2)
Acknowledgments
Special thanks to Brian Kantor, Jim Madden, and Pat Wilson at UCSD and Barry Greene at Cisco for support of the UCSD Network Telescope Project. Rapid coordination of all of these folks in the face of a network crisis, along with an equally rapid and incredibly generous equipment donation from Cisco, allowed the collection of this unique dataset.