Detect and Notify Abnormal SMTP Traffic and Email Spam over Aggregate Network

合集下载

1、下载文档前请自行甄别文档内容的完整性，平台不提供额外的编辑、内容补充、找答案等附加服务。
2、"仅部分预览"的文档,不可在线预览部分如存在完整性等问题,可反馈申请退款(可完整预览的文档不适用该条件!)。
3、如文档侵犯您的权益，请联系客服反馈,我们会尽快为您处理(人工客服工作时间：9:00-18:30)。

J OURNAL OF I NFORMATION S CIENCE AND E NGINEERING 21, 571-578 (2005)
Short Paper_________________________________________________ Detect and Notify Abnormal SMTP Traffic and
Email Spam over Aggregate Network
S U-C HIU Y ANG AND L I-M ING T SENG
Department of Computer Science and Information Engineering
National Central University
Chungli, 320 Taiwan
E-mail: center7@.tw
As all the traffic between the public Internet and the customer’s desktop must be interconnected through ISP’s access network, this work thus makes use of the transpor-
tation traffic log gathered from backbone router to develop SMTP flooding detection
system (SFDS), so that the most spam could be detected and stopped at the original
fan-out network. The system has been deployed over a TANet (Taiwan Academic Net-
work) backbone node for assisting network users grasping the abnormal SMTP sources
with suddenly increase email requests. The result indicates that there is a high proportion
of the notified spam could be detected in advance.
Keywords: SMTP flooding detection, spam, anomaly notification, Rwhois, IP route MIB
1. INTRODUCTION
Making use of the simplicity of SMTP, sophisticated intrusion tools, and self- propagation email viruses, spammers had successfully mounted up a range of innocent machines to cover themselves distribute massive spam to global network users. Once a system was infected by mail virus, it not only attempts to copy itself to the exploitable destination systems, it also initials its SMTP engine to delivery spam messages at very high speed that easily falls outside historical norm [1]. The sustaining flood of in-your- face spam often fills the intermediate systems, networks, disks with unwanted messages [2].
Obviously, spamming behavior itself can make the measurement task easier if the feature could be finely narrowed down according to its flooding characteristics. Making use the transportation traffic logs exported from access router, this work develops the SMTP flooding detection system (SFDS) for adding an extra level spam filter over the upstream access network of Internet service providers’ (ISPs) backbone network.
571
S U-C HIU Y ANG AND L I-M ING T SENG
572
1.1 Related Work
As spam often has characteristic styles, phrases, and disclaimers, several researches took advantage of these characteristics to detect spam activities by checking the mail content received by SMTP server systems. In [3], the authors employed data mining technology to search mail log file over the server node, to discover the consistent and useful patterns of email system for recognizing anomalies and known intrusions. Bayes-ian anti-spam filter has been widely deployed over mail servers to learn the terms usually found in the spam, assess and filter the new coming messages to mitigate the over-whelming spam traffic [4].
Since Bayesian anti-spam filters are deployed at SMTP server/client sites, however, the precious mail resources over intermediate systems are still heavily consumed for transmitting the massive spam and the reject messages. In addition, the complex content matching of statistical spam filter also caused itself the target of email bombing attack. A sophisticated SMTP flooding program can severely overwhelm the victim server and obviously affect its performance and availability. This situation drives us to measure and detect SMTP flooding over upstream access network.
1.2 Transportation Traffic Log
Rapid growth of Internet deployment and usage has created a massive increase in the demand for measurement technology to provide the information required to record network utilization and help network administrator solving network problem. Packet- based traffic traces captured by Tcpdump and flow-based NetFlow data aggregated by router are the most common resources exploited by network operators for measuring transportation traffic.
Tcpdump uses libpcap library to capture TCP/IP packets and presents these packets in textual format. In addition to the layer 4 traffic log, tcpdump could be even employed to echo packet information up to payload content, and standard output to a disk file [5]. Typically, network administrator adopts the collection tools to obtain traffic traces for gain better understanding of traffic characteristics [6-8].
One additional approach to gather transportation logs is the router that used many IP packet header fields to forward each packet that has been passed through it [9, 10]. Each NetFlow log records counter for packets and bytes that are updated according to each packet that matches the key of the 5-tuples flow identifier that consists of the source IP, source port, destination IP, destination port, and the adopted transportation protocol. And each bi-direction TCP session is typically expressed as both Netflow records, one repre-sents the traffic from client to server, and the other represents traffic from server to client.
This work use flow-tools [11], shareware developed by Ohio State University (OSU), to collect NetFlow data exported from one TANet backbone router to support subsequent SMTP traffic measurement. The remainder of the paper is organized as fol-lows. Section 2 presents feature of SMTP flooding, and illustrates feature based traffic aggregation and multi-thresholds anomaly detection mechanisms. After that, relation between the notified spam and detected anomalies is analyzed. Section 3 depicts the con-struction Rwhois direction service and the automatic anomaly notification. Finally, sec-tion 4 draws conclusions.
S PAM, A BNORMAL SMTP T RAFFIC M EASUREMENT 573
2. SMTP FLOODING DETECTION
The suddenly increase mail flows from a single mail client does not happen by ac-cident. Most of them are caused by the compromised system that had been exploited to distribute massive spam messages. Thus, the feature based traffic aggregation and multi- thresholds anomaly detection programs were employed by this work to measure and de-tect anomalies according to feature of SMTP flooding.
2.1 Feature of SMTP Flooding
Fig. 1 shows the typical SMTP transaction model. Since the spamming source might generate and send out excessive messages to destination servers, thus, the SMTP requests originated from the flooding source will hugely arises and converges at the destination SMTP port. While the other SMTP flows responded from server systems to spamming source diverges among the non-well-known port numbers 1024-65535.
Fig. 1. SMTP flooding model.
Therefore, this work constructed the set of feature that consists of source IP, desti-nation IP, destination SMTP port attributes as the virtual flow identifier to achieve the subsequently SMTP traffic aggregation and anomaly detection, not just the single appli-cation port or the single address attribute as many systems now do.
2.2 Feature-Based Traffic Aggregation
This work uses the 3-tuples flow attributes as the key to aggregate SMTP traffic sent from a mail source to destination server. The program firstly reads the NetFlow data from the dedicated disk files for each hour time interval, to accumulate traffic and store
S U -C HIU Y ANG AND L I -M ING T SENG
574
the results to the corresponding traffic variables, we use the flow h [vr_flow i ], pkt h [vr_flow i ] and byte h [vr_flow i ] to denote the amount of SMTP flows, packets, bytes, carried through the virtual flow. After that, the traffic variables were summarized and sorted out to high-light the abnormal SMTP traffic.
Fig. 2 shows the aggregated top-N SMTP traffic over the subject network on Nov 2nd 2003. Several traffic variables, such as, the mean packet size, the number of flow, the number of packet, the number of byte, were kept so that network administrator can figured out spam senders easily. Take the traffic transmitted from the host with the ad-dress of 163.25.154.253 as an example, the system successfully sent out thousands of emails to several servers in the single hour. Obviously, the suddenly increase SMTP traf-fic from the single source located at a high school campus was definitely different from
the other SMTP clients.
(a) Monitoring abnormal SMTP traffic.
(b) Monitoring the detected spam traffic.
Fig. 2. Monitoring SMTP flooding traffic.
S PAM, A BNORMAL SMTP T RAFFIC M EASUREMENT 575
2.3 Multi-Thresholds Anomaly Detection
Spammers deliberately use malformed and alternative flow rate to hide their spamming behavior from detect. A multi-threshold anomaly detection program was de-signed to aggregate the interested traffic for filtering out flooding traffic without too much false alarms. The program reads the aggregated traffic variables of the top-N SMTP sessions to accumulate the flooding duration, mean flow rate, and mean packet size of each SMTP source to help determining the abnormal SMTP flooding. We use flow_count [source i ], pkt_count [source i ], byte_count [source i ], pkt_size [source i ], dura-tion [source i ], and flow_rate [source i ] to denote the amount of SMTP flows, packets, bytes, mean packet size, flooding duration, and mean flow rate of each SMTP source system.
Firstly, the packet numbers, byte numbers, sent from each email source to any mail servers were aggregated according to formulas (1) though (2), and the mean packet size was evaluated accordingly (formula (3)). Then, the flow numbers, flooding duration, and mean flow rate variables of each email source were summarized according to formulas (4) though (6). Finally, the program compared the aggregated traffic variables with the flow rate, flooding duration and mean packet size thresholds values to determine the suddenly increase SMTP traffic accordingly (formula (7)).
_[][_]C i h i
i byte count source byte vr flow ==∑ (1) 0_[][_]C
i h i
i pkt count source pkt vr flow ==∑ (2)
_[]_[]_[]
i i i byte count source pkt size source pkt count source = (3) 0_[][_]C
i h i
i flow count source flow vr flow ==∑ (4) duration [source i ] = duration [source i ] + 1, for ‘each’ hour (5) _[]_[][]
i i i flow count source flow rate source duration source = (6) D smtp_flooding [source i ] = 1, if ((pkt_size [source i ] > threshold_pkt_size ) and
(flow_rate [source i ] > threshold_flow_rate ) and
(duration [source i ] > threshold_duration )) (7)
In addition to IP header (20 bytes in length) and TCP header (20 bytes in length), the mail segment also involves: Received, Date, From, Message-ID, and to lines [13].
S U-C HIU Y ANG AND L I-M ING T SENG
576
Thus, 100 bytes per packet was estimated by the system as the reasonable mean packet size to help distinguishing the successfully transmitted email from the rejected and re-transmitted undeliverable messages.
The flooding traffic with mean flow rate above 180 flows per hour, and has contin-ued the flooding behavior for more than 5 hours, could be detected and showed on Fig. 2 (b). While the traffic of rejected and retransmitted undeliverable messages, with mean packet size much less than 100 bytes per packet, was displayed on the lower part of the monitoring page. However, the single flooding source with IP address of 163.25.154.253 can be determined easily, for the huge amount of SMTP flows is outside its historical norm.
2.4 Link between Detected Anomalies and Reported Spam
In cope with the rapid growth of spam traffic, Internet users are encouraged to re-port the abuse event to the designated service sites, such as , and the respon-sible administrator, such as abuse@domain. In this work, a Perl script was employed to parse the anomalies detected by SDS system, and the abuse source list that had been no-tified by , to analyze the relation between the detected anomalous sources and the reported spam.
Table 1 lists the link between the spamming senders that had been reported to abuse@domain and the detected SMTP flooding sources over the subject network over the past several months. The result shows that there is a high proportion of the reported spam could be picked up from the detected flooding sources. Considering the high link between SMTP flooding sources and the notified spam, an automatic anomaly notifica-tion program was developed by this work so that the responsible owners can be notified to fix the compromised system as soon as possible.
Table 1. Link between detected anomalies and the reported spam.
Reported spamming host number
Spamming host number detected by the SFDS system
Oct-2003 8 6 of 8 75 %
Nov-2003 8 7 of 8 88 %
Dec-2003 9 7 of 9 78 %
Jan-2004 12 12 of 12 100 %
Feb-2004 8 6 of 8 75 %
Mar-2004 13 11 of 13 85 %
Apr-2004 20 18 of 20 90 %
May-2004 10 10 of 13 78 %
3. ANOMALY NOTIFICATION
Security is a combination of technique, administrative and physical controls [13]. Once the anomalous SMTP flooding was detected, the responsible administrator should
S PAM, A BNORMAL SMTP T RAFFIC M EASUREMENT 577 be notified to verify the problem and correct the incident. In this work, the IP route in-formation over the subject network was rebuilt for offering contact information data base. And Net::Rwhois.pm and Mail::Sendmail.pm Perl modules were integrated to achieve the notification.
3.1 IP Route MIB and Rwhois Direction Service
Typically, the contact technician information that consists of the organization name, contact administrator, email address, telephone number, IP sub-network addresses and IP address of the routing interface of the campus network are kept for being able to contact the technically-competent person so that the suspect and the identified threats can be quickly addresses and quickly resolved by the responsible technician.
The system firstly pulled ipRoute Simple Network Management Protocol (SNMP) Management Information Base (MIB) tree [14-16] form aggregate router to construct IP routing data. After that, the routing information was related to the contact technician data to construct Rwhois data base, a shareware package [17].
3.2 Automatic Anomaly Notification
Firstly, the program referred source IP attribute of the detected anomalous record, retrieve the responsible administration information through Rwhois.pm module. Based on the contact data, the program employs Sendmail.pm module to send out the aggre-gated flooding records to the responsible administrator. Thus, the responsible adminis-trator or technician can be informed to resolve the problem, rather than just monitoring.
4. CONCLUSIONS
Making use of NetFlow data gathered from aggregate network, this work developed SFDS system to determine SMTP flooding through feature-based traffic aggregation and multi-thresholds anomaly detection. And an automatic anomaly notification program was designed to notify the responsible administrator according to the detected flooding traffic. The analyzed result indicates that there is a high proportion of the reported spam could be detected by the FSDS system in advance. In the near future, the authors plan to inte-grated data mining and statistic technology into the system so that the sophisticated spamming with much lower flow rate can be determined.
REFERENCES
1.E. Levy, “The making of a spam zombie army: dissecting the sobig worms,” IEEE
Security and Privacy Magazine, Vol. 1, 2003, pp. 58-59.
2.D. Harris, “Drowning in sewage-SPAM,” Asia Pacific Regional Internet Conference
on Operational Technologies (APRICOT), 2004.
3.R. Grier and S. Schappel, “Junk email filter using naïve Bayesian classification,”
/news/archives/BayesianEmailFilter.pdf.
S U-C HIU Y ANG AND L I-M ING T SENG
578
4.H. Han, X. L. Lu, J. Lu, C. Bo, and R. L. Yong, “Data mining aided signature dis-
covery in network-based intrusion detection system,” ACM SIGOPS Operating Sys-tems Review, Vol. 36, 2002, pp. 7-13.
5.C. Williamson, “Internet traffic measurement,” IEEE Internet Computing, 2001, pp.
70-74.
6.D. Luca and S. P. A. Finsiel, “Effective traffic measurement using ntop,” IEEE
Communications Magazine, 2000, pp. 138-143.
7.H. Wang, D. Zhang, and K. G. Shin, “Detecting SYN flooding attacks,” The 21st
Annual Joint Conference of IEEE Computer and Communications Societies(IN-FOCOM 2002), Vol. 3, 2002, pp. 1530-1539.
8.M. Roesch, “Snort − lightweight intrusion detection for networks,” in Proceedings of
13th Systems Administration Conference, 1999, pp. 229-235.
9.Cisco IOS Netflow Technology, /warp/public/cc/pd/iosw/prod-
lit/iosnf_ds.htm.
10.G. Huston, “Measuring IP network performance,” The Internet Protocol Journal,
2003, pp. 2-9.
11.M. Fullmer, “The OSU flow-tools package and cisco netflow logs,” in Proceedings
of 14th Systems Administration Conference (LISA 2000), 2000, pp. 291-303.
12.B. Costales and E. Allman, Sendmail, O’Reilly and Associates, Inc., 2003.
13.C. P. Pfleeger and S. L. Pfleeger, Security in Computing, Pearson Education, Inc.,
Prentice Hall, 2003, pp. 491-551.
14.Request for Comments: 1213, Management Information Base for Network Manage-
ment of TCP/IP-based internets: MIB-II.
15.Request for Comments: 1354, IP Forwarding Table MIB.
16.C. Huitema, Routing in the Internet, Prentice Hall, 1995, pp. 27-64.
17.Request for Comments 2167, Referral Whois (RWhois) Protocol.
Su-Chiu Yang (楊素秋) received the M.S. degree in Electronics Engineering from National Taiwan Institute of Technology, Taiwan in 1980, and the Ph.D. degree in Com-puter Science and Information Engineering from National Central University, Taiwan in 2004. She currently works as a senior programmer at the Computer Center in National Central University. Her research interests are in the area of computer network, network management, and intrusion detection.
Li-Ming Tseng (曾黎明) received the B.S. degree in Engineering Science, the M.S. and the Ph.D. degrees in Electrical Engineering from National Cheng Kung University, Taiwan in 1970, 1974, and 1980. Dr. Tseng is currently a professor in the Department of Computer Science and Information Engineering in National Central University, Taiwan. His research interests are in the area of distribution system, computer network, operating system, and resource discovery.。