IBM Innovates Enterprise Anti-Spam Filter to Combat Email Spam

Spam is a massive problem: it currently accounts for between one-third and one-half of all email and costs companies billions of dollars through lower productivity, lost legitimate messages, and the need for increased bandwidth and storage. In a bid to solve the problem, IBM has brought together scientists from different areas of its research division to develop an enterprise anti-spam filtering system that combines several different filtering technologies to create the ultimate anti-spam system. For example, one of the spam filters, Chung-Kwei, is a pattern-discovery-based system that uses an algorithm developed by life sciences researchers focused on tackling computational biology challenges such as gene finding and protein annotation. By itself, Chung-Kwei detected 96.56 percent of spam messages with a false positive rate of just 0.066 percent during tests conducted in IBM’s labs. By combining Chung-Kwei with the other spam filtering techniques, IBM researchers have created SpamGuru, a prototype anti-spam system that they believe has the potential to eliminate virtually all spam.

SpamGuru: An Enterprise Anti-Spam Filtering System

IBM Research is developing an enterprise-class anti-spam filter as part of our overall strategy of attacking the spam problem on multiple fronts. Our anti-spam filter, SpamGuru, mirrors this philosophy by incorporating several different filtering technologies and intelligently combining their output to produce a single spamminess rating, or score, for each incoming message. The use of multiple algorithms improves the system’s effectiveness and makes it more difficult for spammers to attack. While a spammer may defeat any single algorithm, SpamGuru can rely on its remaining algorithms to maintain a high degree of effectiveness.
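The idea of merging several filter outputs into one spamminess score can be illustrated with a minimal Python sketch. The source does not say how SpamGuru actually combines its filters, so the weighted average below, the function name `combine_scores`, the filter names, and the weights are all assumptions chosen for illustration.

```python
from typing import Dict


def combine_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of per-filter spam scores in [0, 1].

    Hypothetical combination rule: the article does not specify the real one.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("at least one filter must have a non-zero weight")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Illustrative values only: a spammer has fooled the "pattern" filter
# (low score), but the other two filters still rate the message as spam.
scores = {"bayesian": 0.9, "blacklist": 1.0, "pattern": 0.2}
weights = {"bayesian": 2.0, "blacklist": 1.0, "pattern": 1.0}

combined = combine_scores(scores, weights)
is_spam = combined >= 0.5  # the 0.5 threshold is also an assumption
print(f"combined score: {combined:.2f}, spam: {is_spam}")
```

In this toy case the message is still flagged even though one filter was defeated, which is the robustness argument made above.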

SpamGuru’s filtering architecture integrates multiple classification algorithms into a single classification pipeline, which allows it to benefit from multiple classifiers with minimal extra computational cost. SpamGuru’s classification technologies include spoof detection, Bayesian filtering, plagiarism detection, automatically generated white- and black-lists, and Chung-Kwei, a novel technique that uses advanced pattern-matching algorithms developed by IBM’s bioinformatics group.
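One way a pipeline can keep the cost of extra classifiers low is to run cheap checks first and stop as soon as one of them is confident. The Python sketch below shows that pattern under stated assumptions: the early-exit rule, the thresholds, the stage functions, and the sample addresses are hypothetical, and the keyword stage is only a stand-in for a real content classifier, not SpamGuru’s implementation.

```python
from typing import Callable, List, Optional, Tuple

Message = dict
# A stage returns a spam score in [0, 1], or None if it has no opinion.
Stage = Callable[[Message], Optional[float]]

# Hypothetical sample data for the sketch.
KNOWN_GOOD = {"colleague@example.com"}
KNOWN_BAD = {"bulk@spam.example"}
SUSPICIOUS = {"free", "money", "winner"}


def whitelist_stage(msg: Message) -> Optional[float]:
    return 0.0 if msg["sender"] in KNOWN_GOOD else None


def blacklist_stage(msg: Message) -> Optional[float]:
    return 1.0 if msg["sender"] in KNOWN_BAD else None


def keyword_stage(msg: Message) -> Optional[float]:
    # Toy content check: score grows with the number of suspicious words.
    words = msg["body"].lower().split()
    hits = sum(1 for word in words if word in SUSPICIOUS)
    return min(1.0, hits / 3)


def classify(msg: Message, stages: List[Stage],
             low: float = 0.05, high: float = 0.95) -> Tuple[float, int]:
    """Run stages in order; return (score, number of stages evaluated)."""
    scores = []
    for position, stage in enumerate(stages, start=1):
        score = stage(msg)
        if score is None:
            continue
        scores.append(score)
        if score <= low or score >= high:
            return score, position  # confident verdict: skip later stages
    final = sum(scores) / len(scores) if scores else 0.5
    return final, len(stages)


PIPELINE = [whitelist_stage, blacklist_stage, keyword_stage]

print(classify({"sender": "colleague@example.com", "body": "free money"}, PIPELINE))
print(classify({"sender": "bulk@spam.example", "body": "hello"}, PIPELINE))
```

Ordering the stages from cheapest to most expensive means most messages never reach the costly classifiers, which is one plausible reading of "minimal extra computational cost".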

Chung-Kwei: a Pattern-discovery-based System for the Automatic Identification of Unsolicited E-mail Messages (SPAM)

Chung-Kwei is a system that we developed recently for the analysis of electronic mail and the automatic identification and tagging of unsolicited messages (spam). The underlying method uses pattern discovery and has its underpinnings in a generic approach behind successful solutions we developed for computational biology problems such as gene finding and protein annotation. Chung-Kwei can be trained very quickly using a body of known spam and white messages, and can do so without interrupting the ongoing classification of incoming e-mail. The prototype system, which we developed by training on a repository of 87,000 spam and white messages, achieved a sensitivity of 96.56% with a false positive rate of 0.066%, or one-in-six-thousand messages. In terms of speed, the Chung-Kwei prototype is capable of classifying approximately 200 messages per second on a 2.2 GHz Intel Pentium platform.
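For readers unfamiliar with the metrics quoted above, the following Python sketch shows how sensitivity and false positive rate are computed from classification counts, and what a rate of 200 messages per second means over a day. The message counts are hypothetical, picked only so that the results match the reported percentages; the source does not give the underlying test-set counts.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual spam messages that were flagged as spam."""
    return true_positives / (true_positives + false_negatives)


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Fraction of legitimate messages that were wrongly flagged as spam."""
    return false_positives / (false_positives + true_negatives)


def messages_per_day(messages_per_second: int) -> int:
    """Sustained throughput over 24 hours."""
    return messages_per_second * 60 * 60 * 24


# Hypothetical counts, NOT from the source: 10,000 spam messages with
# 9,656 caught, and 100,000 legitimate messages with 66 wrongly flagged.
print(f"sensitivity: {sensitivity(9_656, 344):.2%}")
print(f"false positive rate: {false_positive_rate(66, 99_934):.3%}")
print(f"messages per day at 200/s: {messages_per_day(200):,}")
```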
