Email Mining: Tasks, Common Techniques, and Tools
Guanting Tang, Jian Pei, and Wo-Shun Luk
School of Computing Science, Simon Fraser University, Burnaby BC, CANADA
Abstract.
Email is one of the most popular forms of communication today, mainly due to its
efficiency, low cost, and compatibility with diverse types of information. To facilitate
better use of email and to explore its business potential, various data mining techniques
have been applied to email data. In this paper, we present a brief survey of the major
research efforts on email mining. To emphasize the differences between email mining and
general text mining, we organize our survey around five major email mining tasks, namely
spam detection, email categorization, contact analysis, email network property analysis,
and email visualization. These tasks are inherently tied to the various ways email is
used. We systematically review the commonly used techniques and also discuss the
related software tools available.
Keywords: Email, data mining, tools, classification, clustering, social network analysis.