Abstract
This paper proposes a spam detection method that combines N-gram tf-idf feature selection with a deep multi-layer perceptron neural network, further improved by a modified distribution-based balancing algorithm. Designed to handle high-dimensional data and class imbalance, the proposed method outperforms state-of-the-art methods on benchmark datasets, including Enron, SpamAssassin, SMS Spam Collection, and social networking data. By capturing complex features, it also reduces both false positives and false negatives in spam classification. These results show that combining deep learning with improved feature extraction and balancing techniques provides a robust approach to spam detection.
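To make the feature-extraction and balancing steps concrete, the following is a minimal sketch in plain Python, not the implementation evaluated in the paper. It assumes word-level unigrams and bigrams, a basic tf-idf weighting (relative term frequency times log inverse document frequency), and simple random oversampling as a stand-in for the modified distribution-based balancing algorithm, whose details the abstract does not give. The function names and the four toy messages are hypothetical, and the multi-layer perceptron that would consume these features is omitted.

```python
import math
import random
from collections import Counter


def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_max=2):
    """Map each document to a sparse dict {ngram: tf * idf}."""
    counts = [Counter(word_ngrams(d, n_max)) for d in docs]
    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # guard against empty documents
        vectors.append(
            {g: (k / total) * math.log(n_docs / df[g]) for g, k in c.items()}
        )
    return vectors


def oversample_minority(vectors, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes match in size.

    Assumption: plain random oversampling, used here only as a placeholder
    for the paper's modified distribution-based balancing algorithm.
    """
    rng = random.Random(seed)
    by_class = {}
    for v, y in zip(vectors, labels):
        by_class.setdefault(y, []).append(v)
    target = max(len(items) for items in by_class.values())
    out_x, out_y = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for v in items + extra:
            out_x.append(v)
            out_y.append(y)
    return out_x, out_y


# Hypothetical toy corpus: 1 = spam, 0 = ham (imbalanced on purpose).
docs = [
    "win free prize now",
    "meeting at noon",
    "lunch at noon tomorrow",
    "project meeting notes",
]
labels = [1, 0, 0, 0]

X = tfidf_vectors(docs)
X_bal, y_bal = oversample_minority(X, labels)
```

In this sketch the balanced feature dictionaries in `X_bal` would then be converted to fixed-length vectors and passed to a classifier. An n-gram that occurs in every document receives weight zero under this weighting, which is one reason practical tf-idf variants often smooth the idf term.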