Abstract
Semi-supervised clustering (SSC) is an important research problem in machine learning. Although the use of unlabelled data is usually expected to improve performance, in many cases semi-supervised learning (SSL) is outperformed by supervised learning that uses only labelled data. For this reason, the construction of a performance-safe SSL method has become a key issue in SSC research. This paper classifies the effect of fast food on the human body by clustering with supervised learning, with the aim of improving the clustering. It also uses feature selection and feature extraction. Clustering is a technique used for data reduction: it divides the data into groups based on pattern similarities, such that each group is abstracted by one or more representatives. Recently, there has been a growing emphasis on exploratory analysis of very large datasets to discover useful patterns. This paper explains how text and data mining techniques can extract the useful knowledge represented by clusters from the textual information contained in a large number of e-mails. E-mail is now becoming the dominant form of inter- and intra-organizational written communication for many companies. The sample texts of two e-mails are verified for data clustering. The clusters show the similar e-mails exchanged between users, and the text similarities found are used to cluster the texts. For this purpose, the paper uses pattern similarities, i.e., the similar words exchanged between users, evaluated at different threshold values. The threshold value indicates the frequency of the words used. The data are represented using a vector space model. The semi-supervised projected model-based clustering algorithm (SeSProC) also includes a novel model selection approach, which uses a greedy forward search to estimate the final number of clusters.
The quality of SeSProC is assessed using synthetic data, demonstrating its effectiveness, under different data conditions, not only at classifying instances with known labels but also at discovering completely hidden clusters in different subspaces. Depending on whether the unlabelled instances can be classified according to one of the known labels or whether new, previously unknown clusters may be discovered, the problem is referred to as semi-supervised classification or semi-supervised clustering, respectively. Our approach can be classed as semi-supervised clustering, since instances are classified according to known labels but new clusters can also be found if necessary. Co-clustering discovers clusters of similar objects with regard to their feature values, as well as clusters of similar features with regard to the objects associated with them. Accordingly, this paper presents a review of most clustering and co-clustering techniques applied to different kinds of data.
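The thresholded word-frequency comparison of two e-mails described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the function names (`term_frequencies`, `cosine_similarity`, `shared_terms`), the tokenization rule, the two sample e-mails, and the choice of cosine similarity as the measure over the vector space model are all hypothetical, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def term_frequencies(text, threshold=1):
    """Build a sparse term-frequency vector for one e-mail.

    Assumption: a term is kept only if it occurs at least `threshold`
    times, which is one possible reading of the abstract's "threshold value".
    """
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return {word: n for word, n in counts.items() if n >= threshold}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def shared_terms(a, b):
    """Words that survive the threshold in both e-mails."""
    return sorted(set(a) & set(b))


# Hypothetical sample e-mails, invented for this sketch.
mail_a = "Meeting moved to Monday. Please confirm the meeting room for Monday."
mail_b = "Can you confirm the Monday meeting? The meeting agenda is attached."

for threshold in (1, 2):
    vec_a = term_frequencies(mail_a, threshold)
    vec_b = term_frequencies(mail_b, threshold)
    print(threshold, shared_terms(vec_a, vec_b), cosine_similarity(vec_a, vec_b))
```

Raising the threshold shrinks each vector to its most frequent words, so the shared vocabulary and the similarity score both change with it; pairs of e-mails whose similarity exceeds some cut-off could then be placed in the same cluster.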