Abstract
Cluster formation falls into three types: supervised, unsupervised, and semi-supervised clustering. Clustering algorithms have been based on active learning, the ensemble clustering-means algorithm, data streams with flock, fuzzy clustering for shape annotations, incremental semi-supervised clustering, weakly supervised clustering with minimal labeled data, and self-organization based on neural networks. Semi-supervised clustering is a combination of supervised and unsupervised clustering. It has an important impact on clustering [2]. Clustering ensemble is one of the most recent advances in unsupervised learning. It aims to combine the clustering results obtained using different algorithms, or from different runs of the same clustering algorithm, for the same data set. This is accomplished using a consensus function, and the efficiency and accuracy of this method have been demonstrated in many works in the literature. A clustering method based on pairwise constraints has also been introduced [3]. This method uses a neighborhood framework and selects the most informative point. Data points are clustered by performing the query against all data points. A number of semi-supervised clustering algorithms have been proposed, but few of them are specifically designed for high dimensional data. High dimensionality is a difficult challenge for clustering analysis because of the inherently sparse distribution of the data, and most popular clustering algorithms, including semi-supervised ones, become ineffective in high dimensional space. A semi-supervised hierarchical clustering algorithm for high dimensional data has been proposed, which is based on the combination of semi-supervised clustering and dimensionality reduction [1].
To keep dimensionality reduction in harmony with the detection of the inherent cluster structure, the number of dimensions is reduced sequentially as the clusters are gradually formed in the hierarchical clustering procedure. Finding clusters in high dimensional data is a challenging task, as such data comprises hundreds of attributes [4]. Subspace clustering is an evolving methodology that, instead of finding clusters in the entire feature space, aims at finding clusters in various overlapping or non-overlapping subspaces of the high dimensional dataset. Density based subspace clustering algorithms treat clusters as dense regions compared to noise or border regions. Many significant density based subspace clustering algorithms exist in the literature [5]. Each of them has different characteristics arising from different assumptions, input parameters, or techniques. Hence it is hardly feasible for future developers to compare all these algorithms on one common scale [6]. The aim of a semi-supervised clustering algorithm is to improve clustering performance by taking into account user supervision in the form of pairwise constraints. In this paper, we examine the active learning challenges of choosing the pairwise must-link and cannot-link constraints for semi-supervised clustering [7]. Grouping high dimensional data into clusters is not accurate, and may fall short of expectations, when the dimension of the dataset is high. This problem is now attracting tremendous attention in research and development [8]. To address the performance issues of data clustering in high dimensional data, issues such as dimensionality reduction, redundancy elimination, subspace clustering, co-clustering, and data labeling for clusters need to be analyzed and improved [9].
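
As an illustration of the pairwise constraints mentioned above, the following minimal Python sketch counts how many must-link and cannot-link constraints a given cluster assignment violates. It is not the method of any cited work; the function name `count_violations`, the data layout, and the toy labels are assumptions made for this example only.

```python
from typing import Dict, List, Tuple

# A constraint is a pair of point indices (hypothetical representation).
Pair = Tuple[int, int]


def count_violations(
    labels: Dict[int, int],
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> int:
    """Count constraints that a cluster assignment fails to satisfy."""
    violations = 0
    # A must-link pair is violated when its points lie in different clusters.
    for a, b in must_link:
        if labels[a] != labels[b]:
            violations += 1
    # A cannot-link pair is violated when its points share a cluster.
    for a, b in cannot_link:
        if labels[a] == labels[b]:
            violations += 1
    return violations


# Toy assignment: points 0 and 1 in cluster 0, points 2 and 3 in cluster 1.
labels = {0: 0, 1: 0, 2: 1, 3: 1}
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 3), (2, 3)]
print(count_violations(labels, must_link, cannot_link))
```

A count of this kind could serve as a penalty term when comparing candidate clusterings, or as a signal of which pairs are worth querying in an active learning setting.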