The fifth ZIDONG Youths Innovation Salon was held in the Coffee Hall on the afternoon of January 23rd, hosted by Professor Zhang Wensheng. Professor Yu Jian, department chairman at the School of Computer and Information Technology, Beijing Jiaotong University, attended and gave an invited report titled “Research on Clustering Axiomatization and Its Applications”.
The report first proposed that clustering is one of the important abilities by which human beings perceive the world. In the era of “big data”, it can be applied to the data partitioning problem. Professor Yu noted that there is no strict definition of clustering, but researchers have reached a basic common understanding: when n objects are divided into c subsets, objects within the same subset should be similar to one another, and objects in different subsets should be dissimilar.
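The common understanding described above can be made concrete with a small sketch. It is only one possible reading, not the formal definition given in the report: the function names are hypothetical, objects are assumed to be points compared by Euclidean distance, and "similar within, dissimilar between" is interpreted strictly as every within-subset distance being smaller than every between-subset distance.

```python
from itertools import combinations
from math import dist


def is_hard_partition(n, subsets):
    """True if the subsets are non-empty, disjoint, and together cover objects 0..n-1."""
    seen = [i for s in subsets for i in s]
    return all(len(s) > 0 for s in subsets) and sorted(seen) == list(range(n))


def within_closer_than_between(points, subsets):
    """Assumed reading of the basic requirement: every pair in the same subset
    is closer than every pair drawn from two different subsets.
    Expects `subsets` to be a hard partition of the point indices."""
    label = {i: k for k, s in enumerate(subsets) for i in s}
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        (within if label[i] == label[j] else between).append(d)
    if not within or not between:
        return True  # nothing to compare, so the constraint holds trivially
    return max(within) < min(between)


points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = [{0, 1}, {2, 3}]   # groups nearby points together
bad = [{0, 2}, {1, 3}]    # mixes distant points in each subset
print(is_hard_partition(4, good), within_closer_than_between(points, good))
print(is_hard_partition(4, bad), within_closer_than_between(points, bad))
```

Both groupings are valid partitions of the four objects, but only the first meets the similarity constraint, which illustrates the distinction between a partition and a clustering that the report returns to later.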
Then the report presented the four basic steps of cluster analysis: data representation, clustering criterion, clustering algorithm and clustering evaluation, and gave a mathematical definition of each. After summarizing typical applications of clustering algorithms and the various basic theories they involve, the report raised several questions: "Does cluster analysis have a theory?", "Do clustering algorithms share common properties?" and "Is research on clustering axiomatization feasible?", which prompted the audience to think.
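The four steps can be illustrated with a minimal sketch. The choices here are assumptions made for illustration rather than the definitions used in the report: data are represented as 2-D points, the criterion is the within-cluster sum of squared distances, the algorithm is plain C-means (k-means) with caller-supplied initial centers, and evaluation simply reports the final criterion value.

```python
from math import dist


def criterion(points, centers, labels):
    """Step 2, clustering criterion: within-cluster sum of squared distances."""
    return sum(dist(p, centers[k]) ** 2 for p, k in zip(points, labels))


def c_means(points, centers, max_iter=100):
    """Step 3, clustering algorithm: alternate assignment and center updates
    until the assignment stops changing or max_iter is reached."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centers


# Step 1, data representation: each object is a point in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]

labels, centers = c_means(points, centers=[points[0], points[2]])

# Step 4, clustering evaluation: here, just the value of the criterion.
score = criterion(points, centers, labels)
print(labels, centers, score)
```

In practice the evaluation step often uses a validity measure different from the criterion being optimized; reusing the criterion keeps the sketch short.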
Furthermore, the report described three current approaches to clustering axiomatization: clustering criterion (objective function) axiomatization, cluster mapping axiomatization and clustering validity function axiomatization. Professor Yu considered these approaches reasonable at first sight but pointed out several disadvantages: 1) they are too specific and do not apply to many clustering algorithms; 2) few or no algorithms satisfy them; 3) they cannot distinguish between a partition and a clustering. He therefore concluded that the clustering axiomatizations in the existing literature are not closely related to the basic requirement of clustering.
The basic requirement of clustering concerns only the clustering results; it does not involve the clustering function, the clustering criterion or clustering validity. Professor Yu studied the common properties of clustering results and, starting from the basic requirements of clustering, obtained three axioms: sample separability, class separability and similarity separability.
The report then analyzed the similarities and differences between clustering results and partitions, and introduced the concept of the boundary set. If the boundary set is empty, hierarchical clustering algorithms and hard partition clustering algorithms satisfy the clustering axioms. Soft partition clustering algorithms are more complicated, as are their clustering results and the corresponding clustering axioms. The report pointed out that simply satisfying the clustering axioms is not enough, because they are only a minimum standard: clustering results should stay as far as possible from violating them. This leads to three clustering criteria: the class separability criterion, the class compactness criterion and the inferior class avoidance criterion.
Finally, the research work was summarized: 1) a clustering axiomatization system is proposed, from which C-means, model-based clustering and other well-known clustering criteria can be deduced for the first time; 2) the concepts of inferior class and boundary set are proposed; 3) three principles for designing clustering criteria are proposed; 4) a theoretical analysis framework for soft clustering algorithms is proposed.