Mixture Model Clustering of Binned Uncertain Data

Hani Hamdan

Abstract


This paper addresses the problem of accounting for data imprecision in the mixture model clustering of binned data. Binning (or grouping) data is common in data analysis and machine learning. We recently developed a method that adapts the data binning procedure to imprecise data. The idea is to model each imprecise observation as a multivariate uncertainty zone and to assign that zone to several bins, in proportion to the volume of its overlap with each bin. The experimental results obtained when this method was combined with the binned-EM algorithm (mixture approach) were encouraging. However, the binned-EM algorithm can be computationally expensive. To overcome this problem, we propose in this paper to apply our data binning procedure within the classification approach, based on the bin-EM-CEM algorithm, which is much faster than the binned-EM algorithm. The paper concludes with a brief description of a flaw diagnosis application using acoustic emission. The experimental results compare our data binning procedure with the classical one (when applied to imprecise data) in the classification approach framework, and with the int-EM-CEM algorithm, in the context of binned bivariate measurements of acoustic emission event localization.
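
The proportional assignment of uncertainty zones to bins can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes bivariate data, axis-aligned rectangular uncertainty zones, and a regular rectangular grid of bins, and the names `overlap_1d` and `bin_uncertain_data` are hypothetical. Each zone contributes a total weight of one, split across bins in proportion to the overlap area.

```python
import numpy as np


def overlap_1d(lo, hi, edges):
    """Length of the overlap between [lo, hi] and each bin [edges[i], edges[i+1]]."""
    left = np.maximum(edges[:-1], lo)
    right = np.minimum(edges[1:], hi)
    return np.clip(right - left, 0.0, None)


def bin_uncertain_data(zones, edges_x, edges_y):
    """Fractional bin counts for rectangular zones (x_lo, x_hi, y_lo, y_hi).

    Assumption: zones are axis-aligned rectangles, so the overlap area with
    a bin is the product of the per-axis overlap lengths.
    """
    counts = np.zeros((len(edges_x) - 1, len(edges_y) - 1))
    for x_lo, x_hi, y_lo, y_hi in zones:
        area = np.outer(
            overlap_1d(x_lo, x_hi, edges_x),
            overlap_1d(y_lo, y_hi, edges_y),
        )
        total = area.sum()
        if total > 0:  # zones with no overlap area on the grid are skipped
            counts += area / total  # each zone carries a total weight of 1
    return counts


edges = np.array([0.0, 1.0, 2.0])
zones = [(0.5, 1.5, 0.0, 1.0)]  # straddles the boundary between two bins along x
counts = bin_uncertain_data(zones, edges, edges)
```

In this example the single zone is split equally between the two bins it overlaps along the first axis. The resulting fractional counts would then replace the integer bin counts used by a binned-data clustering algorithm.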
