Feature Selection for Classification of Old Slavic Letters

Cveta Martinovska Bande, Mimoza Klekovska, Igor Nedelkovski, Dragan Kaevski

Abstract


This paper describes a methodology for extracting discriminative features for fuzzy classification of Old Slavic characters. The recognition process is based on structural and statistical features, such as the number and position of spots in outer segments, the presence and position of horizontal and vertical lines and holes, compactness, and symmetry. Preprocessing consists of four steps: conversion to black-and-white bitmaps, normalization, contour extraction, and segmentation. Features are extracted from contour profiles, histograms, and character intersections. C4.5 decision trees are used for feature selection. Because the graphemes of the different Old Slavic Cyrillic alphabets are similar, the same feature set is suitable for all of them. Classification accuracy and precision are evaluated on Old Macedonian manuscripts, and decision trees are built for two alphabets, Macedonian and Bosnian. The main advantages of the proposed method are its low consumption of processing resources and the fact that it does not need the large training sets required by Bayesian classifiers or neural networks.
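C4.5 chooses splits by the gain ratio, which is information gain normalized by split information. The features that score highest are the ones a tree is most likely to use. The sketch below only illustrates that criterion. It ranks discrete character features by gain ratio and does not build a full C4.5 tree or reproduce the authors' procedure. The feature names (`holes`, `vertical_line`, `symmetric`), the class labels, and the toy samples are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of one discrete feature with respect to the labels."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: class entropy before the split minus weighted entropy after it.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalizes features that fragment the data into many values.
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def rank_features(samples, labels):
    """Return (feature, gain ratio) pairs sorted from most to least informative."""
    names = samples[0].keys()
    scores = {f: gain_ratio([s[f] for s in samples], labels) for f in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: structural features of six glyphs from three letter classes.
samples = [
    {"holes": 1, "vertical_line": False, "symmetric": True},
    {"holes": 1, "vertical_line": False, "symmetric": True},
    {"holes": 0, "vertical_line": True, "symmetric": True},
    {"holes": 0, "vertical_line": True, "symmetric": False},
    {"holes": 2, "vertical_line": True, "symmetric": False},
    {"holes": 2, "vertical_line": False, "symmetric": True},
]
labels = ["c1", "c1", "c2", "c2", "c3", "c3"]

if __name__ == "__main__":
    for name, score in rank_features(samples, labels):
        print(f"{name}: {score:.3f}")
    # holes: 1.000, vertical_line: 0.667, symmetric: 0.274
```

On this toy data, `holes` separates the three classes perfectly, so it ranks first. `symmetric` carries little class information. Full C4.5 applies the gain-ratio test only to attributes whose information gain is at least average, and it grows and prunes a whole tree. The features that end up in the pruned tree would then form the selected subset.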


Keywords


classifiers, decision tree, fuzzy logic, character recognition, precision and recall, historical manuscripts
