Name: Segmentation-Verification for Handwritten Digit Recognition
Price: 47.95 EUR
Availability: InStock
Author: Abdeljalil Gattal
ISBN: 978-3-668-51810-0

Automatic reading of digit fields from an image document has been proposed in several applications such as bank checks, postal code and forms. In this context, two main problems occur when attempting to design a handwritten digit string recognition system. The first problem is the link between adjacent digits, which can be naturally spaced, overlapped or/and connected. The second problem is the unknown length of the digit string, which is not carefully written by people in real-life situations.

In this thesis, SVM-based segmentation-verification system for segmenting two connected handwritten digits using the oriented sliding window is proposed. It employs a segmentation-verification system using conjointly the oriented sliding window and Support Vector Machine (SVM) classifiers. Experimental results showed that the proposed system is more appropriate for segmenting simple and multiple connections. Its main advantage lays in the use few rules for finding the optimal segmentation path. Hence, the proposed approach constitutes a tradeoff between the correct segmentation and the number of the segmentation cuts.

Thereafter, we propose a new design of a handwritten digit string recognition system based on the explicit approach for the unknown-length digit strings. Three methods are combined according the link of adjacent digits, which are the histogram of the vertical projection dedicated for spaced digits, the contour analysis dedicated for overlapped digits and the Radon transform performed on the sliding window dedicated for connected digits. A recognition and verification module based on Support Vector Machine (SVM) classifiers allows analyzing and deciding the rejection or acceptance each segmented digit image. Experimental results conducted on the benchmark dataset show that the proposed system is effective for segmenting handwritten digit strings without prior knowledge of their length comparatively to the state-of-art.

Extrait

À CEUX QUE J'AIME ...ET CEUX QUI M'AIMENT

TO THOSE I LOVE AND THOSE WHO LOVE ME

BSTRACT

Automatic reading of digit fields from an image document has been proposed in several

applications such as bank checks, postal code and forms. In this context, two main problems

are occurred when attempting to design a handwritten digit string recognition system. The first

problem is the link between adjacent digits, which can be naturally spaced, overlapped or/and

connected. The second problem is the unknown-length of the digit string, which is not carefully

written by people in real-life situations.

In this thesis, SVM-based segmentation-verification system for segmenting two connected

handwritten digits using the oriented sliding window is proposed. It employs a segmentation-

verification system using conjointly the oriented sliding window and Support Vector Machine

(SVM) classifiers. Experimental results showed that the proposed system is more appropriate

for segmenting simple and multiple connections. Its main advantage lays in the use few rules

for finding the optimal segmentation path. Hence, the proposed approach constitutes a tradeoff

between the correct segmentation and the number of the segmentation cuts.

Thereafter, we propose a new design of a handwritten digit string recognition system based on

the explicit approach for the unknown-length digit strings. Three methods are combined

according the link of adjacent digits, which are the histogram of the vertical projection

dedicated for spaced digits, the contour analysis dedicated for overlapped digits and the Radon

transform performed on the sliding window dedicated for connected digits. A recognition and

verification module based on Support Vector Machine (SVM) classifiers allows analyzing and

deciding the rejection or acceptance each segmented digit image. Experimental results

conducted on the benchmark dataset show that the proposed system is effective for

segmenting handwritten digit strings without prior knowledge of their length comparatively to

the state-of-art.

III

ESUME

La lecture automatique de champs numériques à partir d'une image de document a été

envisagée dans plusieurs applications telles que les chèques bancaires, code postal et les

formulaires. Dans ce contexte, deux principaux problèmes sont survenus lors de la tentative de

concevoir un système de reconnaissance de la chaîne de chiffres manuscrits. Le premier

problème est le type de liaison entre les chiffres adjacents, qui peuvent être naturellement

espacées, se chevauchent et/ou connectés. Le deuxième problème est la longueur inconnue de

la chaîne de chiffres, qui ne sont pas soigneusement écrit par les gens dans des situations

réelles.

Dans cette thèse, Un système de segmentation-vérification basé sur SVM est proposé pour

segmenter deux chiffres manuscrits connectés utilisant la fenêtre glissante orientée. Il emploie

un système de segmentation-vérification utilisant conjointement la fenêtre glissante orientée et

le classifieurs Support Vector Machines (SVM). Les résultats expérimentaux ont montré que le

système proposé est plus approprié pour segmenter des simples et multiples

connexions. L'avantage principal réside dans l'utilisation de quelques règles pour trouver le

chemin de segmentation optimale. Par conséquent, l'approche proposée représente un

compromis entre la segmentation correcte et le nombre de coupes de segmentation.

Ensuite, nous proposons une nouvelle conception d'un système de reconnaissance de la chaîne

de chiffres manuscrits basée sur l'approche explicite pour la longueur inconnue de la chaîne de

chiffres. Trois méthodes sont combinées selon le type de la liaison entre les chiffres adjacents,

qui sont l'histogramme de projection verticale dédiée pour les chiffres espacés, l'analyse du

contour affecté aux chiffres qui se chevauchent et la transformée de Radon effectuée sur la

fenêtre glissante destiné aux chiffres connectés. Un module de reconnaissance et vérification

est basé sur le classifieurs Support Vector Machine (SVM) qui permet d'analyser et de décider le

rejet

ou l'acceptation de chaque image

chiffres

segmentés. Les résultats

expérimentaux effectués sur la base de données du benchmark montrent que le système

proposé est efficace pour segmenter les chaînes de chiffres manuscrits sans connaissance au

préalable de leur longueur relativement à l'état de l'art.

CKNOWLEDGEMENTS

First, I would like to thank Prof. Youcef CHIBANI for her valuable contributions, corrections and

discussions.

I would like to thank the members of the LISIC Laboratory for their encouragement and

support during the preparation of this thesis.

In addition, I would like to thank the members of my examining committee. Their comments

helped to improve the quality of the final version of this thesis.

ONTENT

Abstract

Résumé

III

Acknowledgements

Content

List of figures

List of tables

List of acronyms and symbols

Introduction

Chapter 1. 0verview of handwritten digit recognition

1.1. Introduction

1.2. Character recognition systems

1.2.1. Acquisition mode

1.2.1.1. On-line mode

1.2.1. 2.Off-line mode

1.2.2. Preprocessing

1.2.2.1. Noise reduction

1.2.2.2. Normalization

1.2.2.3. Smoothing

1.2.2.4. Skeletonization

1.2.3. Segmentation

1.2.4. Feature generation

1.2.4.1. Global features

1.2.4.2. Statistical features

1.2.4.3. Geometrical and topological features

1.2.5. Classification techniques

1.2.5.1. Statistical techniques

1.2.5.2. Structural techniques

1.2.5.3. Stochastic techniques

1.3. Overview of Support vector machines « SVM »

1.3.1. Basic principles

1.3.2. SVM Kernel

1.3.3. Multi-class SVM

1. 4. Summary

Chapter 2. Isolated handwritten digit recognition

2.1. Introduction

2.2. Overview of isolated handwritten digit recognition

2.3. Size normalization

2.4. Feature generation

2.4.1. Global features

2.4.1.1. Density

2.4.1.2. Center of gravity

2.4.1.3. Second order geometrical moments

2.4.1.4. Number of transitions

2.4.2. Hu's Moment Invariants

2.4.3. Skew

2.4.4. Zernike moments

2.4.5. Projections

2.4.6. Profile Features

2.4.7. Background features

2.4.8. Foreground features

2.4.8.1. Contour Based Features

2.4.8.2. Skeleton Based Features

2.4.9. Ridgelet Transform

2.4.10. Region sampling: uniform grid

2.5. Recognition

2.6. Experimental results

2.6.1. Experimental results on NIST SD19

2.6.2. Experimental results on CVL Single Digit Database

2.7. Summary

Chapter 3. Segmentation of two connected handwritten digit recognition

3.1. Introduction

3.2. Overview of two connected handwritten digit recognition

3.3. Segmentation-verification system

3.3.1. Segmentation of connected digits

3.3.1.1. Finding the interconnection points

3.3.1.2. Finding the cutting path

3.3.2. Feature generation

3.3.3. Recognition and verification

3.4. System evaluation

3.4.1. Databases

3.4.2. Parameter tuning of the SVM model

3.4.3. Results

3.4.3.1. Influence of the orientation angle

3.4.3.2. Complexity of the proposed segmentation-verification

3.4.3.3. Comparative analysis

3.5. Summary

Chapter 4. Handwritten digit string recognition

4.1. Introduction

4.2. Overview of handwritten digit string recognition

4.3. Segmentation methods of the digit string

4.3.1. Segmentation of spaced digits

4.3.2. Segmentation of overlapped digits

4.3.3. Segmentation of connected digits

4.3.3.1. Adjustment of the width

(

)

4.3.3.2. Finding the angular cut via the Radon transform

4.4. Design of the digit string recognition system

4.4.1. Digit recognition-verification

4.4.1.1. Feature generation

4.4.1.2. Design of the SVM classifiers on isolated digits

4.4.1.3. Digit verification

4.4.2. Spaced digit recognition-verification

4.4.3. Overlapped digit recognition-verification

4.4.4. Connected digit recognition-verification

4.5. Experimental results

4.5.1. Databases and evaluation criteria

4.5.2. Experimental setup

4.5.3. Design of SVM classifiers on isolated digits

4.5.4. Evaluation of the digit segmentation

4.5.4.1. Influence of the range of the projection angle

4.5.4.2. Adjustment of the angular step

4.5.5. Comparative analysis

4.5.6. Computation cost of the proposed system

4.6. Summary

Conclusion

References

VII

IST OF FIGURES

Figure 1.1. Steps of the typical character recognition system.

Figure 1.2. Samples of the basic methods for normalization: (a) Skew normalization (b) Slant

normalization (c) Size normalization.

Figure 1.3. Example of skeletonization

Figure 1.4. Misclassification caused by over-segmentation

Figure 1.5. Misclassification caused by under-segmentation

Figure 1.6. Maximum margin hyperplane and margins for an SVM

Figure 1.7. Hyperplane soft margin for non-linearly separable data

Figure 1.8. The data not linearly separable in 2-D are mapped onto three dimensions where a linear

decision surface between the classes can be made.

Figure 1.9. Regions not classified by OAA approach for a problem of three classes

Figure 1.10. Regions not classified by OAO approach for a problem of three classes

Figure 1.11. Directed acyclic graph SVM method.

Figure 2.1. Different configurations of concavity

Figure 2.2. Concavity labels for digit `9'

Figure 2.3. Contour detection: (A) Contour of the upper region, (B) Feature vector, and (C) 8-Freeman

directions

Figure 2.4. Features extracted from the skeleton

Figure 2.5. Example of splitting a digit using a uniform grid (2x2)

Figure 3.1. Different types of connected digit samples.

Figure 3.2. Possible segmentation paths using IP and BP: Hypothesis 1 (b) Hypothesis 2 (c) Hypothesis 3.

Figure 3.3. Crossing and the segmentation paths according the orientation of the window.

Figure 3.4. Illustrative example for segmenting two connected digits using the oriented sliding window

and recognition-verification strategy.

Figure 3.5. Number (#) of segmentation cuts versus the orientation angle

(°)

Figure 3.6. The overall rate (%) versus the orientation angle

(°)

Figure 3.7. The overall rate (%) versus the number (#) of segmentation cuts

Figure 3.8. Unsuccessful segmentation-verification of connected digits produced by the proposed system

Figure 3.9. Performance of the segmentation-recognition algorithms

Figure 4.1. Correct and incorrect segmentation using the HVP.

Figure 4.2. Sample images of broken handwritten digits.

Figure 4.3. Segmentation by contour analysis. (a) Possible segmentation. (b) Impossible segmentation

Figure 4.4. Examples of incorrect segmentation when using the contour analysis.

Figure 4.5. Processing of broken parts (a) two distinct overlapped single-digits (Rule1) (b) broken single-

digit (Rule2) (c and d) broken digit (Rule 3)

Figure 4.6. Influence of adjusting the width on the sliding window.

Figure 4.7. Radon transform for three examples of connected digits. The maximum value of the Radon

transform corresponds to the orientation angle

for cutting two connected digits.

Figure 4.8. Two selected projections of the Radon Transform showing

= 51°

Figure 4.9. Impact of selecting the orientation angle for segmentation (a) Segmentation with SWRT (b)

Segmentation without SWRT

Figure 4.10. Full segmentation system for handwritten digit string recognition.

Figure 4.11. Some samples of connected digits considered by SCA as isolated digits.

Figure 4.12. Connected digit recognition-verification (a) Original connected digits (b) Scan IPs (c) Fixing

sliding window Radon transform around IP (d) Segmentation paths (e) Final decision.

Figure 4.13. Segmentation example of our proposed digit string recognition system.

VIII

Figure 4.14. Impact of selecting the range of the projection angle for detecting the cutting path from

[1,179] to [80, 100].

Figure 4.15. Influence of selecting the range of the projection angle (°) for different digit string lengths:

(a) 2-Digit string length (b) 3-Digit string length (c) 4-Digit string length (d) 5-Digit string

length (e) 6-Digit string length (f) 10-Digit string length.

Figure 4.16. Influence of adjusting the angular step (°) for different digit string lengths: (a) 2-Digit string

length (b) 3-Digit string length (c) 4-Digit string length (d) 5-Digit string length (e) 6-Digit

string length (f) 10-Digit string length.

Figure 4.17. Impact of selecting the angular step for detecting the cutting path: (a) Correct segmentation

when the angular step is fixed to 3°. (b) Incorrect segmentation when the angular step is

fixed to 6°.

IST OF TABLES

Table 2.1. Summary of features

Table 2.2. The recognition rate for various experiments

Table 2.3. Recognition results on individual features

Table 2.4. Recognition results on feature combinations

Table 2.5. Recognition rates on individual digits

Table 2.6. Comparison of proposed method with state-of-the-art methods

Table 3.1. Comparison of different segmentation methods

Table 3.2. Distribution of the NIST SD19 database for handwritten digits

Table 3.3. Types of connected numeral strings

Table 3.4. Distribution of the database regarding the type of connection

Table 3.5. The recognition rate for various orientation angles

Table 3.6. Segmentation effect according the orientation angle

Table 3.7. Some examples of the successful segmentation-verification produced by the proposed system

according the connection type

Table 3.8. Some examples of the unsuccessful segmentation-verification produced by the proposed system

according the connection type

Table 3.9. Comparative analysis

Table 3.10. Rank of our method comparatively to the existing methods according the connection type

Table 4.1. Number of digit string samples (#Strings) distributed according to the Numbers of Spaced Digits

(#SD) and Connected and/or Overlapped Digits (#C-OD) expressed also in %.

Table 4.2. Recognition rate (%) obtained when training and retraining SVM classifiers.

Table 4.3. Average rates obtained for all digit string lengths for different ranges of the projection angle (°)

Table 4.4. Recognition rates according to the angular step for each string length.

Table 4.5. Comparative analysis of various segmentation systems for the unknown-length string performed

on NIST NSTRING SD19

Table 4.6. Examples of the correct and incorrect segmentation-recognition produced by the proposed

system from the NSTRING SD19 Database.

IST OF ACRONYMS AND SYMBOLS

PDA

Personal Digital Assistant

Tablet PCs

Tablet Personal Computers

SAM

Spectral Angle Mapper

ANN

Artificial Neural Networks

SVM

Support Vector Machines

Nearest Neighbor

HMM

Hidden Markov Models

Vapnik and Chervonenkis

KKT

Karush Kuhn Tucker

RBF

Radial Basis Function kernel

KMOD

Kernel with Moderate Decreasing

OAA

One Against All

OAO

One Against One

DAG

Directed Acyclic Graph

GMM

Gaussian Mixture Models

CVL

Computer Vision Lab

NIST

National Institute of Standards and Technology

MNIST

Modified NIST

NIST SD19

NIST Special Database 19

CENPARMI

Centre for Pattern Recognition and Machine Intelligence

CEDAR

Center of Excellence for Document Analysis and Recognition

NSTRING SD19 NIST STRING Special Database 19

MLP

Multi-Layer Perceptron

FIR MLP

Finite Impulse Response MultiLayer Perceptron

LIBSVM

A Library for Support Vector Machines

SOM

Self-Organizing Maps

Interconnection Point

Base Point

SWRT

Sliding Window Radon Transform

AHDSR

Automatic Handwritten Digit String Recognition

HVP

Histogram of the Vertical Projection

Segmented Components

SCA

Segmented Component Analysis

DRV

Digit Recognition-Verification

Grouped Component

GCA

Grouped Component Analysis

BSVM

binary SVM

Spaced Digits

C-OD

Connected and/or Overlapped Digits

Perpendicular vector to the separating hyperplane of an SVM

Transpose of w

Input space showing an example of the database

Bias setting an SVM

Possible decision space associated with an example x

VC dimension

Number of training set

probability

Regularization constant

Lagrange multipliers

Non-negative slack variables

Lagrangean function

Mapping function (quadratic transform)

Kernel function

Dimension of the feature space

Decision function of SVM

Scale parameter of RBF and KMOD

Parameter of KMOD.

Two-dimensional geometric moment

Zero

order moment

10T

Central moments

Second-order moments

The seven

moment invariant

Zernike polynomials

Radial polynomial

Zernike moments

()

Image function

Amplitudes of Zernike moments

Dirac distribution

Angular variable

Radial variable

Radon transform

XII

Orientation angle

Distance between two digits

Width of Sliding Window Radon Transform

Angular cut

Parameter defined experimentally

( , )

Radon transform

(

) Maximal value selected from 10 responses provided by SVM classifiers

Feature vector

Threshold for Digit verification

Fixed decision threshold

NTRODUCTION

Automatic handwriting recognition is intended to convert images that are understandable by

humans, in an interpretable code by a machine. It has been a subject of intensive research for

about last fifty years. The problem, which is very simple for almost every human, is extremely

complicated for the machine. Several techniques and methods have been proposed in order to

build faster and more reliable systems. However, despite all the efforts in this area, there is still

a significant gap between human and machine performances. Two fields can be released:

Online handwriting recognition and offline handwriting recognition. In Online recognition,

characters are recognized when written. Stroke information is mono-dimensionally captured

dynamically, represented by the pen trajectory. Offline handwriting recognition is different from

online handwriting recognition, because here, stroke information is not accessible. The

information is two-dimensional represented as an image and captured of the scanned device of

the characters. Offline handwriting recognition is less accurate than online systems because the

temporal information is absent. The latter is the subject of this thesis.

Offline handwriting recognition itself can be divided into two approaches, according to the

number of persons whose writing must be recognized. When this number is limited, the

recognition system can support a specific training on the base of the own writing to those

persons. The system is called the mono-script recognition. When the writings is composed from

a very large number of different persons, the system is called the multi-script recognition.

Hence, this work takes place in this theses.

A typical offline handwriting recognition system usually consists of three main processing steps:

preprocessing, feature generation and recognition using a classifier. Preprocessing consists of a

sequence of operations performed to the images in order to prepare them for feature

generation. These common operations include noise reduction, document skew correction, slant

correction, normalization, smoothing and skeletonization.

The feature generation allows representing an image as a vector of features using various

extraction techniques, which can be redundant or not. When features are redundant, a

selection algorithm can be performed to reduce the size of the input feature vector to avoid the

so-called curse of dimensionality problem.

Finally, there are also a large number of classifiers for recognition ncluding the statistical, the

structural, the stochastic classifiers and finishing on combination of classifiers. At each step,

selecting the appropriate parameters could affect the final classification performance.

PProblem Statement

The challenges of unconstrained handwritten digit recognition are related mainly from

the following factors:

Multi-script recognition: is considered as a complex problem for the recognition of

unconstrained handwritten digits caused by the variability of writing;

Lack of constraints: imposed upon the writing (slopes, slants, overlaps and interconnections

of digits);

Variety of writing styles: are multiple such as stick, detached, mixed and cursive script;

Tools of writing: make the line thickness not unified;

Insufficient inking: causes scanning defects largely due to the age of the writer and the

quality of the paper;

The segmentation into isolated digits can produce pseudo-digits or connected digits.

At the end this step, the system must be robust in order to avoid an over-

segmentation and under-segmentation.

In summary, the difficulties of the unconstrained handwritten digit recognition are not only

located in recognizing individual digits, but also to separate out digits each other within the

string through segmentation. It can be conducted by considering three following situations:

spaced, overlapped or connected digits. In most cases, the overlapped and connected digits are

the frequent observed situations.

Currently, the focus of our work is the development of an Automatic Handwritten Digit String

Recognition (AHDSR), which is required in many applications such as the amount of the bank

checks, postal code and forms. In this context, some complex problems are occurred for

recognizing handwritten digit strings: the presence of the noise, broken digits, overlapping

digits, connected digits and unknown length of the string.

The literature shows different methods for AHDSR. Some works are based on contours, the

background and others on the skeleton. Others combine the two approaches, using the

skeleton and contour allowing to deduce the potential cutting points and in order to achieve a

better performance.

In reviewing the various explicit segmentation methods for AHDSR, the literature presents two

different approaches: segmentation-recognition and recognition-based. The relationship

between segmentation and recognition when using the recognition-based, each step depends

on the other. In this case, the priori knowledge is required in the segmentation step. Generally,

algorithms based on the segmentation-recognition approach allows to separate the digit

string into segments using rules without recognition.

In this work, the segmentation of handwritten digit string into digits is the most crucial part in

AHDSR. We shall deal with the foregoing problems by using a recognition-based approach

based on segmentation and verification.

Goals of the w ork

The primary goal of this research is focused on the recognition of isolated digits. The main

challenges in handwritten digit recognition arise from variations in size, shape, slant, and most

importantly, the differences in the writing styles of individuals. In this research, we are

interested in enhancing the feature generation step for isolated digit recognition used for

avoiding digit normalization. The idea is to find a combination of multiple features which

improves the overall recognition rates by minimizing the intra-class variability and maximizing

inter-class variability, the most desirable requirement of any pattern recognition system. The

performance of an AHDSR system depends in particular on the design of a robust Support

Vector Machines (SVM) classifier.

A second goal of this research lies in the recognition of two connected handwritten using a

SVM-based segmentation-verification system for segmenting simple and multiple connected

digits using an oriented sliding window. Two morphological features are combined based on the

contour and the skeleton for detecting and splitting correctly the connected digits. In order to

avoid the over-segmentation and the under-segmentation, a sliding window is used for finding

interconnection points. When the interconnection points are found, the window is oriented to

define the optimal cutting path. The recognition and verification process is performed using the

Support Vector Machines (SVM) through the One-Against-All (OAA) implementation.

The third goal of this research is to propose a solution for recognizing handwritten digit strings.

based on the explicit approach for the unknown-length. Three methods are combined according

the link of adjacent digits, which are the histogram of the vertical projection dedicated for

spaced digits, the contour analysis dedicated for overlapped digits and the Sliding Window

Radon Transform method dedicated for connected digits. Beside, the proposed system uses

Support Vector Machine (SVM) as classifier for recognition and verification.

Experiments carried out on isolated digits, two connected digits and strings of digits are are

performed for evaluating these three contributions using standard datasets.

Outline of the Thesis

This thesis consists of four chapters as described as follows:

Chapter 1 presents a brief overview of the handwritten character recognition system in order to

be able to introduce all definitions related to the system. This attempt is to bring out all steps of

the recognition system. It provides a detailed description of the popular Support Vector

Machine (SVM) classification technique since it is more accurate than other classifiers in many

areas of applications for data classification.

Chapter 2 describes the combination of different statistical and structural features for

recognition of isolated handwritten digits. These features include some global statistics,

moments, profile and projection based features and features generated from the contour and

skeleton of the digits. Some of these features are extracted from the complete image of the

digit while others are extracted from different regions of the image by first applying a uniform

grid sampling to the image. Classification is carried out using one-against-all SVM

implementation. The experiments are conducted on the isolated handwritten digit NIST SD19

and the CVL Single Digit Database.

SVM-based segmentation-verification system for segmenting two connected handwritten digits

using the oriented sliding window is addressed in Chapter 3. This latter describes a

segmentation-verification system using conjointly the oriented sliding window and support

vector machine (SVM) classifiers. The oriented sliding window is used for finding at the same

time the interconnection points and the optimal angle for cutting the adjacent digits. Whilst,

classifiers are used for recognizing and verifying the correct segmentation via a global decision

module for accepting or rejecting the processed image. Experimental results conducted on a

large synthetic database of handwritten digits show the effective use of the oriented sliding

window for segmentation-verification.

Chapter 4 is dedicated to propose a new segmentation and recognition system for unknown-

length handwritten digit strings. Three segmentation methods are combined based on the

histogram vertical projection, contour analysis and the Sliding Window Radon Transform

(SWRT) for recognizing unknown-length handwritten digit strings with the help of the

verification strategy stage. A recognition module based on the SVM classifier allows analyzing

and deciding the rejection or acceptance for each segmented digit image. Experimental results

conducted on the benchmark dataset show that the proposed strategy is effective for

segmenting handwritten digit strings.

Finally, we conclude this thesis by summarizing the main contributions of this work and new

directions are proposed for its improvment.

HAPTER

VERVIEW OF HANDWRITTEN DIGIT RECOGNITION

Handwritten character recognition is an important area in image processing and pattern

recognition field. Its goal is to transform the handwritten characters to automatically recognized

by a machine readable. In this context, this thesis focuses on the handwritten digit recognition

from an image document since many applications can be envisaged as bank checks, postal code

and forms. Hence, this chapter is devoted to overview the main modules composed a

handwritten digit recognition system.

1.1. Introduction

Automatic reading of digit fields from an image document has been proposed in several

applications such as bank checks [1], postal code [2] and forms [3]. The digit fields can

be printed (or typewritten) and written by a writer text.

The first case is considered as a closed problem since all difficulties have overcome. In contrast,

the second case is entirely different because of the high variability of the writing [4].

This chapter is devoted to overview the different modules that allow recognizing a handwritten

digit character. Basically, a handwritten digit recognition system is composed of five distinct

modules: acquisition, preprocessing, segmentation, feature generation and classification

followed by an optional post-processing module.

1.2. Character recognition systems

The design of a handwritten character recognition system can be conducted mainly in five

steps: acquisition, preprocessing, segmentation, feature generation and classification. The first

module is the image acquisition, where a digital image of the text is obtained. This could be

done off-line using a scanner or on-line by a digital pen/stylus. Next, the preprocessing module

generally consists of several methods used to improve the quality of images for further

processing. The segmentation module is used for separating the overlapping and/or joining of

adjacent digits into elementary digits in order to deduce the possible distinct classes [5-9]. After

that, a feature generation is performed on the digit image for reducing the dimension of the

representation and thus makes the design of the classification system. Next, a decision function

Training

Reference

patterns

allows assigning a character image to predefined class. Figure 1.1 illustrates the different

modules for recognizing a handwritten digit.

The following sections describe each module composed a handwritten digit recognition system.

Figure 1.1. Steps of the typical character recognition system.

1.2.1. Acquisition mode

The acquisition of handwritten characters is done in two modes: on-line or off-line. Each mode

has its own acquisition tools and its corresponding recognition algorithms. On-line mode takes

into account the chronological information of the movements of the writer's hand; whereas the

off-line mode treats the information in delayed independently of the time.

1.2.1.1. On-line mode

This mode works in real time (during writing) i.e. the digital ink sample is composed as a set of

coordinates ordered in time. It is possible to track the path to know the pen-tip movements as

well as pen-up/pen-down switching and eventually the slope and speed.

This is notably the case of light pens, digital pens or styluses on touch screens, PDAs or Tablet

PCs. The symbols are recognized as they are written by hand. The obtained signal is converted

into character codes which are recognized by means of character recognition systems. The

writing is represented as a set of points whose coordinates are a function of time, which can be

regarded as a digital representation of handwriting [10].

On-line recognition has a great advantage for its possible correction and modification

interactively of writing having regard continuous system response [11].

Acquisition

Preprocessing

Segmentation

Feature generation

Classification

Recognition

Recognized

character

1.2.1.2. Off-line mode

The off-line handwriting recognition is performed on an image of a scanned document. In this

context, the acquired data is regarded as a static representation of handwriting. This mode is

appropriate for printed documents and manuscripts previously written after the acquisition.

The design of the off-line handwriting recognition is difficult comparatively to the on-line

handwriting recognition system since many desirables characteristics are not available as the

velocity, the pressure and the coordinates of the character. This mode can be considered as the

most general case for recognizing a handwriting character.

In our case, the image of the written text acquired by the off-line mode involves capturing the

text image using physical sensors (scanner, camera ...) with a minimum degradation. During

this step, despite the good quality of acquisition systems, noises might appear in scanned

document images. This is caused by the texture type, the area and its lighting.

1.2.2. Preprocessing

Preprocessing is performed essentially for reducing the superimposed noise on the image data

and trying to keep only the material information of the document. The noise may be due to

acquisition conditions (lighting, wrong placing the document ...) or the quality of the original

document.

One of the problems in the recognition of handwritten characters is the skew/slant detection

and correction in the text document, which introduces challenges for segmentation. Therefore,

the preprocessing of the images normally yields better results. These tasks commonly

include noise reduction, normalization, smoothing and skeletonization.

1.2.2.1. Noise reduction

Prior to the character recognition, it is necessary to eliminate the imperfections. Noise reduction

is the process of removing noise from an image. The imperfection in the optical scanning

devices intensity of light, scratches on the camera scanner lens or the writing instrument causes

disconnected line segments introduces noises in the scanned images.

There are many techniques to reduce the noise. Basically, the filtering function is used to

remove the noises and diminish spurious points in the image. For example, the symmetric

Gaussian filter function is used for smoothing equally in all directions. An alternative approach is

the use of the Morphological operations, which are basically neighbourhood operations. It is

performed on the input image using a structuring element. Two basic morphological operations

are used: dilation and erosion [1-2, 6-7].

1.2.2.2. Normalization

The normalization method is popularly used in character recognition to reduce all types of

variations and to obtain standardized data. However, it also gives rise to excessive shape

distortion and eliminates some useful information. The usual methods for normalizing a

character are the following:

Skew normalization::

Due to variation in the writing style, the skew can hurt the effectiveness

of recognition and therefore should be detected and corrected with respect to the

baseline (see Fig. 1.2.a). Various methods have been used, which are the projection profile

of the image [12], the Hough transform [13] or the shape of nearest neighbor clustering

[14]. After skew detection, the character or word is translated to the origin and rotated until

the baseline is horizontal.

Slant normalization:

The character inclination typically found in cursive script is called slant.

Formally, it is defined as the angle between longest stroke in a word and the vertical

direction referred to the word slant. Slant normalization is used to normalize all characters to

a standard form with no slant (see Fig. 1.2.b). Many methods have been proposed to detect

and to correct the slant of cursive words. One of the used methods is based on the center of

gravity [15], another method uses the projection profiles [16] and some used a variant of

the Hough transform [17].

Size normalization:

It is used to adjust the size, position and shape (dimension) of the

character image. This step is required for reducing the shape variation between images of

the class to facilitate the feature generation and improve their classification [18] (see

Fig. 1.2.c).

Figure 1.2. Samples of the basic methods for normalization: (a) Skew normalization (b) Slant

normalization (c) Size normalization.

(a)

(b)

(c)

1.2.2.3. Smoothing

The smoothing operation is done to regularize the edges in the image, to remove small bits of

noise and to reduce the high frequency noise in the image [19]. Furthermore, different

preprocessing methods are used for smoothing image in order to acquire more accurate output

image. In freeman's direction extraction, smoothing is done by comparing each code with

previous and next code [6, 18].

1.2.2.4. Skeletonization

Skeletonization is a morphological operation used for reducing foreground regions (contour) in

a binary image to a skeletal, which the connectivity of the original region is detected while

destroying most of the original foreground pixels.

Methods for the skeletonization are divided into two main approaches: iterative and non-

iterative [20]. When using the iterative approach, the peeling contour process parallel or

sequentially by erasing or removing the unwanted pixels in each iteration. In contrast, the non-

iterative approach, the skeleton is straightforward extracted without examining each pixel

individually. Unfortunately, these techniques are difficult to implement and slow as well.

Thinning can be somewhat performed for skeletonization using methods like erosion or

opening. In this mode, it is commonly used by reducing all lines to single pixel thickness as

shown Fig. 1.3.

Figure 1.3. Example of skeletonization.

1.2.3. Segmentation

When designing a digit recognition system, the most important step is the segmentation of the

digit string to obtain isolated digits. This step is a non-trivial problem, due to several

complexities. The first one is the inherent nature of the script that is cursive and at the same

time overlapped. The second one is the high degree of variation in writing styles produced by

writers.

The segmentation systems can be divided into two approaches: implicit and explicit.

Implicit approach

considers all traced points as potential segmentation points [3, 9]. In this

case, the segmentation and recognition are performed simultaneously for recognizing a digit

string. Indeed, this approach does not attempt to separate digits, but rather it incorporates

the implicit segmentation into the recognition module.

Explicit approach

is performed by finding the best way to separate adjacent digits before

recognition. In this case, the segmentation and recognition are performed separately. Also,

three cases can be occurred when attempting to separate two adjacent digits: spaced,

overlapped and/or connected digits. In most cases, the overlapped and connected digits are

the frequent observed situations.

Hence, many algorithms were proposed to separate the couple or string of contiguous digits. In

this case, the explicit segmentation algorithms can be categorized in two groups [8-9]:

The recognition-based approaches

usually generate multiple candidate segmentation points

which are verified with recognizer for choosing the optimal segmentation points. Although

this approach gives better efficiency than the segmentation-recognition approaches, the

main weakness lies in the computational time to compare all the segmentation hypotheses.

The segmentation-recognition approaches

are based on the morphology of the connected

digits presented by the contour (concavities, profiles) and the skeleton (Background,

foreground). It is used to construct the segmentation paths where each segmented

component should contain an isolated digit, which is submitted to the recognizer. These

methods are generally faster than recognition-based because a number of separation

attempts must be performed before a segmentation.

Generally, the recognition reaches very high performance when dealing with spaced digits.

However, when a complete segmentation system is used, the recognition is more difficult since

it can generate either an over-segmentation or under-segmentation.

When the over-segmentation problem is occurred, the segmentation cut is performed in intra-

digits and provides better results than other inter-digit segmentation cuts. Figure 1.4 shows

some examples of the over-segmentation problems.

20 020

19 101

14 121

Figure 1.4. Misclassification caused by over-segmentation

When the under-segmentation problem is occurred, the lack segmentation produces wrong

result than the correct segmentation cuts. For example, the connected digits often are

recognized as isolated digit. Therefore, this problem can provide the confusion between isolated

and under-segmented digits. Some examples of the under-segmentation problem are shown in

Figure 1.5.

Figure 1.5. Misclassification caused by under-segmentation

The proposed segmentation system for separating the couple or string of contiguous digits will

be discussed in detail later in the Chapters 3 and 4.

1.2.4. Feature generation

The feature generation can be defined as a problem for extracting the most pertinent

information from the image for a classification problem i.e. which minimizes the intra-class

variability and maximizes the inter-class variability [21-23]. This pertinent information often is

represented by a numeric value vector.

The feature generation methods are many and varied. Each one has its own properties and can

only apply to various contexts and different conditions. Before bringing our choice on a method

than another, it is important to know all the ins and outs of their use. There are numerous

types of features but they are broadly classified into three types: Global features, Statistical

features, and Geometrical and topological features.

1.2.4.1. Global features

The global features are generated from the entire character image using for instance the center

of gravity, moments, the coefficient obtained from mathematic transforms such as Fourier,

wavelet and Ridgelet coefficients... [23].

The global features is faster speed since the required values for calculating and the matching

classification is convenient, but the distinction ability of the character details is weak, and

sensitive to the image deformation. It is generally used for the simple character detection.

1.2.4.2. Statistical features

The statistical features are derived from the statistical distribution of pixels in the character

image of the pixels [21]. They offer high speed and low complexity and take care of writing

variations to some extent. They may also be used for reducing the dimension of the feature set.

One of the most common statistical features is the moments extracted from images (Hu,

Zernike moments...). Therefore, the Zoning used for dividing the character into several frame

containing overlapping or non overlapping zones (angular, concentric and circles grids). Finally,

the horizontal or vertical projection features count the number of foreground pixels.

1.2.4.3. Geometrical and topological features

The geometrical and topological features may represent global and local properties of the

character's structure and have high tolerances to deformation, distortions and writing variations

(translation and rotation) [22]. The topological features can encode some knowledge about the

character structure or may require some knowledge for this kind of components.

The specific features include shape descriptor and geometric structure features. The shape

descriptor is used to describe the character structure while the geometric structure feature is

used for reflecting character shape structure and the stroke segment change such the direction

of the stroke segment due to writing variations.

The geometrical and topological features generated from the character include the complex

stroke as curves and splines, such as stroke directions, end points, intersections of line

segments and loops, the number of holes, size of slopes and curves, length and thickness of

strokes, the areas and perimeters...

All these measures can be integrated into a single feature vector for recognition of a

handwritten character.

1.2.5. Classification techniques

The classification is a technique that allows assigning an unknown pattern into a predefined

class [24]. The classification workflow uses either unsupervised or supervised methods to

categorize character features into many classes. It can perform an unsupervised classification

without providing training data. In this case, the classes are generated according to the

resemblance between samples. In contrast, the supervised classification requires training data

and specify a classification method such as the maximum likelihood, minimum distance,

Mahalanobis distance, or Spectral Angle Mapper (SAM).

For handwritten digit recognition, the nearest neighbour classifier is one of the simplest and

widely used for classifying the digit image [32]. Artificial Neural Networks (ANN) are another

well studied classification method, and widely used for handwritten character recognition [33].

Hidden Markov Model (HMM) is also a popular approach to this problem which uses both the

statistical and structural information contained into handwritten digit shapes [85]. Support

Vector Machines (SVM) originally developed as a two class classifiers have been extended to

multi-class problems and applied to handwritten character recognition [28,88].

In the case of the handwritten digit recognition, supervised classification has been adopted

since digit classes are well specified. It can be performed in two stages:

Training stage:

The goal of the training stage is to train the classifier with the known digit

dataset for further recognition with unknown digit dataset.

Recognition and decision stage:

This stage classifies the input pattern by comparing them to

a list of reference patterns. This stage also uses classification techniques in the form of

decision. The decision stage is strongly influenced by the feature generation step, and a

successful of the classifier.

1.2.5.1. Statistical techniques

Statistical techniques are based on a statistical decision theory that uses the statistical decision

functions and a set of optimality criteria. It is performed on the shapes to be recognized.

However, it requires a significant number of samples in order to achieve correct training of the

probability distributions for different classes. The study allows reflecting their distribution in a

metric space and the statistical characteristics of the classes allow to take a decision as higher

probability of the class membership (more confidence) [24].

The main statistical techniques which are applied in the character recognition field are the

followings [18, 24]:

Parametric Recognition:

This method requires a priori information about the characters for

obtaining a parametric model for each character. Therefore, once the parameters of the

model which may be based on some probabilities are obtained, the characters are classified

according to some decision rules such as maximum Likelihood or Bayes's method [33,28,85].

Non-parametric Recognition:

This method does not require a priori information about the

data. It is used to separate different pattern classes along hyper planes defined in a given

hyper space. The finest known method of non-parametric categorization is the Nearest

Neighbor (NN) and is extensively used in character recognition. An incoming pattern is

classified using the cluster, whose the center is the minimum distance from the pattern over

all the clusters [18,32].

1.2.5.2. Syntactic and structural techniques

Syntactic and Structural techniques are based on the structural primitives taking into account

the physical structure of characters. In general, it is assumed that the character primitives

extracted from writing are quantifiable and one can find the relations among them. For

example, syntactic pattern recognition can be used to find out what objects are present in an

image. Furthermore, structural methods are strong in finding a correspondence mapping

between two images of an object [18, 32-33].

The main difference between these structural techniques and statistical techniques is that these

features are topological primitives and not measures. Several techniques are available such as

graph-matching algorithm, grammatical Methods and string matching.

1.2.5.3. Stochastic techniques

The stochastic techniques use a model for recognition, taking into account the high variability of

the shape. In these techniques, the models are often discrete and many studies are based on

the theory of Markov fields and Bayesian estimation. Markov fields permit the modeling of the

global properties while using local constraints. The model describes these states using state

transition probabilities and observation probabilities. The most common methods in these

techniques are the methods using Hidden Markov Models (HMM) [18, 24, 85].

11.3. Overview of Support vector machines (SVM)

The support vector machine (SVM) is a popular classification technique including linear and

non-linear, which not only has a solid theoretical foundation, but also it is more accurate than

other classifiers in many areas of applications for data classification. SVMs are originally

designed to use the principle of the structural risk [25].

In addition, they come to meet two major disadvantageous in machine learning which are:

Parametrically controlling the capacity of the SVM measured by the VapnikChervonenkis

(VC) dimension;

Avoid over-learning (overfitting).

Two other advantages are particularly offered by SVMs. First, with an appropriate kernel, the

SVM can work well even if data are not linearly separable in feature space. Second, especially

popular in texts and images classification problems where very high-dimensional spaces are the

norm. These two advantages allow us to select SVMs as better candidates for performing

handwritten digit recognition.

SVMs are a machine learning algorithm for performing classification and regression via a

hyperplane in a high feature space. SVM has its ability to select the representative dataset from

the training dataset, which is commonly called "Support vectors" and attempts to find the

optimal line (hyperplane) that maximizes the distance (margin) between the closest points from

two classes. Once learned, a decision function is constructed in order to classify data according

to the region separated by the hyperplane.

The error is minimized by maximizing the margin controlled by the VC dimension of the

classifier. For a better clarity, the theoretical aspects of SVMs are explained in the following

sections [25-31].

1.3.1. Basic principles

Support Vector Machine (SVM), introduced by Vapnik in 1995 [25], is a binary

classification technique for supervised learning methods which can be used for both

classification and regression. In simple terms, given a set of training samples, each labeled as

members of one of two classes, an SVM classification training algorithm tries to construct a

decision model be able to predict whether a new sample falls within one class or the other. If

the samples are represented as points in space, a linear SVM model can be interpreted as a

division of the space to the examples belonging to separate classes, which are divided by a

clear gap that is as wide as possible. New samples are then predicted to belong to a class

based on which side of the gap they fall on. It is based on the use of a function called decision

function that allows separating optimally the data.

The principle of SVM during optimization is to maximize the margin between classes in order to

increase the capability of separation. The SVM classifier will classify all the points on one side of

the decision boundary as belonging to one class and all those on the other side as belonging to

the other class.

Figure 1.6. Maximum margin hyperplane and margins for an SVM

belongs to the hyperplane with the equation:

= 0

. Where is the normal of

the hyperplane while

14T

is the perpendicular distance between the hyperplane and the

origin (Figure 1.6). Figure 1.6 shows an example where input data are grouped into two classes

which should be identified by the SVM classifier (Support vectors are circled).

In the case when the training data are linearly separable, the classifier can select two

hyperplanes in a way that they separate the data and there are no points between them, and

then try to maximize their distance. For all data (support vectors)

of the class

{ 1, +1}

14T

based learning, these hyperplanes (

)

14T

can be described through the

following equations:

> 0

= 1

< 0

(1.1)

It is possible to rescale

14T

and

14T

such that:

= 1

(1.2)

These can be combined into one set of inequalities:

(

)

(1.3)

Support

Vector

= 0

Optimal hyperplane

Maximum margin

< 0

> 0

The points for which the equality in equation (1.2) holds lie on the supporting hyperplane h1:

= 1

14T

and the supporting hyperplane h2:

14T

.Note that h1 and h2

are parallel and that no training points fall between them.

For a hyperplane equation

( ) =

, the distance of a point

to the

hyperplane is

( )/

14T

SVM constructs a decision surface hyperplane that maximizes the separation margin by

following a principle approach rooted in from the statistical learning theory. More precisely, the

VC dimension is a measure of the capacity and complexity of a statistical classification theory,

because it can predict a probabilistic upper bound on the test error of a classification model.

Where

14T

denotes the number of training set (restriction: this formula is valid when

Now, choose some

14T

such that

14T

. Then, for losses taking these values, with

probability

14T

, the following bound holds (Vapnik [30]):

test error

training error +

(log( ) + 1)

log( )

(1.4)

Where

14T

is a non-negative integer called VC dimension and is a measure of the capacity of the

learning machine. More precisely, the support vector machine is an approximate

implementation of the method of structural risk minimization. This induction principle is based

on the fact that the error rate of a learning machine on test data is bounded by the sum of the

training-error rate and a term that depends on the VC dimension. In the case of separable

patterns, a SVM produces a zero value for the first term (right-hand side of equation (1.4)) and

minimizes the second term, thus minimizing the overall risk. In the case of non-separable

patterns, the cost function that must be minimized will now perform an extra term weighted by

the parameter C which penalizes any SVM generates too non-separable patterns.

For the SVM optimization problem, the maximum possible margin between the

hyperplane and the support vectors are obtained by minimizing

, under the constraints:

(

) > 1

for any

1, ... ,

14T

. According to the optimization theory, the

goal (minimize

) and constraints (

(

) > 1

) are strictly convex. This

problem can be resolved by introducing the Lagrange multipliers

14T

therefore the constrained

problem can be expressed as:

( ,

, ) =

[ . ( .

)

(1.5)

That must be canceling the partial derivatives with respect to

14T

and

14T

. In this expression

called dual form, the constraints of good ranking become a trade-off between a large margin

and a small error penalty.

The Karush-Kuhn-Tucker (KKT) conditions play a central role in both the theory and practice of

constrained minimization problem, it proves that this is equivalent to the equations solutions of

annulling the derivatives of Lagrangian with respect to the variables ,

The annulment

of partial derivatives of these partial derivatives gives :

{

( .

)}

= 0

(1.6)

can be computed thanks to the terms:

(1. 7)

The Lagrange multipliers are deduced by setting the partial derivatives to zero, which

correspond to support vectors. All other points have null

. Finally, the equation of the

separating hyperplane is defined:

( ) = (

) +

. ( .

) +

(1.8)

Where

are the solutions of non-null

and

is found by placing the coordinates of a

support vector

of the class

> 0

= 1

or in

< 0

The Soft Margin method allows selecting a hyperplane that splits the examples as cleanly as

possible, while still maximizing the distance to the nearest cleanly split examples. In the case of

non-linearly separable patterns, the objective function is then increased by a function which

penalizes non-null

, and the minimization optimization problem. The good ranking constraint

initially defined by:

(

) > 1

(1.9)

The Soft Margin method introduces non-negative slack variables , which measure the degree

of misclassification of the data . If the penalty function is linear, the optimization problem

released through a parameter

to become a:

(

) > 1

(1.10)

In this case, the maximum possible margin between the hyperplane and the support vectors,

initially obtained by minimizing

. If the penalty function is linear, the minimization

optimization problem becomes:

(1.11)

where is a strictly positive soft margin parameter to be determined called also regularization

constant. This parameter allows striking a balance between the two competing criteria of

margin maximization and error minimization, whereas the slack variables

indicates the

distance of the incorrectly classified points from the optimal hyper plane. Figure 1.7 shows the

incomplete separable data and slack variables (wrong side of the classifier).

Figure 1.7. Hyperplane soft margin for non-linearly separable data (

is misclassified while

is properly classified although its

is non-null)

1.3.2. SVM Kernel

The problem of a nonlinear classifier is addressed by nonlinear transformation of the inputs into

high-dimensional feature space, in which the probability of linear separation is high. This

transformation of input data into high-dimensional feature space is achieved using kernel

functions.

The method of linear separation, previously presented is somewhat limited. The introduction of

kernel functions will allow overcoming this limitation. The main idea of the nonlinear extension

is to map the data from the original input space to a feature space and find the hyperplane with

a large margin in the feature space. For an enlarged feature space, consider transformations of

a mapping as follow:

(1.12)

Figure 1.8. The data not linearly separable in 2-D are mapped onto three dimensions where a

linear decision surface between the classes can be made.

which maps the training data from

to some higher Euclidean space

, which possibly has

infinite dimensions (m is the dimension of the feature space). In this high dimension space, the

data can be linearly separable, hence the linear SVM formulation above can be applied to these

data. In the SVM formulation, the training data only appear in the form of dot products

These can be replace by dot products in the Euclidean space

( ).

(1.13)

In the equation of the separating hyperplane (1.8), the dot product

( . )

may be replaced by

any kernel function

( . )

providing a dot product. The equation of the separating

hyperplane becomes:

( ) = (

) +

. ( .

) +

(1.14)

where

are the solutions of:

{

( .

)}

= 0

(1.15)

The kernel functions commonly used are:

Linear kernel:

( , ) = ( . )

Gaussian Radial Basis Function kernel (RBF):

( , ) = exp (

)

for

Sometimes parametrized using

Polynomial kernel:

( , ) = ( .

+ )

Sigmoid kernel or Hyperbolic tangent:

( , ) =

( .

+ )

The effectiveness of SVM depends on the selection type of the kernel, the kernel's parameters

and the soft margin parameter . A common choice is a Gaussian kernel, which has a single

parameter . The best combination of and

is checked using cross validation [26].

1.3.3. Multi-class SVM

SVM is designed only for separating two classes. Its extension for multi-classes has conducted

to develop various implementations SVM [27-30]. The multiclass problem can be processed as a

combination of binary classifiers. When using SVM for handwritten digit recognition, we have a

problem of ten (10) classes. Hence, some approaches are used for extending the binary SVM to

the mutli-class, which are One-Against-All (OAA), One-Against-One (OAO) and Direct Acyclic

Graph SVM. We review in the following the main properties of the three implementations.

One against all

: The most intuitive approach for making the multiclass SVM involves building

many SVMs classifier than two classes [28, 30]. One-against-all approach constructs

binary SVM classifiers and requires

comparisons for the decision, each one has the ability

to separate of which separates one class from all the rest. The classification of new samples

is done by a winner-takes-all strategy, in which the classifier with the highest output

function assigns the class.

arg

( )

(1.14)

In this way, the number of unclassified data is reduced. However, when the maximum

output value is obtained for two SVMs, it cannot assign the concerned data to a class.

Hence, this method leaves regions of the undecided feature space where more than one

class are accepted or all classes are rejected. This is demonstrated in figure 1.9.

Figure 1.9. Regions not classified by OAA approach for a problem of three classes.

( ) = 0

One-against-one:

Another combination method is "One-Against-One", also known as

"pairwise coupling", which builds one SVM for each pair of classes, or

(

1)/2

for a

problem with

classes. The classification is done by a max-wins voting strategy, in which

every classifier assigns the sample to one of the two classes. Then, the vote for the assigned

class is increased by one vote, and finally the class with the most votes determines the

sample classification.

( ) = 0

is the decision function that separates the class of the

class

by:

( ) =

( )

. During the classification step, max-wins voting strategy is

calculated as:

arg

,...,

(

( ))

(1.15)

However, when two classes have the same score,

will not be classified. This concept is

showed in figure 1.10.

Figure 1.10. Regions not classified by OAO approach for a problem of three classes.

Directed Acyclic Graph SVM (DAGSVM):

To resolve unclassifiable regions for the pairwise

classification, Platt et al. [31] proposed One-against-one in a DAG graph (decision-tree-

based pairwise classification), which uses directed acyclic graph (DAG) to reduce the number

of SVM that need to use during the testing phase. When having

classes, One-against-one

method requires

(

1)/2

classifiers, while the DAG method requires only to

classifiers. Fig. 1.11 shows the decision tree for the three classes. In the figure, shows that

does not belong to class . In this case, any pair of classes can be the top-level

classification of the tree. Except for the leaf node when

( ) > 0

, does not belong to

class . For example, if

( ) > 0

, does not belong to Class 2. Thus, it can belong to the

B C

Class 1 or 3 and the next classification pair is Classes 1 and 3 supplied by the decision

function.

Figure 1.11. Directed acyclic graph SVM method.

11.4. Summary

The objective of this chapter is to overview the main modules composed a character recognition

system. Depending on the application, each module has its own importance. In our case, we

focus on the development of a digit recognition system having three main modules: feature

generation, segmentation of the digit string to isolated digits and the classification based on the

SVM.

In the next chapter, we will introduce our system with a brief overview for recognition of

isolated handwritten digits.

( )

Not 2 Not1

Not 3 Not1 Not 2 Not1

Class 1

Class 3

Class 2

HAPTER

SOLATED HANDWRITTEN DIGIT RECOGNITION

This chapter investigates the combination of different statistical and structural features for

recognition of isolated handwritten digits, a classical pattern recognition problem. The objective

of this study is to improve the recognition rates by combining different representations of non-

normalized handwritten digits. These features include some global statistics, moments, profile

and projection based features and features computed from the contour and skeleton of the

digits. Some of these features are extracted from the complete image of digit while others are

extracted from different regions of the image by first applying a uniform grid sampling to the

image. Classification is carried out using one-against-all SVM. The experiments conducted on

the isolated handwritten digit NIST SD19 and the CVL Single Digit Database realized high

recognition rates, which are comparable to state-of-the-art methods on this subject.

2.1. Introduction

Handwriting recognition has been the premier research problem of the document analysis and

recognition community for over three decades now. The sub problems in handwriting

recognition mainly include line, word or character level segmentation, recognition of isolated

characters, words, or complete lines/paragraphs and recognition of numerical strings and

isolated digits. Among these different modalities of handwriting recognition, this chapter

focuses on recognition of isolated digits, a classical pattern recognition problem that offers a

wide range of applications. Unlike alphabet, the ten glyphs of the most commonly used Western

Arabic numerals are shared by many scripts and languages around the world making them

globally acceptable. The main challenges in handwritten digit recognition arise from variations

in size, shape, slant, and most importantly, the differences in the writing styles of individuals.

With the recent advancements in image analysis and pattern classification, sophisticated digit

recognition systems have been proposed which aim to enhance the overall recognition

performance by improving the feature generation or/and classification techniques used. Some

of the studies aim to improve the classification performance by using a combination of multiple

classifiers while others aim to combine multiple features and select the most pertinent and

optimum set of features for this problem.

In this chapter, we are interested in enhancing the feature generation step for isolated digit

recognition used for avoiding digit normalization. The idea is to find a combination of multiple

features which improves the overall recognition rates by minimizing the intra-class variability

and maximizing inter-class variability [21-23], the most desirable requirement of any pattern

recognition system.

22.2. Overview of isolated handw ritten digit recognition

Over the years, various handwritten isolated digit recognition systems reporting high

recognition rates have been proposed. Most of these systems have been evaluated on the

widely used MNIST database [32].

Among significant contributions to digit recognition, authors in leCun et al. 1995 [32] present a

comprehensive comparison of different classification algorithms on the recognition task. Heutte

et al. [33] proposed a combination of seven different features to feed a linear discrimination

based classifier.

The method of Cai and Liu (2001) [34] present an approach that integrates both statistical and

structural information for the recognition of unconstrained handwritten numerals.

Dong et al. [35] extracted a set of gradient features while Teow and Loe [36] computed linearly

separable features from the MNIST database and applied triowise linear support vector machine

with soft voting for classification. Belongie et al. [37] developed a novel similarity measure by

finding the correspondences between points in two shapes and estimating an aligning

transform. The proposed matching technique achieved high recognition rates when applied to

digit recognition.

In another notable contribution, Lauer et al. [38] proposed a trainable feature extractor based

on LeNet5 neural network architecture. Classification carried out using Support Vector Machine

realized promising recognition rates. A comprehensive survey on handwritten digit recognition

on CENPARMI, CEDAR, MNIST databases can be found in [39].

Recently, the handwritten digit recognition competition [40] held in conjunction with ICDAR

2013 also provided a platform for comparison of state-of-the-art digit recognition techniques

under the same experimental conditions. In this competition, we have participated with two (2)

methods for handwritten digit recognition (Tébessa I and Tébessa II). In the following, we can

be cited these techniques as follow:

Salzburg I method:

The approach is based on the Finite Impulse Response Multilayer

Perceptron (FIR MLP). First, the color images are transformed to 8 bit gray-scale images.

Then these gray-scale images are resized to 20×20 pixel images and their center of mass is

computed. Each scaled image is positioned by their center of mass in the center of a 28 × 28

pixel image. Each pixel value has been normalized into the range [-1, 1]. For the experiments,

a neural network framework has been adapted to the FIR MLP. This method uses one partially

connected FIR MLP with four layers.

Salzburg II method:

The description of this method is similar to the previous (Salzburg I), but

uses an ensemble of four FIR MLP partially and fully connected with four layers.

Orand method:

The approach is based on the combination of four descriptors which allow

exploiting three different characteristics of image digits. For the pre-processing, a thresholding

operation using Otsu's method is applied. For the feature generation, three different

characteristics of the digits are exploited: the stroke orientations based on the HOG approach,

the relation between background and foreground based on concavities, and the contour. In

particular, the image is divided into 2 × 2 regions. For each region a histogram of orientations

using 32 bins is computed. Then, the descriptor is produced by concatenating the region

histograms. The image is resized to 40 × 40 pixels. Then a thinning operation is applied. The

profile with respect to the left and right side is computed yielding an 80-size descriptor.

Finally, the digit descriptor is the concatenation of the four described descriptors yielding a

240-size descriptor. For classification, a multi-class SVM classifier using a RBF kernel is used,

the cost parameter is set to 6, and the gamma parameter is set to 1.4.

Jadavpur method

: The system is based on a Fuzzy-Entropy-based feature selection strategy

over a combination of Quad-tree-based longest-run and convex-hull-based feature sets with

SVM classifiers. The system is finally developed using the selected 190 features.

Paris Sud method:

The images are pre-processed as following: First, getting rid of

color, second, using down-sampling to 20×20 resolution and finally, placing the images on a

28×28 grid by centering their center of gravity. Then Hamming tree algorithm and the

AdaBoost.MH implementation are used as classifier. The chosen classifier has 47,642 trees of

4 nodes (5 leaves) each. In each boosting iteration 100 random Haar filters are tested, chosen

uniformly from the possible geometries.

François Rabelais method

: First, the image is cropped to the bounding box of the digit,

deleting the white border, and is surrounded with a 1-pixel margin. It is then magnified to

have a final size of 128×128. Finally a skew and slant normalization is applied. The black

pixels are considered as input data points, and reduce them using k-means algorithm. The

Delaunay Triangulation is built on the input data points. A multilevel static uniform zoning is

computed at K distinct orders. For each cell of a grid, two values are appended: the number

of input elements in the cell ,and the average of input elements in the neighborhood of the

cell. Both the centers of gravity of the triangles and the black pixels within the cell are input

elements. A SVM classifier (libSVM) with a RBF kernel is trained and then used for the

prediction.

Hannover method

: The numerals are normalized, binarized and slope corrected. The feature

vector is composed of three methods: first, number of black pixels in each row and column;

second, lengths of 12 probes in different directions from different positions; and third,

normalized central moments. The digits are classified with a nearest-k-neighbor classifier.

Tébessa I method

: This method is conducted with combination two structural features without

uniform grid method, background features and foreground features of the skeleton. For the

three remaining methods: global features, ridgelet transform and foreground features, the

image was divided into four regions by using uniform grid method. The recognition module is

based on the SVM multi-class approach using the one-against-all implementation. SVM and

Tébessa II method

: This method is based on multi-scale run length features which are

determined on the binary image taking into consideration both the black pixels corresponding

to the ink trace and the white pixels corresponding to the background. The probability

distribution of black and white run-lengths has been used. The proposed method includes

global features, the ridgelet transform, background features, foreground features and multi-

scale run length features, completing the system by a multi-class SVM classifier based on

approach one-against-all.

The objective of our study is to find a combination of features which achieves high recognition

rates on non-normalized isolated handwritten digits. We have considered global and local,

structural and statistical features [22-23,41] in our work. The features that we consider in our

study include Hu's moment invariants, skew angle, Zernike moments, profile and project based

features, background and foreground features and Ridgelet transform. The proposed approach

aims to combine these different features to best represent the digits.

22.3. Size normalization

To evaluate our technique on the same database but normalized. We used the size

normalization module which allowed adjusting the size, position and shape of the digit image in

order to reduce the shape variation between images. In this way, the feature generation task is

facilitated and to expect improving their classification [42].

In our case, the method based on bilinear interpolation was selected which assign each target

point a linear combination of four nearest source points and then performing the inverse

transformation. For example, suppose that the gray levels

u(i1, i2)

of an image are known

points with integer coordinates (0, 0), (1, 0), (0, 1), (1, 1), assigning a gray scale value at the

point (x, y) is effected by a bilinear interpolation [43], is defined by:

( , ) = (1

). (1

). (0,0) + (1

). . (0,1)

+ . (1

). (1,0) + . . (1,1) (2.1)

Other types of interpolation can be defined by using more points with integer coordinates. By

these methods, it is possible to increase the resolution of images fictively. In our work, the

image size is reduced to a normalized size 24 x 24.

22.4. Feature generation

Feature generation aims to express input data using a numerical representation or a set of

symbols (coding) to select the best set of features (feature vector) for a particular problem,

digit recognition in our case [22-23, 33, 39, 43]. Features are generally categorized into global

or local and statistical or structural features. Statistical features represent pattern classes by

statistical measures while structural features use formal structures for data representation.

Commonly used statistical features include moments, descriptors and geometrical

measurements etc. Examples of structural features include bends, end points, intersections,

loops and measures of concavity etc. [44]. Structural properties can sometimes also be

expressed using statistical measures. Each of these types of features can be extracted globally

from the objects (digits, characters, words or paragraphs) under study or locally from small

regions of these objects. However, each feature is more suited to one type or the other giving

rise to global and local features.

In the following sections, we provide an overview of the features used in our study.

2.4.1. Global features

Global features are computed from the image of the digit as a whole. The global features we

compute include the following.

2.4.1.1. Density

Computing the density would be to count the number of black pixels and dividing by the total

number of pixels.

2.4.1.2. Center of gravity

The centre of gravity, or sometimes called as centre of mass, is a point in an image in which its

mass is concentrated. They characterize images in a way that has analogies to statistics.

Generally, an image may be considered as a Cartesian density distribution function I(x, y),

where I(x, y) is the intensity of the pixel. Then, the two-dimensional geometric moment of

order (p+q) of a function I(x, y) is defined as:

( , ) dxdy (2.2)

where

is the basis or the weighing kernel with , = 0,1,2, . . , . The basis may have

properties that are passed on to the moments, producing descriptors that are invariant to scale,

translation or rotation.

In order to express the above equation in a discrete form, the image has been sampled into

square unit pixels.

( , ) (2.3)

The zeroth order moment defines the total mass of the image (segmented image, its area, total

number of pixels).

( , ) (2.4)

The tow first orders geometrical moments define the centre of gravity of the image,

respectively the centre of coordinates of an image:

(2.5)

These two coordinates represent the center of gravity of the digit image. Then, the central

moments in discrete form can be expressed as :

(

) (

) ( , ) (2.6)

2.4.1.3. Second order geometrical moments

The second order geometrical moments are a statistical measure of the allocation of pixels

around the center of gravity. Second-order moments

and

describe the distribution of

mass of the image with respect to the coordinate axes. In mechanics, they are called the

moments of inertia. It may be used to determine the principal axes, image ellipse and radii of

gyration.

2.4.1.4. Number of transitions:

The number of white to black transitions (or vice versa) counted at each pixel in the four

principal directions.

2.4.2. Hu's Moment Invariants

Hu proposed the application of moment invariants to image analysis and object representation

problems in [45, 46]. Since then, they have been effectively applied to a large number of shape

matching and similar problems. Hu's seven moments are invariant with respect to position,

scale and orientation. These moments capture information on image area, centroid and its

orientation.

Hu's moments invariant are calculated using combinations of second and third order normalized

central moments. The normalized central moments are given by

(2.7)

where

+ 1 ( + )

(2.8)

Hu's seven invariant moments are then calculated by

, (2.9)

= (

) + 4

(2.10)

= (

) + (3

) ,

(2.11)

= (

) + (

) ,

(2.12)

= (

)(

)[(

)

) ] + (3

)(

)[3(

)

(

) ], (2.13)

= (

)[(

)

(

) ] + 4

(

)(

), (2.14)

= (3

)(

)[(

)

) ]

(

)(

)[3(

)

(

) ]. (2.15)

The seventh moment invariant, , is also skew invariant. These seven invariant moments were

calculated for each binary digit image.

2.4.3. Skew

The skew or orientation of the digit is calculated using Radon transform of the image [47, 48].

Radon transform of the image is the sum of radon transform of each pixel in the image. The

radon function takes parallel beam projections of the image from different angles and the skew

angle is determined based on the maximum value of radon function which is used as a feature.

2.4.4. Zernike moments

Zernike moments [49] have been widely employed in a wide variety of pattern recognition

problems and we use them in our study for characterizing the digits. For efficient computation

of Zernike moments, we implemented the method in [49] which is based on recurrence

relations for fast computation of radial polynomials of Zernike moments.

Zernike complex moments are constructed using a set of Zernike polynomials. Zernike

polynomials are a sequence of orthogonal polynomials on the unit disk and they can be

expressed as:

( , ) =

( )

(2.16)

where i =

1 ,

is the order of the radial polynomial and

is positive or negative integer

according to the conditions

| | even, |m|

n, representing the repetition of the azimuthal

angle. From

and

we obtain

and

by simple conversion to polar coordinates ( =

). The radial polynomial R

( )

is defined as:

( ) =

( 1)

(

| |)/

(2.17)

Zernike moments for a digit image can be obtained by making use of the complex conjugate

property, which is Z

= Z

. We will skip directly to the formulation of the moments in

adequate form:

( , )

( , ),

1 (2.18)

where

I(x, y)

is the image function and denotes the complex conjugate with order

and

repetition

. The procedure for computing Zernike moments can be seen as an inner product

between the image function and the Zernike basis function.

If an image function ( , ) having Zernike moments

is rotated counterclockwise by

angle ; the transformed image function is ( , ) = ( ,

)

, by simply substituting that into

the equation. The discrete form of the Zernike moments of an image is expressed as follows:

and

(2.19)

The amplitudes of Zernike moments

are rotation invariant. Invariance to the scale and

translation can be obtained by shifting and scaling the image before the computation of Zernike

moments. In our implementation, we compute up to fourth order Zernike moments.

2.4.5. Projections

Horizontal and vertical projections are determined by counting the total number of text pixels in

each row/column of the image. These values are normalized by the width/height of the image.

The mean and variance of these projections are used as features in our study.

2.4.6. Profile Features

Left and right profiles are computed by considering for each image row, the distance between

the first text pixel and the left (right) boundary of the digit image. Like projections, the profiles

are normalized to the interval [0 1] and the mean and variance of these profiles are employed

as features.

2.4.7. Background features

The background feature vector is based on the concavity information. These features are aimed

at capturing the topological and geometrical properties of the digits. Each concavity feature is

the number of white pixels belonging to a specific concavity configuration [41]. The label for

each white pixel is chosen based on Freeman code with four directions. Each direction is

explored until a black pixel or the extremity of the digit is met.

Figure 2.1. Different configurations of concavity

In addition to the nine standard concavity configurations, we also consider five additional

configurations to more accurately model the loops in digits. These configurations are illustrated

in Figure 1 and produce a 14 dimensional feature vector which is normalized between 0 and 1.

Figure 2.2 shows concavity labels of the background pixel for a sample of digit `9'.

Figure 2.2. Concavity labels for digit `9'

2.4.8. Foreground features

The foreground features are computed from two different representations of digit, contour and

skeleton. Each of these types of features is discussed in the following.

2.4.8.1. Contour Based Features

These features are aimed at capturing the dominant orientations in the shape of the digit and

are computed from the contour of the (digit) image. The contour is detected using

morphological operations [18, 44, 41, 50] and is represented by Freeman chain codes

traversing the pixels in clockwise direction. This generates a string of codes in the interval [1 8]

for the contour of a digit. The normalized histogram of these codes is then computed and is

used as feature to characterize the digit (Figure 3).

This histogram of contour chain codes is effective in capturing the dominant stroke directions

(horizontal, vertical or diagonal). However, these features are very sensitive to noise and also

fail to capture the structure and topology of the digit.

Figure 2.3. Contour detection: (A) Contour of the upper region, (B) Feature vector, and (C) 8-

Freeman directions

2.4.8.2. Skeleton Based Features

These features are computed from the skeleton of the image of the digit. The skeleton of the

image is computed [2, 8] and is searched for end points, crossings (interconnections) and

(horizontal and vertical) directional points.

The neighbor N(p) of a current pixel

on the skeleton is calculated by using the 8-Freeman

directions in the clockwise such as:

N(p) =

I(np ) (2.20)

where

is the i

neighbor of

. Then, the end point is defined when ( ) = 1 , the

directional points is defined when ( ) = 2 and the Interconnection Point (IP) is defined when

( ) > 2 .

Figure 4 illustrates some examples of each type of points (labeled as 1, 2, 3 and 4 respectively)

in a digit. The normalized histogram of the occurrence of these points in a digit is used as a

feature.

Figure 2.4. Features extracted from the skeleton

These skeleton based features compute the structural information of the digit but like contour

based features; these features are also sensitive to noise in the image.

2.4.9. Ridgelet Transform

The Ridgelet transform defined by Candès and Donoho [51] has been effectively employed for

pattern recognition [52]. The Ridgelet transform combines the Radon and wavelet transforms.

Radon transform has the ability to detect lines in the image while the wavelet transform allows

detecting line singularities along the Radon slices.

Ridgelet transform has been successfully applied to a number of problems including image

compression, image transform coding, character recognition, watermarking, texture based

image retrieval and biometric identification [52].

The Ridgelet transform is based on the Radon transform which is computed on several angular

directions. Radon coefficients correspond to projections representing the shadow of the shape

at each angle [51]. Consequently, significant linear features in any direction are expressed by

high magnitudes. Thus, in order to characterize linear singularities, the one-dimensional wavelet

transform is applied on Radon slices to yield the Ridgelet coefficients. Hence, along the Radon

axis projection, the Ridgelet is constant while in the direction orthogonal to these ridges it is a

wavelet [53].

For an image ( , ), the Ridgelet transform can be computed by first calculating the Radon

transform as defined in [52].

( , ) =

(

) (

cos( ) +

sin( )

)

(2.21)

Where , and are Dirac distribution, angular and radial variables, respectively.

The 1-D wavelet transform is then applied on each Radon slice in order to obtain the Ridgelet

coefficients

( , ). The sum of the normalized values of the coefficients is used as feature.

2.4.10. Region sampling: uniform grid

Uniform grid sampling [54] is applied to the image of the digit which allows extracting features

from different regions of the image separately. A uniform grid creates rectangular regions for

sampling where each region is of the same size and has the same shape. For a given image of

size H x L, the position of horizontal and vertical grid lines for sampling is determined as

follows.

= ×

i = 1,2, ... , n

(2.22)

Where p is the vector of line positions, n is the number of horizontal or vertical regions, and k is

the width or the height of the image. Figure 5 illustrates an example of a digit split into a 2x2

grid. Once the image is divided into different regions, features are extracted from each region

separately. This allows a different level of granularity and features extracted from similar

regions of the digit can be compared with one another allowing more effective matching.

Figure 2.5. Example of splitting a digit using a uniform grid (2x2).

A summary of the features used in our study along with the dimensionality of each is

summarized in Table 2.1.

Feature

Description

Dimension

Global features

Hu's Moment Invariants

Skew Angle

Zernike Moments

Projection Histograms

Profile based features

Background features

Histogram of contour chain code

Skeleton based features

f10

Ridgelet transform

Table 2.1. Summary of features

22.5. Recognition

The proposed recognition engine is based on SVM multi-class approach using the one-against-

all implementation [28, 29]. The features discussed in the previous section are extracted from

the training data set (to be discussed in the next section) and are fed to the SVM. Two

important parameters required for training the SVM include the regularization parameter (C)

Thus, some properties should be considered between C and .

A low value of C (<1) error rate increases with higher value of .

Large values of C counter-balance the bias introduced by large .

Very small value of may result in good training data error rate but won't be useful in the

case of test data recognition error rate.

With large value of the Gaussian kernel of RBF becomes almost linear.

We consider an optimal pair of

and C

by testing and comparing the accuracy rates of the

corresponding experimental SVMs using the validation database.

The next section presents the experimental setup and the corresponding results.

2.6. Experimental results

In order to validate the concept of feature generation as well as to show the robustness of our

system, we ran two different experimental results on two databases. The first experiments

conducted on the isolated handwritten digit NIST SD19 and the second experiments conducted

on CVL Single Digit Database.

2.6.1. Experimental results on NIST SD19

This work [55] is devoted to generating pertinent features in order to find the best ones of

normalized and not normalized isolated handwritten digits. The algorithm that we propose

includes global features, the Ridgelet transform, background and foreground features,

completing the system by a multi-class SVM classifier based on the One-Against-All

implementation.

The experiments are conducted on NIST SD19 database [56] containing not-normalized isolated

handwritten digits stored as binary images. The isolated handwritten digits have the advantage

of being abundantly available in the form of a labeled database. This database is divided into 3

divisions (hsf_0123, hsf_4 and hsf_7).

Therefore, we evaluate the efficiency of various combination features on division hsf_0123 from

which we extract 120,000 handwritten digits. We divide this database into three different parts:

a training database containing 60,000 digit images and a validation database containing 30,000

digit images, while the remaining 30,000 digit images are kept for testing.

Various feature methods are generated from the normalized and not normalized isolated digit

image in order to evaluate its influence. We test five feature sets independently to SVM

classifier, which are Global features (f1), Ridgelet transform (f10), background features (f7)

and foreground features (Histogram of contour chain code (f8) and Skeleton based features

(f9)) extracted from the images. The experiments are carried out with and/or without the

uniform grid.

Finally, the recognition module is based on the SVM multi-class approach using the One-

Against-All implementation. SVM and RBF kernel parameters are fixed to C=10 and

= 8,

respectively.

In the following experiments, we provide various combination of the features used in our work.

Experiment 1

: This experiment is performed with the combination of structural features

without uniform grid, in order to assess the quality of recognition. This combination appears

as follows: Global features (f1), Ridgelet transform (f10), background features (f7) , histogram

of contour chain code (f8) and skeleton based features (f9).Generally, the final features vector

is composed of 35 components. This evaluation is illustrated in the Table 2.2.

Experiment 2:

This experiment is performed with the uniform grid which is divided into four

regions each of which is the same size. Features are generated from each region as described

in Experiment 1. Finally, the global feature vector is composed of 140 (35x4) components.

This evaluation is illustrated in Table 2.2.

Experiment 3:

This experiment is conducted by combining two structural features background

features (f7) and Skeleton based features (f9)). For the three remaining methods: Global

features (f1), Ridgelet transform (f10) and Histogram of contour chain code (f8), the image is

divided into four regions by using the uniform grid method. Then, we generate for each

region 17 components. Finally, the global feature vector is composed of 86 (l7x4+ 14+4)

components. This evaluation is reported in Table 2.2.

Class

Experiment 1

Experiment 2

Experiment 3

Normalized

Not normalized

Normalized

Not normalized

Normalized

Not normalized

98.30

99.30

95.40

98.90

98.30

99.60

97.10

98.90

97.50

98.20

99.20

99.10

97.40

97.50

96.20

98.10

99.00

99.40

98.70

99.30

98.50

98.80

99.10

98.20

98.00

98.20

98.50

98.90

99.30

98.70

98.90

97.50

98.20

99.50

99.30

98.20

98.10

95.10

96.10

99.20

98.90

97.20

96.80

96.60

97.00

97.80

97.80

97.90

98.20

96.00

96.80

99.20

99.20

98.40

98.60

98.40

99.00

99.60

Overall rate

98.01

98.36

96.96

97.90

98.89

99.07

Table 2.2. The recognition rate for various experiments

We can note that the overall rate is acceptable for Experiment 3. However, some digits are

confused and often they are visually very similar.

This evaluation is significant because the results given in Table 2.2 show that the combination

of features results in an important improvement overall rate. However, we can see that the

foreground features of the skeleton and the background features without uniform grid method

allow better discrimination of digits compared to Experiments 1 and 2, respectively.

Experiment 3 shows that the use of background features without uniform grid method

allows locating effectively concave regions , for example, loop, the same way as for Skeleton

based features. It is used in order to reduce confusion between the classes "4" and "7" for

example. However, we can note that the methods of structural feature generation used work

well with the database of the not normalized digits.

The digit image is divided into separate regions via splitting sampling allowing the generation of

features having pertinent information. The use of a uniform grid allows this information to be

improved upon for each feature by defining different levels of granularity . Features generated

in various experiments show that the most effective features are structural, strokes and

concavities when used in combination with a 2x2 region.

2.6.2. Experimental results on CVL Single Digit Database

The proposed methodology [57] was evaluated on the CVL Single Digit database [40]. The

database comprises 7,000 digits for training, a validation set of equal size, and an evaluation

set consisting of 21,780 digits. These samples are contributed by different writers with varying

writing styles. Prior to feature generation, we binarize all images using the KittlerMet

binarization method [58]. A summary of the features used in these experiments, seven of these

features (f2, f3, f4, f5, f6, f7, f9) are extracted directly from the image of the digit while three

features (f1, f8 and f10) are extracted from different regions of the image by applying the

uniform grid method. The features are directly extracted from the binary images of digits

without any size normalization. Then, the proposed recognition module is based on SVM multi-

class approach using the one-against-all implementation. SVM and RBF kernel parameters are

Feature

Precision (%)

Recall (%)

93.81

93.80

17.13

20.02

14.78

15.80

78.62

77.22

39.36

39.26

53.15

49.07

85.05

84.78

20.40

20.52

18.51

19.62

f10

25.61

23.27

Table 2.3. Recognition results on individual features.

The performance of the system is quantified by computing the precision and recall. We first

present the results of individual features summarized in Table 2.3. It can be seen that while the

performance of these features varies significantly, the global features (f1) extracted from the

four regions of the digit image outperform all other features reporting precision and recall of

approximately 94%. Zernike moments (f4) and background features (f7) also realize acceptable

recognition rates while the performance of rest of the features is not very impressive when

evaluated individually.

Table 2.4 summarizes the performance of some of the feature combinations that we have

tested. Almost all the combinations report high recognition rates whereas the highest

recognition rate (recall and precision of 96.62%) is achieved when combining all the features

(f1-f10).

Features combinations

Precision (%)

Recall (%)

f1, f2, f3, f4, f5

94.86

94.88

f6, f7, f8, f9, f10

95.69

95.70

f1,f4,f7

96.16

96.18

f2, f3, f5, f6, f8, f9, f10

82.58

82.64

f1,f4,f7,f8,f9

96.18

96.23

f1, f2, f3, f4, f7, f8, f9

96.19

96.25

f1, f2, f3, f4,f7, f8, f9,f10

96.30

96.31

f1, f2,f3,f4,f5,f7,f8,f9,f10

96.61

96.62

f1-f10

96.62

Table 2.4. Recognition results on feature combinations.

We also carry out a series of experiments to compute the precision and recall for each of the

digits (0-9) separately. The recognition results for each digit are summarized in Table 2.5. It

can be observed from the results in Table 2.5 that in general, the precision and recall values are

more or less consistent across different classes. Relatively low recall is achieved on some digits

(7, 8 and 9). Furthermore, it should be noted that some pairs like ('0','8'), ('2','7') and ('3','8')

offer relatively less inter class variations making their recognition more challenging.

Class

Precision (%)

Recall (%)

97.77

98.67

96.16

98.76

97.49

98.16

95.85

95.36

95.59

97.61

96.74

97.80

98.12

96.88

95.50

96.79

95.41

95.10

91.87

Average

96.62

Table 2.5. Recognition rates on individual digits

We also compare the performance of the proposed combination of features with state-of-the-

art digit recognition systems submitted to the Digit Recognition Competition held in conjunction

with ICDAR 2013. The database (CVL) and evaluation protocol considered in our study is the

same as that of the aforementioned competition. This comparison is summarized in Table 2.6

where the precision of other systems has been reproduced from [40].

Rank

Method

Precision (%)

Normalized Digits

Salzburg II

97.74

Yes

Salzburg I

96.72

Yes

Our method

96.62

Orand

95.44

Yes

Jadavpur

94.75

Paris Sud

94.24

Yes

François Rabelais

91.66

Yes

Hannover

89.58

Yes

Tébessa II

78.43

Yes

Tébessa I

77.53

Table 2.6. Comparison of proposed method with state-of-the-art methods [40]

It can be seen from Table 2.6 that the proposed method realizes better performances than

most of the systems submitted to the competition. Two systems (Salzburg II and Salzburg I)

report slightly better recognition rates with precisions of 97.74% and 96.72% respectively. It

should however be noted that our method does not require any size normalization and the

features are directly extracted from binarized images of isolated digits. The combination of

multiple features in our case reduces the confusion between digit classes and consequently

results in high values of precision and recall.

22.7. Summary

The objective of this chapter was to find a representation of isolated handwritten digits that

allows their effective recognition. We proposed a combination of different statistical and

structural features, some are computed from the complete image of digit while others are

computed by first applying uniform grid sampling to the image. This combination of features

was investigated using SVM as classifier. The experiments conducted on a standard database of

isolated digits without any normalization of images realized high recognition rates comparable

with the state-of-the-art methods proposed on this problem.

The initial results obtained by the proposed combination of features are very encouraging. In

the next chapter, the concepts about segmentation of two connected handwritten digit

recognition are discussed.

HAPTER

EGMENTATION OF TWO CONNECTED HANDWRITTEN DIGIT

RECOGNITION

Connected digits are the frequent observed situations occurred on the digit string. Hence, we

propose in this chapter a segmentation-verification system using conjointly the oriented sliding

window and support vector machine (SVM) classifiers. The oriented sliding window is used for

finding at the same time the interconnection points and the optimal angle for cutting the

adjacent digits. Whilst, SVM-based segmentation-verification using the global decision module

allows the rejection or acceptance of the processed image. Experimental results conducted on a

large synthetic database of handwritten digits show the effective use of the oriented sliding

window for segmentation-verification.

3.1. Introduction

Automatic reading of digit fields from an image document has been proposed in several

applications such as bank checks [1], postal code [2] and forms [3]. The usual approach for

designing a handwritten digit string recognition system is based on the segmentation-

recognition which can be performed mainly in four steps [4]: preprocessing, segmentation,

feature generation and classification. The preprocessing step generally allows transforming the

grayscale image of the digit string into the binary image for facilitating further processing. The

segmentation step is used for separating the overlapped and/or connected adjacent digits into

elementary digits in order to deduce the possible distinct classes [5-9]. After that, a feature

generation is performed on the digit image for reducing the dimension of the representation

and thus makes the design of the classification system. Next, a decision function allows

assigning a digit image to predefined class.

In order to improve the reliability of the system, an additional step is performed for verifying

the correct segmentation using the global decision stage for rejecting or accepting the

processed image. The segmentation-recognition with verification is usually namely

segmentation-verification [59]. Generally, algorithms based on the segmentation-recognition

approach are faster than segmentation-verification since a number of separation attempts are

smaller.

The design of a robust handwritten digit recognition depends of many factors such as the

techniques used for processing the image, algorithms for separating the adjacent digits,

methods for generating features and the type of classifier used for recognizing the segmented

digits. In this chapter, we only are interested in the segmentation stage since it is considered as

the most difficult task for designing robust handwritten string recognition. Hence, three cases

can occur when attempting to segment two adjacent digits: distinct digits, overlapped digits or

connected digits. In most cases, connected digits are the frequent observed situations [60].

Furthermore, the connection between adjacent digits can be simple (only one connection of

adjacent digits) or multiple (many connections of adjacent digits). The simple connection is the

frequent situation encountered comparatively to the multiple connections. Various developed

algorithms are more robust for separating simple connection than multiple connections [9].

33.2. Overview of tw o connected handw ritten digit

recognition

Various explicit segmentation algorithms of two handwritten connected digit have been

proposed based on the morphology of the connected digits for finding one or multiple

interconnection points [9], which are based on the contour [61-63], the skeleton [42,64], the

reservoir [65] or by combining simultaneously the contour, profile and the skeleton [8,66].

More recently, these algorithms have been compared objectively using synthetic and real

connected digits [9]. In the following, we briefly review the most popular algorithms to segment

two connected digits according the type of the morphology for detecting the cutting path.

Many algorithms based on the contour and profile have been proposed for segmenting two

connected digits (see Table 3.1). The first one has been proposed by Fenrich and

Krishnamoorthy (1990) [67], which use the vertical histogram projection and contour

information of two touching digits in order to derive valley and peak points. The cutting path is

then decided to segment the touching pattern by joining valley and peak points. However, the

main difficulty of using this method is the detection of valley and peak points when adjacent

digits are strongly connected. Fujisawa et al. (1992) [62] proposed to analyze the distance

between upper and lower profiles in order to detect the segmentation points, which are used to

build a segmentation graph for defining the best segmentation hypothesis. However, when the

handwritten digits strongly skewed or overlapped, the projection profile cannot reach the

touching points and therefore the segmentation points are not detected. Congedo et al (1995)

[61] proposed an algorithm based on the concept of the drop fall which involves a set of rules

dictating that the next pixel is in the direction of the fall of water drop on the contour. This

algorithm has proved to be an efficient segmentation method due to its simplicity and

effectiveness. However, the main difficulty of using this algorithm is finding the best start point

which can be in the wrong place. Therefore, completely ineffective segmenting paths will be

obtained. Shi and Govindaraju (1997) [63] proposed to segment two connected digits followed

by using the chaincode for detecting the most significant right turns at each point of touching

according the opposite contour point. Both right turn and their opposite contour points are

located to split the contour into many pieces, which are then classified as belonging to one or

the other digit. The final segmentation point is then determined by a vertical line bisecting the

image using a set of heuristic rules like height, width and position of adjacent connected

components, plus the distance between them. Kyung et al. (2002) [68] proposed using

structural features of contours to generate multiple hypotheses. These hypotheses based on an

analysis of the ligature and the characteristics of candidate break points and verified by MLP

numeral classifier. Moreover, Pal et al. (2003) [65] proposed an improvement of the Drop falls

algorithm based on the concept of water reservoir to find the best starting point for the drop in

order to avoid the local minimum. This method is based on research of the large area

"reservoir" where the digits are touching. The reservoirs are obtained by considering areas

resulting from a water drop on the component from the top or bottom of this one. This method

is effective compared to the drop fall algorithm. However, it cannot reach correct segmentation

due to wrong reservoir. Later, Lei et al. (2004) [69] proposed a recognition-based segmentation

of touching handwritten numeral strings with help of external and internal contour analysis and

projection analysis; these features are utilized to define the segmentation lines. Furthermore,

Ma et al. (2008) [70] described a touching pattern-oriented strategy for segmenting touching

digits in order to solve the problem of determining the best separation according set of Drop-fall

algorithms[61] can produce four possible cutting paths(Descend-left, Descend-right, Ascend-left

and Ascend-right). However, Wang et al. (2009) [71] proposed a recognition-based approach to

handwritten numeral string segmentation. This method generated all possible candidate

segmentations of an input string through contour and profile analysis, and then the feature

vectors containing recognition information of the statistical modeling concepts of Gaussian

Mixture Model (GMM).

The segmentation method based on the skeleton initially has been proposed by Chen and Wang

(2000) [42] by using both background and foreground analysis for simple or multiple touching

digits. A thinning algorithm is used in order to obtain the background and foreground skeletons

of the given image. Fork points, end points, and bend points from the skeleton are used as

features. This algorithm allows removing a useless stroke, which could disturb recognition.

Thereafter, all segmentation hypotheses are combined into a segmentation graph, and a

genetic algorithm is used to search among all the possible outputs. The segmentation-

recognition approach is used for recognizing the segmented digits. More recently, Everton and

Carlos (2013) [72] proposed the clustering the feature points selected from the skeletal

touching regions via Self-Organizing Maps (SOM) for segmenting single or multiple connected

handwritten digits. This method uses two parallel processes to define the segmentation points.

Firstly, the image is thinned and the branch points of the skeleton are detected and selected.

Secondly, features are extracted by scanning the image left to right, top to bottom, and saving

simultaneously the coordinates of the foreground pixels. These features are clustered using

Self-Organizing Maps (SOM) into three regions: left, right and middle region. Then, the

segmentation points are defined as the feature points which are close to nodes of the SOM

network. Elnagar and Alhajj (2003) [64] proposed an alternative approach to split pairs of

connected digits based on a segmentation-recognition approach using two agents. The pattern

is first preprocessed, normalized and thinned to obtain uniform stroke width, and then the

feature points are extracted from the skeleton and contour. Segmentation points are identified

near the decision line estimated from two agents, namely, top-valley based and bottom-hill

based and two candidate cut-points from each of them. After separation, the pattern is then

restored using heuristics. Besides, the Suwa method (2005) [3] proposed a thinning-based

segmentation strategy for a couple of handwritten connected digits graph representation. This

method basically improved previous method is reported by Elnagar and Alhajj method.

Almost at the same time, Oliveira et al [59, 66] proposed to combine structural features as the

contour, the profile and the skeleton points for generating segmentation cuts. The relationships

of feature points determine the list of segmentation hypotheses of the connected digits. All

segmentation cuts are produced using line segments connecting the segmentation points.

Neural network is used to evaluate the correct segmentation path according the segmentation-

verification approach for accepting or rejecting a handwritten segmented digit [59]. The main

difficulty of using this method is that it generates an over-segmentation or under-segmentation

which can be easily confused with an isolated digit. Later, Oliveira et al. method (2002) [59]

improved their algorithm by adding a new verification scheme containing two verifiers. A new

feature set is also introduced to feed the over-segmentation verifier. All segmentation points

are used to build a segmentation graph. The best segmentation hypothesis is the shortest path

of the graph. A postprocessor is used to resolve some problems when verifiers failed and vice

versa, and the global decision module makes an accept/reject decision.In order to reduce the

number of segmentation cuts, Vellasques et al. (2008) [8] proposed the use a filter for

discriminating isolated digits from over-segmented pieces. In this way, unnecessary cuts can be

avoided in a segmentation graph of Oliveira et al. method [59] without any attempt of

classification by a general-purpose classifier. However, the difficult task for finding the best

cutting is that adjacent digits can be oriented which leads to an over-segmentation or under-

segmentation [66].

Recently, Gattal and Chibani [73-74] proposed a system recognition-based segmentation to

recognize handwritten connected digits, the proposed approach allows separating connected

digits according the connection configuration by finding the interconnection points (IPs), after

that, in the work [73], we fixed sliding window around IPs. Otherwise, in the work [74], we set

a crossing oriented window around IPs with sloping angle for finding correctly cutting path.

Moreover, the segmentation and recognition module use the global decision for choosing the

correct segmentation path.

Features

Methods

Approach

Data

Size

Perf. (%)

Contour

and

profile

Fenrich and Krishnamoorthy (1990) [67]

Rec-based

ZIP codes

450

94.90

Fujisawa et al. (1992) [62]

Rec-based

Proprietary

920

95.00

Congedo et al (1995) [61]

Seg-Rec

CEDAR

91.00

Shi and Govindaraju (1997) [63]

Seg-Rec

CEDAR

1,966

80.80

Kyung et al. (2002) [68]

Rec-based

NIST SD19

3,500

92.50

Pal et al. (2003) [65]

Seg-Rec

French bank

checks

2,250

94.80

Lei et al. (2004) [69]

Rec-based

NIST SD19

3,359

97.72

Ma et al. (2008) [70]

Rec-based

NJUST603HW

1,591

88.18

Wang et al. (2009) [71]

Rec-based

NIST SD19

2,000

97.95

Skeleton

Chen and Wang (2000) [42]

Seg-Rec

NIST,

proprietary

4,500

96.00

Everton and Carlos (2013) [72]

Seg-Rec

Ribas et al. [9]

323

65.79

Elnagar and Alhajj (2003) [64]

Seg-Rec

NIST, CEDAR,

proprietary

96.00

Suwa method (2005) [3]

Seg-Rec

NIST SD19

2,000

88.40

Contour,

profile

and

skeleton

Oliveira et al (2000) [66]

Rec-based

Brazilian bank

checks

900

95.24

Oliveira et al. method (2002) [59]

Rec-based

Brazilian bank

checks

900

98.50

Gattal and Chibani (2011) [73]

Rec-based

NIST SD19

250

86.26

Gattal and Chibani (2012)[74]

Rec-based

NIST SD19

600

92.46

Table 3.1. Comparison of different segmentation methods

This chapter proposes a segmentation-verification scheme for segmenting simple and multiple

connections using an oriented sliding window. Two morphological features are combined based

on the contour and the skeleton for detecting and splitting correctly the connected digits. In

order to avoid the over-segmentation and the under-segmentation, a sliding window is used for

finding interconnection points. When the interconnection points are found, the window is

oriented to define the optimal cutting path. The recognition and verification process is

performed using the Support Vector Machines (SVM) through the One-Against-All (OAA)

implementation.

33.3. Segmentation-verification system

The main difficult task for segmenting two connected digits is to find the interconnection points

and the optimal path for cutting the digit image into isolated elements [62, 75-77]. Figure 3.1

illustrates some difficult samples extracted from the NIST SD19 database [56].

Figure 3.1. Different types of connected digit samples.

Hence, the design of the proposed segmentation-verification system can be conducted into four

main steps: segmentation of connected digits by searching the interconnection points using the

oriented sliding window, generation of features on isolated digits, recognition of the segmented

digits using the SVM-OAA implementation and verification for accepting or rejecting the digit

recognition. More precisely, the segmentation of connected digits uses two complementary

structural features (contour and skeletal points) for detecting the interconnection and base

points and finding the optimal path. It can be performed through the following steps:

(1) Generate segmentation points;

(2) Set sliding window on the Interconnection Point (IP) positions;

(3) Rotate the window around IPs for finding correctly the cutting path;

(4) Scan all the possible relationships of Base Points (BP) and IP to determine the list of

segmentation hypotheses on the window.

(5) Evaluate the correct segmentation path using a classifier based on SVM-OAA.

In the following, we describe each step of the proposed segmentation-verification system for

segmenting two connected digits. This system has been published as shown in the reference

[78].

3.3.1. Segmentation of connected digits

The segmentation of connected digits can be performed by setting the window on a position in

order to find the IP and then searching the optimal cutting path by rotating the window around

the IP. These two procedures are explained more precisely in the following.

3.3.1.1. Finding the interconnection points

The finding of interconnection points is inspired from Oliveira et al. method (2000) [66] from

which we propose some modifications [73-74]. It involves analyzing the number and nature of

IP between two connected digits in order to define the optimal position for cutting a digit image

couple [8].

The first step is to define IPs and BPs from which starts the segmentation. BPs are obtained

from the local extrema (minima and maxima) detected on the contour (In contrast to Oliveira et

al. method (2000) [66], BPs are obtained from the detected local extrema on the contour and

profile of each connected component). While IPs are calculated using the Freeman code

according to 8 directions in the clock-wise. The upper contour is used in order to detect the

upper Base Point (Upper BP) whilst the lower contour is used to detect the lower Base Point

(Lower BP). Often, the connection points contain IPs, BPs or both at the same time.

Therefore, three hypothesiss can be considered for the optimal segmentation (Figure 2):

Hypothesis 1:

If the Euclidean distance between the projection of the BP and the IP is lower

than a threshold, the segmentation is making between IP with Upper BP and also between IP

with Lower BP (Figure 3.2.a).

Hypothesis 2:

If the lower segment of IP is related to a higher segment of IP (or vice versa)

and both IPs are near a BP, the skeleton path linking both IPs (Skeleton path) is used as part

of the segmentation cut with complementary paths between BPs and IPs (Composed path)

(Figure 3.2.b).

Hypothesis 3:

In some cases, even if there is a connection between two digits, the skeleton

path does not have an IP. Thus, to avoid the under-segmentation (lack of segmentation

point), the Oliveira et al. method (2000) [66] enables cut path directly from BPs from profile.

In our case, the segmentation path is based on the minimal Euclidean distance between

Upper BP and Lower BP from contour (closest points) in the middle (Figure 3.2.c).

The main problem of this algorithm is that it does not allow correctly finding the cutting path

when connected digits are oriented. Hence, we propose a method based on the oriented sliding

window for finding at the same time the IP and the cutting path.

Upper BP

Lower BP

Upper BP

Lower IP

Upper IP

Upper BP

Lower BP

Lower IP

Upper IP

Skeleton path

Composed path

Lower BP

Composed path

Upper BP

Lower BP

(a)

(b)

(c)

Figure 3.2. Possible segmentation paths using IP and BP: Hypothesis 1 (b) Hypothesis 2 (c)

Hypothesis 3.

3.3.1.2. Finding the cutting path

Finding the cutting path is based on the search of the angular cutting for the optimal

segmentation. This search can be conducted through the following steps [74]:

Detecting the presence of IP by using a fixed window having the same height as the original

image and a constant width W1, the IP is thus located in the middle of this width.

Rotating the window in the range

]

,...,

[

with an angular step around IP. Figure 3.3

shows various crossing of the oriented window around IP.

Finding the cutting path for each oriented window: if a single IP is found, then the cut is

making between single IP with Upper BP and also between single IP with Lower BP when the

Euclidian distance between IP and BP (Upper BP or Lower BP) is lower than a threshold T1. If

there are two IPs (Upper IP and Lower IP), then the hypothesis 2 is applied. Otherwise, the

hypothesis

order to avoid under-segmentation. Figure 3.3 shows various segmentation paths according

the orientation of the window.

Figure 3.3. Crossing and the segmentation paths according the orientation of the window.

3.3.2. Feature generation

Various feature generation methods have been developed for handwritten digit recognition

[21,79,55,57]. In our case, we consider global and local structural and statistical features [80-

81], which are extracted from different regions of the isolated image digits by applying the

uniform grid method [54]. The selected features extracted from each region include [57]:

Global features as the density, center of gravity, second order geometrical moments and

number of transitions.

Hu's Moment Invariants.

Radon transform of the image for estimating the skew of the digit.

Zernike moments.

Horizontal and vertical projections.

Left and right profiles.

Concavity information deduced from the background of the digit image.

Contour and skeleton deduced from the foreground of the digit image.

Ridgelet Transform.

Finally, the feature vector is composed by concatenating all mentioned features. It is useful to

note that these features have recently been used in Ref. 57 for realizing highest recognition

rates.

3.3.3. Recognition and verification

After isolating the digits, the classification is performed on input image in order to recognize

one of the existing classes of the system [82]. In order to make the whole system more

reliable, a verification strategy is performed for producing a final decision. Thus, the recognition

stage interacts with the verification stage in order to produce all possible segmentation

hypotheses.

The decision for an hypothesis of segmentation-recognition is given by the multiplication

between the decision functions done by its sub-components. The decision function of a sub-

component is given by the recognition system based on a multiclass SVM-OAA implementation

[27].

Finally, the decision function is performed by selecting the maximum value provided by each

segmentation hypothesis. When values of decision functions are equal, the maximum value of

the decision function is selected from the first sub-components. Figure 3.4 illustrates an

example of the segmentation-verification according hypotheses presented in Figure 3.2. This

verification can devote to reduce the confusion between isolated and under-segmented digits.

Figure 3.4. Illustrative example for segmenting two connected digits using the oriented sliding

window and recognition-verification strategy.

33.4. System evaluation

The evaluation of a segmentation method is very subjective. It can be done according two

approaches. The first one is to evaluate the segmentation based on a prior knowledge about

the effects of segmentation. The second approach is to evaluate the system for recognizing

handwritten digits by integrating segmentation and classification. In our case, we adopt the

second approach for evaluating performances of the segmentation on a standard database

NIST SD19.

3.4.1. Databases

The NIST SD19 database is provided by the National Institute of Standard and Technology

(NIST) [56]. The isolated handwritten digits have the advantage of being abundantly available

in the form of database labeled. Similarly of isolated handwritten digits, the database contains

connected handwritten digits, which is divided into three divisions (hsf_0123, hsf_7 and hsf_4).

The split of each division is illustrated in Table 3.2.

Table 3.2. Distribution of the NIST SD19 database for handwritten digits

The isolated digits from NIST SD19 database are used for finding the optimal parameters of the

multi-class SVM classifiers.

For an objective evaluation and comparison of our proposed system to the existing methods,

the touching image pairs used in this work have been extracted from the synthetic database of

79,466 images created by Ribas et al. (2013) [9]. This database composed of 79,966 is divided

into two sets: a validation set composed of 500 images and the remaining composed of 79,466

images used for testing. Table 3.3 and Table 3.4 show the type of touching digits proposed by

Chen and Wang [42] and the distribution of connected digits according the connection type.

Type 4 connection digits are not considered in the database [9].

Division name

#Isolated digits

#Connected digits

hsf_0123

195 000

4 044

hsf_4

58 646

1 819

hsf_7

60 089

2 369

Category

Touching type

Touching style

Example

Simple

connection

Single-point

touching

Single-segment

touching

No obvious

segmentation

point

Ligature

connection

Multiply

connected

Multiple-

touching

Table 3.3. Types of connected numeral strings

Table 3.4. Distribution of the database regarding the type of connection [9]

3.4.2. Parameter tuning of the SVM model

The recognition module is based on the multi-class SVM classifiers using the One-Against-All

implementation [28-30]. The training of the SVM requires selecting two parameters, which are

the regularization parameter (C) and the Radial Basis Function (RBF) kernel parameter ( ).

Hence, a fraction of an isolated handwritten digit dataset containing 150,000 samples is divided

into two parts: the first one containing 100,000 digit images is used for training and the second

one containing 50,000 digit images are used for validation the SVM models [26]. After many

experiments, parameters of the SVM classifiers are fixed to C=40 and = 8. Finally, all digit

images (150,000) are used for training the SVM, which allows improving the digit recognition.

Connection type

% of connected digits

34.80

53.62

01.59

09.99

3.4.3. Results

This section details the results of the experimental evaluations carried out to validate the

proposed segmentation-verification scheme for segmenting simple and multiple connections

using an oriented sliding window.

3.4.3.1. Influence of the orientation angle

The proposed algorithm requires adjusting manually the threshold T1 and the width of the

window W1. After many experiments, T1 and W1 are set to 7 and 4, respectively. In order to

study the effect of using the oriented sliding window, various orientation angles are selected in

the range

]

,...,

[

with angular step fixed to 5° to avoid an important number of

segmentation cuts. Table 3.5 reports the overall rates and the number (#) of segmentation cuts

for various angles

={0°, 5°, 10°, 15°}.

(°)

Connection type (%)

Overall rate (%)

#Segmentation cuts

90.79

89.80

89.30

70.77

88.24

3.44

95.02

93.34

98.82

75.30

92.21

10.33

96.67

93.64

99.68

77.54

93.18

17.22

96.67

93.75

99.68

77.58

93.24

24.11

Table 3.5. The recognition rate for various orientation angles

We clearly can note that the use of the oriented window influences the overall recognition rate.

Indeed, by increasing the angle from 0 to 15°, we can note that the overall rate significantly

increases whatever the connection type. The best improvement is offered for the connection

Type 3, which is due to the rotation angle that allows changing the segmentation path using

hypothesis 3 (No obvious segmentation point). This means that the position of the Upper BP

and Lower BP deduced from the contour is changed according the orientation angle in the

middle. Furthermore, in spite of the complexity for segmenting multiple connections (Type 5),

the proposed segmentation-verification system allows increasing the overall rate more than 6%

for Type 5 when the angle is increased from 0 to 15°. Table 3.6 illustrates the segmentation

effect according the selected orientation angle.

Connected digits

Digit labels

Incorrect segmentation

Correct segmentation

Angle

(°)

Segmentation

effect

Angle

(°)

Segmentation

effect

Table 3.6. Segmentation effect according the orientation angle

3.4.3.2. Complexity of the proposed segmentation-verification

The performance of the proposed segmentation-verification system requires increasing the

orientation angle in order to improve the accuracy. However, this increase inevitably leads to

the increase of the number of segmentation cuts. In order to appreciate empirically the

complexity of the proposed segmentation-verification, we deduce from Table 3.5, respectively,

figures 3.5, 3.6 and 3.7, which show the number (#) of segmentation cuts versus the

orientation angle

(°), the overall rate (%) versus the orientation angle

(°) and the overall

rate (%) versus the number (#) of segmentation cuts. It can be observed that the number of

segmentation cuts increases linearly with the orientation angle. In contrast, the overall rate (%)

does not increase in the same manner. Indeed, when the orientation angle increases from 0 to

15°, the overall rate increases of 5% while the number of segmentation cuts increases more

than 600%. Hence, a tradeoff between the accuracy and the number of segmentation cuts

should be adopted avoid to reduce the speed of the segmentation-verification.

Figure 3.5. Number (#) of segmentation cuts versus the orientation angle (°)

#Se

gme

ntation

cuts

(°)

Figure 3.6. The overall rate (%) versus the orientation angle

(°)

Figure 3.7. The overall rate (%) versus the number (#) of segmentation cuts

Table 3.7 reports some examples of the successful segmentation-recognition produced by the

proposed system according the connection type. We can note when the segmentation is well

performed then the recognition also performs well. However, our proposed system does not

provide satisfactory results in some cases as reported in table 3.8, which is caused by two main

problems:

The first problem is caused by the robustness of the classifier. Indeed, although, the

segmentation is well performed, the classifier does not allow recognizing the segmented digit

due to its particular shape. To resolve this problem, it is possible to add this particular digit

into the training set for improving the performance of SVM classifiers.

100

rall

)

(°)

100

rall

)

#Segmentation cuts

The second problem is caused by "no obvious segmentation point". In this case, the

hypothesis 3 (No IP) is often applied for segmenting connected digits. This means that the

segmentation path in the middle does not always provide acceptable results.

Furthermore, in some cases, our proposed system can provide unsuccessful segmentation

verification results as depicted in figure 3.8. As shown, there are three possible mistakes of

applying the segmentation-verification of connected digits: incorrect segmentation and

recognition, correct segmentation and incorrect recognition and incorrect segmentation and

correct recognition. Most of the mistakes are due to recognition problems (07.76%).

Connection type

Original image

Digit labels

Correct

segmentation-recognition

Table 3.7. Some examples of the successful segmentation-verification produced by the proposed

system according the connection type

Connection

type

Original image

Digit labels

Incorrect

segmentation

Assigned

classes

Table 3.8. Some examples of the unsuccessful segmentation-verification produced by the proposed

system according the connection type

(a) 47

(b)

(c)

Figure 3.8. Unsuccessful segmentation-verification of connected digits produced by the proposed

system (a) Incorrect segmentation and recognition (b) correct segmentation and incorrect recognition

3.4.3.3. Comparative analysis

To evaluate the robustness of our system when

= 15° comparatively to other existing

systems, we report in Table 3.9 the correct segmentation and its corresponding number of

segmentation cuts. Table 3.9 reports a comparative analysis provided by Ribas et al (2013) [9]

for which we add results obtained by Everton and Carlos (2013) [72], Oliveira et al. method

(2000) [66], Congedo et al. method [61] and the proposed segmentation-verification system. It

is worth noting that, the segmentation method proposed by Oliveira et al. [66] and the

segmentation method proposed by Congedo et al. [61] have been implemented using

information provided in their respective publications.

Method

Connection type (%)

Overall rate

(%)

#Segmentation

cuts

Simple

Multiple

Chen and Wang [42]

97.87

94.23

97.55

76.76

93.80

45.40

Congedo et al. [61]

62.88

67.51

59.40

40.45

63.07

1.00

Elnagar and Alhajajj [64]

63.88

71.51

56.40

58.73

67.34

1.00

Everton and Carlos [72]

71.75

71.21

63.64

56.57

65.79

1.00

Fenrich and Krishnamoorthy [67]

97.54

93.79

99.45

65.57

92.37

4.07

Fujisawa et al. [62]

95.45

91.27

83.57

63.72

89.85

3.66

Oliveira et al. [66]

90.40

90.78

89.01

64.88

88.03

1.00

Pal et al. [65]

73.96

74.69

80.09

41.52

71.21

1.00

Shi and Govindaraju [63]

68.31

59.72

60.35

25.44

59.30

1.00

Our approach

96.67

93.75

99.68

77.58

93.24

24.11

Table 3.9. Comparative analysis

The proposed method achieves an overall performance greater than 93%. The correct

segmentation-recognition for connection types 1, 2, 3, and 5 are 96.49, 93.75, 99.68 and

77.58%, respectively. Roughly speaking, the proposed method is more appropriate for

segmenting digits having connection types 3 and 5, respectively. Furthermore, we can observe

that the proposed method offers comparable overall rate as Chen and Wang method [42].

However, their method requires more segmentation cuts (more than 45) and is very slower

comparatively to the proposed method, which requires a reduced number of segmentation cuts.

The worst overall performance is achieved by the methods proposed by Shi and Govindaraju

[63], Congedo et al. [61], Elnagar and Alhajajj [64] and Everton and Carlos [72]. The method

proposed by Shi and Govindaraju [63] achieves a worst overall performance.

In contrast, the proposed method and the segmentation algorithm proposed by Chen and

Wang[42] produce almost the same overall performance. However, by examining carefully the

obtained results, we can note that the proposed method surpasses all existing methods for

connection types 3 and 5. Furthermore, the Chen and Wang method is still very slow against

the proposed method since many attempts should be evaluated for recognition.

A careful observation of obtained results depicted in Figure 3.9 shows that when the number of

the segmentation cuts increases then the overall rate also increases. The proposed method

outperforms most other state-of-the-art methods except the segmentation algorithm proposed

by Chen and Wang [42]. Indeed, they obtained an overall rate of 93.80% against 93.24% for

our proposed method. When observing Table 8, we can note that the Chen and Wang method

is suitable for connection types 1 and 2. In contrast, our method is suitable for connection types

3 and 5. It is useful to recall that the connection type 5 is the more complicated situation for

separating two connected digits since multiple connections are occurred when attempting to

separate digits.

Figure 3.9. Performance of the segmentation-recognition algorithms

59.30

63.07

65.79

67.34

71.21

88.03

89.85

92.37

93.24

93.80

3.66

4.07

24.11

45.40

100

Shi

Congedo

Everton

Elnagar

Pal

Oliveira

Fujisawa

Fenrich

Our approach

Chen

Seg

entati

on method

Number of segmentation cuts

Correct segmentation (%)

Table 3.10 reports the rank of the proposed method comparatively to other existing methods

according the type of connections. We clearly see that our method is more appropriate for

segmenting no obvious segmentation (Type 3) and multiple touching digits (Type 5). In

contrast, the method proposed by Chen and Wang [42] is more suitable for single point

connection (Type 1) and single segment connection (Type 2).

Rank

Connection type

Chen and Wang [42]

Our approach

Fenrich and

Krishnamoorthy [67]

Fenrich and

Krishnamoorthy [67]

Fenrich and

Krishnamoorthy [67]

Chen and Wang [42]

Our approach

Chen and Wang [42]

Fenrich and

Krishnamoorthy [67]

Fujisawa et al. [62]

Oliveira et al. [66]

Fujisawa et al. [62]

Pal et al. [65]

Elnagar and Alhajajj

[64]

Everton and Carlos [72]

Elnagar and Alhajajj

[64]

Everton and Carlos

[72]

Everton and Carlos

[72]

Shi and Govindaraju

[63]

Everton and Carlos

[72]

Shi and Govindaraju

[63]

Pal et al. [65]

Elnagar and Alhajajj

[64]

Congedo et al. [61]

Shi and Govindaraju

[63]

Elnagar and Alhajajj

[64]

Shi and Govindaraju

[63]

Table 3.10. Rank of our method comparatively to the existing methods according the

connection type

It should however be noted that the proposed method can deal with several types of

connection between the digits specifically both connection types: no obvious segmentation

point and multiple connections. Other hand, it achieves encouraging results related to other

state-of-the-art methods.

33.5. Summary

The objective of this chapter aimed to propose a SVM-based segmentation-verification system

for segmenting two connected handwritten digits using the oriented sliding window and SVM

classifiers in order to find the best cut for isolating two adjacent digits.

Experimental results showed that the performance of the system depends on the selection of

the appropriate orientation angle. Indeed, obtained results clearly show that the recognition

rate increases when selecting the appropriate angle. Therefore, choosing an optimal angle is a

crucial step for cutting optimally two connected digits. When comparing the proposed approach

to the state of the art, we can note that it constitutes a tradeoff between the correct

segmentation and the number of the segmentation cuts.

The next chapter devotes to present the proposed a segmentation and recognition system for

unknown-length handwritten digit strings by combining several explicit segmentation methods.

HAPTER

ANDWRITTEN DIGIT STRING RECOGNITION

The segmentation of handwritten digit strings into isolated digits remains a challenging task.

The difficulty for recognizing handwritten digit strings is related to several factors such as

sloping, overlapping, connecting and unknown-length of the digit string. Hence, this chapter

aims to propose a segmentation and recognition system for unknown-length handwritten digit

strings by combining several explicit segmentation methods depending on the configuration link

between digits. Three segmentation methods are combined based on histogram of the vertical

projection (HVP), the contour analysis and the Sliding Window Radon Transform (SWRT). A

recognition and verification module based on Support Vector Machine (SVM) classifiers allows

analyzing and deciding the rejection or acceptance each segmented digit image. Experimental

results conducted on the benchmark dataset show that the proposed system is effective for

segmenting handwritten digit strings without prior knowledge of their length comparatively to

the state-of-art.

4.1. Introduction

The Automatic Handwritten Digit String Recognition (AHDSR) is required in many applications

such as the amount of the bank checks, postal code and forms [1, 3, 61, 83, 84]. In this

context, two main problems are occurred when attempting to design a handwritten digit string

recognition system. The first problem is the unknown-length of the digit string, which is not

carefully written by people in real-life situations [85]. The second problem is the link between

adjacent digits, which can be naturally spaced, overlapped or/and connected [5].

We propose, in this chapter, a new design of a handwritten digit string recognition system

based on the explicit approach for the unknown-length digit strings. Three methods are

combined according the link of adjacent digits, which are the histogram of the vertical

projection dedicated for spaced digits, the contour analysis dedicated for overlapped digits and

the Radon transform performed on the sliding window dedicated for connected digits.

Our ultimate objective is to propose a full segmentation algorithm based on the explicit

approach for handwritten digit string recognition with the unknown-length string.

44.2. Overview of handw ritten digit string recognition

Various AHDSR systems have been developed in the past years, which can be divided into two

approaches: implicit and explicit. The implicit approach considers all traced points as potential

segmentation points [63]. In this case, the segmentation and recognition are performed

simultaneously for recognizing a digit string. Indeed, this approach does not attempt to

separate digits, but rather it incorporates the implicit segmentation into the recognition module.

For instance, Britto et al [85] proposed an AHDSR based on the Hidden Markov Models (HMM).

In their approach, the segmentation-recognition is performed into two steps. The first one takes

into account some contextual information to deliver several segmentation-recognition

hypotheses for a given preprocessed string. These hypotheses are used in a second step by

using an isolated digit classifier for verifying the accuracy of the recognition. The performance

evaluation has been conducted by considering the known-length strings and the unknown-

length strings using the length predictor.

The explicit approach is, in contrast, performed by finding the best way to isolate adjacent

digits before recognition. In this case, the segmentation and recognition are performed

separately. Also, three cases can be occurred when attempting to separate two adjacent digits:

spaced, overlapped and/or connected digits. In this case, the design of a robust handwritten

digit string recognition system can be performed into two steps: primary and secondary

segmentation. The primary segmentation is performed when the adjacent digits are naturally

spaced. In this case, digits are separated using a simple histogram of the vertical projection

[62]. This segmentation requires a correct writing of the digit image to avoid separating a digit

into multiple fragments. In contrast, the secondary segmentation is performed on the resulting

image produced from the primary segmentation in order to detect any presence of overlapped

digits and/or connected digits. Hence, various methods have been developed specifically for

overlapped and connected digits.

Segmentation methods dedicated for overlapped digits are performed when digits are slopped

or appear to imperfect alignment between the axis and the written text. Under some conditions,

it is possible to apply the methods used in cursive script text to estimate the skew value [86].

In some cases, these methods do not provide an efficient segmentation when overlapped digits

are complicated. In contrast, connected digits are the frequent situations occurred when

attempting to separate two adjacent digits. Hence, some recent segmentation methods

dedicated for connected digits have been proposed for single or multiple touching such as the

Drop-fall based algorithms [61], the thinning-based algorithms [87], the water reservoir

algorithms [65] and the removing useless stroke algorithm [42].

In this context, Oliveira et al. [59] proposed a segmentation and recognition system using some

heuristics to avoid an over or under-segmentation combined with the classifier and the verifier

based on Multi-Layer Perceptron (MLP) neural networks. All segmentation points are used to

build a segmentation graph. The best segmentation hypothesis is the shortest path of the

graph. A post-processing is used to resolve some problems where the verifiers failed and vice

versa, and the global decision module makes an accept/reject decision. The performance

evaluation of the system has been conducted on NSTRING SD19 database by considering the

unknown-length digit strings. Latter, Oliveira and Sabourin [88] proposed to replace the MLP

classifier by the Support Vector Machine (SVM) classifier in order to improve the accuracy of the

digit string recognition system. Sadri et al. [89] described a genetic framework using contextual

knowledge for segmentation and recognition of handwritten digit strings. This method has been

developed to locate feature points on the string image in order to generate possible

segmentation hypotheses. A genetic algorithm is used as a general search technique for finding

the global optimum segmentation hypotheses in digit strings. Experimental results have been

conducted on the digit strings extracted from NSTRING SD19 database using two different

classifiers (MLP and SVM) by considering the known and unknown-length digit strings.

44.3. Segmentation methods of the digit string

The choice of digit string segmentation depends of the link between adjacent digits. Hence,

four situations are occurred which are spaced, overlapped, connected and

overlapped/connected digits, respectively. Thus, this section is devoted to describe

segmentation methods used in our system for which we propose a new segmentation method

performed specifically for connected digits.

4.3.1. Segmentation of spaced digits

The segmentation of spaced digits is performed using the histogram of the vertical projection

(HVP) when digits are well written with some degree of neatness (not connected or not

overlapped). The HVP is performed on the binary digit string from which a simple count of the

black pixels in each column is running in order to detect the white space between successive

digits [62]. This allows determining the location of each component contained into the image.

The advantage of this method is its ability to segment the digit string with the unknown-length.

However, when digits are overlapped or connected, the segmentation is not possible. Indeed,

the HVP can split the digit string comprising either a single digit or more than two digits. Hence,

each isolated digit is well recognized by the classifier. In contrast, when digit components are

connected or overlapped, the classifier can't recognize the digit. In this case, the classifier plays

an important role for detecting the overlapped or/and connected digits. Figure 4.1 illustrates an

example of the correct and incorrect segmentation of the digit string when using the HVP.

Figure 4.1. Correct and incorrect segmentation using the HVP.

In some cases, segmented digits are broken in multiple parts (Figure 4.2). In this case, the

recognition is performed by the classifier without any preprocessing method.

Figure 4.2. Sample images of broken handwritten digits.

4.3.2. Segmentation of overlapped digits

The segmentation of overlapped digits is performed when adjacent digits are overlapped. It is

based on the contour analysis using the contour detection, which is extracted from the binary

image using the mathematical morphology [61, 65, 59, 88, 90]. Hence, two adjacent digits are

then separated using a distance termed , which is fixed experimentally. Figure 4.3 illustrates

the possible and impossible segmentation when analyzing the digit contour.

(a)

(b)

Figure 4.3. Segmentation by contour analysis. (a) Possible segmentation. (b) Impossible segmentation

Spaced digits

Overlapped digits Connected digits

In some cases, when using the contour detection, it appears broken digits which are necessary

to process for improving the performance of the segmentation. In this case, the broken parts of

an overlapped component are detected by examining the intersection with the median line (i.e.

half height) of each component image [91]. Figure 4.4 presents two examples of the incorrect

segmentation.

Figure 4.4 Examples of incorrect segmentation when using the contour analysis.

Generally, a digit is broken into two parts for which we associate each one a contour namely

(C1) and (C2), respectively. Figure 4.5 shows different configurations of a broken digit. Hence,

three rules are specifically used for their detections:

Rule 1: when both C1 and C2 intersect the median line, then the broken parts are considered

as two distinct overlapped single-digits. In this case, these adjacent digits are straightforward

separated using a fixed distance (Figure 4.5.a).

Rule 2: When both C1 and C2 do not intersect the median line, then the broken parts are

considered belonging to the same single-digit (Figure 4.5.b).

Rule 3: When C2 does not intersect the median line and C1 intersects the median line, then

C1 and C2 belong to the current digit (Figure 4.5.c ) otherwise C1 and C2 belong to the next

digit (Figure 4.5.d). For both cases, the belonging of C2 to the current digit or to the next digit

is decided by the classifier.

(a)

(b)

(c)

(d)

Figure 4.5. Processing of broken parts (a) two distinct overlapped single-digits (Rule1) (b) broken

single-digit (Rule2) (c and d) broken digit (Rule 3)

4.3.3. Segmentation of connected digits

A great effort has been conducted for developing various techniques to detection the connected

digits using information on the contour, skeleton and profile of the digit string. For instance,

Elnagar and Alhajj [64] proposed to split touching digits using thinning processes. The pattern

is thinned to obtain a uniform stroke width and then the feature points are extracted from the

skeleton and contour as end points, joint points and crossing points. Latter, Suwa [3] proposed

an improvement of the method proposed by Elnagar and Alhajj [64] by adopting a thinning-

based segmentation strategy for a couple of handwritten connected digit. Kyung et al. [68]

proposed using structural features of contours to generate multiple hypotheses. These

hypotheses are based on the analysis of the ligature and the characteristics of candidate break

points. Lei et al. [69] proposed an analysis of external and internal contours as well as

projection analysis. These features are used to define the segmentation lines. Ma et al. [70]

described a touching pattern-oriented strategy using Drop-fall algorithms [61] according to four

possible cutting paths (Descend-left, Descend-right, Ascend-left and Ascend-right). Wang et al.

[71] proposed to generate all possible candidate segmentations of an input string through

contour and profile analysis. Recently, Gattal and Chibani [78] proposed a method for finding

the Base Points (BPs) and Interconnection Points (IPs) on the contour and the skeleton of the

connected digits according to the connection configuration. After that, a crossing oriented

window is set around IP for finding correctly the cutting path.

In this work, the algorithm is slightly adapted such that it can segment connected digits with

unknown lengths. Also, the Radon transform is used in order to perform on the sliding window

for selecting the optimal orientation angle and consequently reducing the number of

segmentation cuts. The proposed method uses conjointly the contour and the skeleton of the

Height

Median line

digit string for detecting point connections, which can be performed according to the following

steps:

Step 1: Apply a contour detection on the connected digit string in order to detect all possible

BPs from the local extrema (minima and maxima).

Step 2: Perform the skeleton algorithm in order to detect all possible IPs.

Step 3: Find IPs by analyzing the skeleton between connected digits. If IP is detected, a

sliding window having the same height as the digit image and a fixed width (

) is set on

IP in the middle of the width (

). Then, the Radon transform is performed on the sliding

window around IP for finding the orientation angle (

)

in order to get the best cutting. This

window is called the Sliding Window Radon transform (SWRT). The optimal orientation angle

(

) and the width (

) match a right angle inter-digit into the most cases. Also, it

allows reducing the number of segmentation cuts.

Step 4: For each SWRT detected in the digit string, a list of segmentation hypotheses is

provided in order to find the best cutting path. Therefore, three hypotheses can be

considered: if a single IP is found, then the cut is making between single IP with upper BP

and lower BP. When two IPs (Upper IP and Lower IP) are found, then the cut is making

between BPs and IPs. In order to avoid under-segmentation, the cut is making between

closest points (Upper BP and Lower BP) in the middle. More details are reported in [78].

Step 5: In some cases, digits may be connected in such a way there is no IP on the skeleton

leading to failure of positioning the SWRT. In this case, to avoid under-segmentation due to

lack of IPs, the cut is made between closest upper and lower BPs in the middle of two

connected digits.

The evaluation of each hypothesis is performed using the digit recognition-verification module,

which is presented in section 4.3. Furthermore, the number of the segmented components

depends on the number of detected IPs. Hence, two important parameters influence the quality

of the segmentation, which are the adjustment of the width (

) and finding the orientation

angle (

) using SWRT.

4.3.3.1. Adjustment of the width

(

)

Adjusting the appropriate width (

) of the SWRT directly influences the performance of

the segmentation. For instance, a large W

generates small number of cutting paths leading

to an over-segmentation. In contrast, a small

generates many cutting paths leading to

an under-segmentation. Consequently, a trade-off between small and large width should be

defined in order to lose or ignore some important cutting paths. Figure 4.6 illustrates the

influence of adjusting the width of the sliding window.

In our case, the width

is set as a fraction of its height i.e.

where

is the height of the connected digits and

is a parameter defined experimentally.

When the height of connected digits is changed, the width

is also changed in order to

keep the aspect ratio and allows avoiding the problem of variations in the shapes and styles.

Figure 4.6. Influence of adjusting the width on the sliding window.

4.3.3.2. Finding the angular cut via the Radon transform

The skew or orientation angle of the connected digits is estimated using the Radon transform of

the image [47-48]. It is computed by projecting the image along specified directions, which

allows generating a set of line integrals or slices from multiple sources. More precisely, the

Radon transform takes multiple and parallel projections of the image from different angles by

rotating the source around the center of the image. Formally, the Radon transform is performed

on a binary image, which is defined as [48]:

g(r, ) =

I(x, y). (x cos + y sin

r) dxdy

(4.1)

where

is the Dirac function,

] 0, 179°] and r ]

, + ]. In other words, g(r, ) is the

integral of I(x, y) over the line defined by = x cos + y sin .

Figure 4.7 shows some illustrative examples of the Radon transform computed on the binary

connected digit images. The high brightness indicates the maximum amplitude of the Radon

transform. Therefore, the orientation angle, which defines the angular cut (

), is deduced by

finding the maximum value of the Radon transform g(r, ). Figure 4.8 shows the Radon slices

for two selected projection angles. The angular cut (

) is selected according to the amplitude

of the Radon slice. The slope of the SWRT fixed around IP allows getting a better cutting path.

Figure 4.9 shows the impact of selecting the orientation angle around IPs for segmentation with

and without using the SWRT.

Upper BP

IP Lower BP

SWRT

Upper BP

Lower BP

SWRT

Figure 4.7. Radon transform for three examples of connected digits. The maximum value of the

Radon transform corresponds to the orientation angle

for cutting two connected digits.

Figure 4.8. Two selected projections of the Radon Transform showing

= 51°

100

-60 -50 -40 -30 -20 -10 0 10 20 30 40 50 60

)

51°

140°

(a)

(b)

Figure 4.9. Impact of selecting the orientation angle for segmentation

(a) Segmentation with SWRT (b) Segmentation without SWRT

44.4. Design of the digit string recognition system

The proposed digit string recognition system allows examining each type of the segmentation

problem (spaced, overlapped or connected digits) in order to select the appropriate

segmentation method. As shown in Figure 4.10, its design requires three main modules, which

are the primary segmentation, secondary segmentation and digit recognition-verification. The

digit string image is primarily segmented using HVP, which produces multiple Segmented

Components (SC) each one is verified according to its size using the Segmented Component

Analysis (SCA). If the size is greater than a fixed threshold, SC is considered as a digit (isolated

digit). Therefore, it is submitted to Digit Recognition-Verification (DRV) for accepting or

rejecting the digit. In contrast, if SC is rejected by DRV, the secondary segmentation is

performed to separate the overlapped and/or connected digits. In this case, the contour

Binary image

Inverted binary image

Orientation angle

detection

Upper BP

Orientation of the

window to

Lower BP

directions

Upper BP

Lower BP

Cutting of connected

digits

detection is performed on SC considered as a non digit, which is analyzed and verified using

successively Contour Analysis (CA), SCA and DRV, respectively. Otherwise, SWRT technique and

Grouped Component Analysis (GCA) are used conjointly for segmenting SC. GCA uses DRV for

evaluating each Grouped Component (GC) according to the configuration link in order to

provide optimal decisions in difficult situations. In the following, we first describe the design of

DRV then the steps for segmenting the digit string image.

Figure 4.10. Full segmentation system for handwritten digit string recognition.

4.4.1. Digit recognition-verification

The Digit Recognition-Verification module is composed of two sub-modules, which are the Digit

Recognition based on feature generation module and the classification module, and the Digit

Verification. Their design directly affects the robustness of the proposed handwritten digit string

recognition. In the following, we briefly describe feature generation method, classifiers and the

technique of verifying the digit.

4.4.1.1. Feature generation

Various feature generation methods have been proposed in recent years. In the proposed

system, we use a combination of multiple features for improving the overall recognition rates by

minimizing the intra-class variability and maximizing inter-class variability. These features used

in the work [57] include some global statistics, moments, profile and projection based features

and features computed from the contour and skeleton of the digit. Some of these features are

extracted from the whole image of the digit while others are extracted from different regions (4

regions) of the image by first applying a uniform grid sampling to the image. Hence, we notice

Primary segmentation

Digit recognition-verification

Secondary segmentation

Handwritten digit

string image

HVP

Segmented

Component Analysis

Contour

Detection

Contour

Analysis

SWRT

Grouped Component

Analysis

Digit

Recognition

Digit

Verification

Digit

Non Digit

Reject

that the uniform grid sampling and used features are well suited for discriminating isolated

digits from under-segmented ones.

4.4.1.2. Design of the SVM classifiers on isolated digits

The classification module is based on the well known Binary Support Vector Machine (B-SVM)

classifiers implemented through the One-Against-All (OAA) architecture. Ten (10) B-SVMs are

then used for recognizing an isolated digit from the set = { , . . ,

}

such that

is the class

label of a digit j = 0, . . ,9. Each B-SVM classifier is trained using the radial basis function kernel

for separating a digit class from other classes.

More precisely, each digit is associated to a SVM classifier providing a response termed

f (x ), j = 0, . .9 such that x

is the feature vector of the Segmented Component (SC).

Therefore, SC is assigned to one the classes according to the following decision rule:

(x ) = max f (x ); j = 0, . . ,9

(4.2)

(x ) is the maximal value selected from 10 responses provided by SVM classifiers.

4.4.1.3. Digit verification

The maximal value selected from 10 SVM responses does not ensure that SC is a true isolated

digit. Indeed, in some cases, SC submitted to the recognition can be a connected or overlapped

digit. In order to ensure a better recognition of the digit, we propose to add a threshold termed

t as a verifier using the following decision rule: x

is accepted as a digit if f

(x )

otherwise it is rejected. Conversely, when x

is rejected (i.e. f

(x ) < ), SC is then

considered as a non digit (overlapped and/or connected digit).

The threshold t is deduced during training of the SVMs by taking the minimum of the SVM

maximum outputs from all isolated digits, let:

t = min {max f (x ); j = 0, . .9 ; i = 1, . . N} (4.3)

x is the feature vector of the isolated digit and N is the number of isolated digits used for

training the 10 SVM classifiers.

4.4.2. Spaced digit recognition-verification

The spaced digit recognition-verification is performed using successively the HVP, SCA and DRV.

The HVP allows producing multiple SCs each one is analyzed by SCA in order to decide if it is a

digit or non digit. In our case, SC is considered as an isolated digit by exploiting the ratio

between its height and width using the following decision rule:

(4.4)

Such that R

= Height/Width and t

is a fixed decision threshold. Consequently, SC is

considered as an isolated digit if R

otherwise it is considered as an overlapped or

connected digit. When SC is detected as an isolated digit, it is submitted to the DRV for its

recognition otherwise it is submitted to the secondary segmentation module. The ratio of SC is

considered as additional information that helps the classifier to avoid the under-segmentation

problem.

In some cases, connected digits are verified as isolated digits by SCA since their ratio is greater

than a threshold (i.e. R

) as shown for instance in Figure 4.11. To avoid

misrecognition, SC is accepted definitively using the decision rule:

Accept if R

and f

(x )

Reject otherwise

(4.5)

When SC is rejected, it is submitted to the secondary segmentation module.

Figure 4.11. Some samples of connected digits considered by SCA as isolated digits.

4.4.3. Overlapped digit recognition-verification

The overlapped digit recognition-verification is performed when the ratio R

of SC is less than

a threshold (t

). In this case, SC is considered as non digit and can contain more than one

overlapped digits. Therefore, a contour detection is performed on SC for their separation using

a fixed threshold (t ) and some specific rules (see 4.3.2). The resulting sub-components are

verified by SCA for deciding if each one is a digit or non digit. When it is detected as a digit,

DRV is used for accepting or rejecting it. In the case of rejecting, the segmentation method

based on SWRT and GCA are performed to separate two or more connected digits according to

the connection type.

4.4.4. Connected digit recognition-verification

The connected digit recognition-verification uses conjointly SWRT method, GCA and DRV. When

using SWRT, the connected digits are split into a sequence of segments each one is considered

as a segmentation hypothesis, which is expected to contain a digit or a fragment of a digit. In

this case, all segmentation hypotheses can be represented through a segmentation graph

[08,42]. More specifically, the recognition-verification of connected digits can be performed into

two steps:

Generate all possible segmentation hypotheses on the connected digit string using SWRT.

Each hypothesis defines an isolated digit or a fragment of a digit, which is verified by DRV.

Perform GCA in order to enable the duplication of segments.

Hence, GCA allows grouping the sequence of segments for generating Grouped Components

(GC), which are submitted to DRV for accepting or rejecting. If GC is accepted then it is

considered as a digit otherwise it is considered as a non digit. The grouping of segments using

conjointly GCA and DRV is performed according to the following heuristic rules:

GC is submitted to DRV only if GC intersects the median line.

If GC intersects the median line and not accepted by DRV then it is grouped with the previous

or next segment and tested with DRV using the following decision rule:

Accept if GC intersects median line and f

(x )

Reject otherwise

(4.6)

The reject of GC is considered as an unlikely segmented component. In this case, it is

removed from the list of hypotheses.

When GC does not intersect the median line then it is grouped with the previous or next

segment component and tested with DRV, and so on.

Figure 4.12 shows an example of using the heuristic rules for grouping the segments while

Figure 4.13 depicts an illustrative example of segmenting a digit string image.

Figure 4.12. Connected digit recognition-verification (a) Original connected digits (b) Scan IPs (c)

Fixing sliding window Radon transform around IP (d) Segmentation paths (e) Final decision.

Upper BP

(a) (b) (c)

Lower BP

(d)

(e)

Figure 4.13. Segmentation example of our proposed digit string recognition system.

44.5. Experimental results

In order to show the effect of using multiple segmentation methods on unknown-length of the digit

string, the well known NIST SD19 database [56] is used containing isolated digits and digit strings.

Experimental results are conducted for evaluating the primary and secondary segmentations as well

as the overall segmentation. Since the segmentation of the connected digits affects considerably the

performance of the digit string recognition, a careful evaluation is conducted to study the influence of

the Sliding Window Radon Transform parameters. All best obtained results are then after compared to

the state-of-art.

4.5.1. Databases and evaluation criteria

For a fairly evaluation and comparison with the existing similar algorithms, we use the same

database NIST SD19 as used in Refs. [59, 85, 88]. Hence, two digit sample sets are extracted

from NIST SD19. The first one containing 90,000 handwritten isolated digits is used for building

DRV since it has more variations in their shapes and styles. While, the second one containing

12802 digit string images is extracted from the hsf_7 series NIST NSTRING SD19 database,

which is distributed into six categories according to their lengths: 2-digit (2370), 3-digit (2385),

4-digit (2345), 5-digit (2316), 6-digit (2169), and 10-digit (1217) strings, respectively. These

data exhibit different difficulties such as sloping, overlapping and connecting as well as different

Accept Reject

Reject

Accept

1 2 3 4 5

6 7 8

Reject Accept

Digit string

Segmentation and

recognition-verification of

spaced digits.

Segmentation and

recognition-verification of

overlapped digits

Segmentation and

recognition-verification of

connected digits

Segmented and recognized

digit string

string lengths. Table 4.1 shows the distribution of the database according to the various

difficulties.

String length

#Strings

#SD

#C-OD

SD (%)

C-OD (%)

2-digit

2370

1656

714

69.87

30.13

3-digit

2385

1303

1082

54.63

45.37

4-digit

2345

1219

1126

51.98

48.02

5-digit

2316

1125

1191

48.58

51.42

6-digit

2169

1202

967

55.42

44.58

10-digit

1217

679

538

55.79

44.21

Total

12802

7184

5618

56.12

43.88

Table 4.1. Number of digit string samples (#Strings) distributed according to the Numbers of

Spaced Digits (#SD) and Connected and/or Overlapped Digits (#C-OD) expressed also in %.

The evaluation criterion used for conducting experiments and comparison is based on the

recognition rate (Rec. Rate) performed on the six digit string lengths: 2-digit, 3-digit, 4-digit, 5-

digit, 6-digit and 10-digit strings, respectively.

All recognized segmented components provided by the proposed system are compared to the

ground truth. If the segmented component is assigned to the corresponding ground truth then

the segmentation is considered as successful.

The evaluation of the proposed system is conducted by providing in each time the recognition

rate obtained for primary and secondary segmentations as well as the overall rate for each

string length.

4.5.2 Experimental setup

The performance of the proposed digit string recognition system depends on the robust design

of DRV on isolated digits, the appropriate selection of the decision threshold (t ) used for

accepting or rejecting the digit, the threshold t

used for deciding if SC is a digit or non digit,

the distance (t ) between two overlapped digits, used for adjusting the width of the sliding

window (W

) and the adjustment of the Radon transform parameters (range of the

projection angle and the angular step). t and t

are deduced from isolated digits whereas t

and

are fixed from the digit string images. Therefore, all these parameters are fixed as

follows:

t = 0.2: t is selected by taking the minimum value of the maximum SVM outputs among

recognized 90,000 isolated digits.

= 2: t

is deduced from 90,000 isolated digits by taking the maximal ratio of R

t = 5 pixels: t is fixed by examining the overlapped digits contained into NSTRING SD19.

= 5: W

is selected as the height fifth of connected digits by examining all connected

digits contained into NSTRING SD19.

4.5.3. Design of SVM classifiers on isolated digits

The performance of a digit string recognition system depends in particular on the design of a

robust SVM classifier on isolated digits. In our case, the recognition module is based on the

SVM multi-class approach using the One-Against-All implementation [30]. Two important

parameters are required for training the SVM classifiers, which are the regularization parameter

trained and their parameters are adjusted by using the NIST SD19 handwritten isolated digit

database.

The efficiency of the SVM classifiers also depends on the number of samples used for building a

robust DRV. Thus, 90,000 handwritten digits are divided into sub-sets, which are used for

training (60,000 images) and validating (30,000 images) the SVM classifier parameters,

respectively. When parameters are found, the SVM classifiers are retrained by considering all

handwritten isolated digit images (90,000 images). More precisely, training and retraining of the

SVM classifiers are performed according to the two following steps:

Training: During this step, C and

are varied in the interval [1, 100] and [1, 50],

respectively. The optimal couple is selected when the validation rate is maximal. In our case,

the optimal C and parameters are set to 18 and 11, respectively.

Retraining: After training, SVM classifiers are retrained using all samples (training and

validation, i.e 90,000 samples) by keeping the same parameters (C=18 and = 11).

Table 4.2 reports the performance of the proposed segmentation and recognition system when

performing training and retraining SVM classifiers. The recognition rates (%) are reported for

the primary and secondary segmentations as well as the overall recognition for each string

length. The projection angle of the Radon transform is ranged in [1, 179] while the value of

the angular step is set to 10°.

Rec. Rate when Training SVMs

(%)

Rec. Rate when Retraining SVMs

(%)

String length

#Strings Primary

Secondary

Overall

Primary

Secondary

Overall

2-digit

2370

99.73

79.55

93.65

99.88

80.46

94.03

3-digit

2385

99.49

85.67

93.22

99.87

86.94

94.00

4-digit

2345

98.69

86.92

93.04

99.20

88.45

94.04

5-digit

2316

97.69

88.38

92.90

98.31

89.76

93.91

6-digit

2169

98.13

87.81

93.53

98.68

90.06

94.84

10-digit

1217

97.81

86.36

92.74

98.38

88.64

94.08

Overall

98.59

85.78

93.18

99.05

87.38

94.15

Table 4.2. Recognition rate (%) obtained when training and retraining SVM classifiers.

We clearly can notice that the retraining the SVM classifiers improves the best overall

recognition rate against the training of the SVM classifiers. Indeed, the overall recognition rate

when retraining the SVM classifiers significantly increases whatever the string length. This

improvement is due to the robustness of SVM classifiers for building an efficient DRV.

Furthermore, we can observe that the best improvement is specifically offered for the

secondary segmentation compared to the primary segmentation. This means that the

recognition of spaced digits is more simple than the recognition of overlapped or/and connected

digits.

4.5.4. Evaluation of the digit segmentation

The performance of the segmentation is strongly influenced by the presence of connected digits

into the digit string. In our system, the segmentation of connected digits is performed using the

Radon transform and its appropriate parameters for finding the best cutting path. The two

parameters are respectively the range of the projection angle ( ) and the value of the angular

step. In the following, these two parameters are studied for evaluating their impact of the digit

string recognition.

4.5.4.1. Influence of the range of the projection angle

Selecting the range of the projection angle plays an important role for finding the best cutting

path. Figure 4.14 depicts an illustrative example showing the influence of selecting the range of

the projection angle when segmenting two connected digits using SWRT. As we can see, when

the range of the projection angle is reduced, the cut is well performed.

The recognition rates for the primary and secondary segmentations are reported in Figure 4.15

according to the range of the projection angle when the value of the angular step is fixed to

10°. Table 4.3 reports the average rates for all digit string lengths derived from different ranges

of the projection angle. The last column reports the overall recognition rate for all string lengths

provided by the proposed system with respect to the range of the projection angle (°) while the

value of angular step is set to 10°.

Figure 4.14. Impact of selecting the range of the projection angle for detecting the cutting path from

[1, 179] to [80, 100].

(a)

100

Rec.

(%)

Index of the range of the projection angle

Primary segmentation

Secondary segmentation

[1, 179]

[10, 170]

[20, 160]

[30, 150]

[40, 140]

[50, 130]

[60, 120]

[70, 110]

[80, 100]

(b)

(c)

(d)

100

Rec.

(%)

Index of the range of the projection angle

Primary segmentation

Secondary segmentation

100

Rec.

(%)

Index of the range of the projection angle

Primary segmentation

Secondary segmentation

100

Rec.

(%)

Index of the range of the projection angle

Primary segmentation

Secondary segmentation

(e)

(f)

Range

[1, 179]

[10, 170]

[20, 160]

[30, 150]

[40, 140]

[50, 130]

[60, 120]

[70, 110]

[80, 100]

Index

Figure 4.15. Influence of selecting the range of the projection angle (°) for different digit string

lengths: (a) 2-Digit string length (b) 3-Digit string length (c) 4-Digit string length (d) 5-Digit string

length (e) 6-Digit string length (f) 10-Digit string length.

100

Rec.

(%)

Index of the range of the projection angle

Primary segmentation

Secondary segmentation

100

Rec.

(%)

Index of the range of the projection angle

Primary segmentation

Secondary segmentation

Rec. Rate (%)

Range of the

projection angle (°)

Primary

Secondary Overall

[1,179]

99.05

86.98

93.99

[10,170]

99.05

87.74

94.29

[20,160]

99.05

86.80

93.92

[30,150]

99.05

87.38

94.15

[40,140]

99.05

87.74

94.29

[50,130]

99.05

86.98

93.99

[60,120]

99.05

88.79

94.71

[70,110]

99.05

90.90

95.56

[80,100]

99.05

94.18

96.87

Table 4.3. Average rates obtained for all digit string lengths derived from different ranges of the

projection angle (°)

As we can see, the selection of the projection angle affects the performance of the system

more specifically for the secondary segmentation whatever the length of the digit string.

Moreover, the recognition rates for each digit string length and the average rates for all digit

string lengths are improved when the range of the projection angle is reduced.

The best performance is obtained when the range of the projection angle is fixed in the interval

[80, 100]. For the primary segmentation, the selected range does not affect the performance of

the system since SWRT is performed only on the secondary segmentation. In contrast, this

selected range allows improving the recognition rate till 94.18% for the secondary

segmentation providing an overall recognition rate of 96.87%. The good performance obtained

for this range of angles is fairly predictable since the writing generated by a writer is generally

inclined around at an angle about 90°. Hence, this range of angles for SWRT can be considered

as an optimum for all digit strings.

4.5.4.2. Adjustment of the angular step

The adjustment of the angular step for SWRT is also considered as a crucial parameter for

achieving a robust system. Its value is a trade-off between the computation time and the

recognition rate. In order to study its influence, the proposed system is evaluated for different

values of the angular step ranged from 1° to 10°. The range of the projection angle is fixed in

[80,100] according to the best performance obtained previously. Figure. 4.16 depicts the

recognition rate versus the angular step for different digit string lengths. Table 4.4 reports the

average rates for each angular step computed for all digit string lengths according to the

primary and secondary segmentations, respectively. The last column reports the overall

recognition rate for all string lengths provided by the proposed system with respect to the

angular step.

(a)

(b)

(c)

100

Rec.

(%)

Angular step (°)

Primary segmentation

Secondary segmentation

100

Rec.

(%)

Angular step (°)

Primary segmentation

Secondary segmentation

100

Rec.

(%)

Angular step (°)

Primary segmentation

Secondary segmentation

(d)

(e)

(f)

Figure 4.16. Influence of adjusting the angular step (°) for different digit string lengths: (a) 2-Digit

string length (b) 3-Digit string length (c) 4-Digit string length (d) 5-Digit string length (e) 6-Digit string

length (f) 10-Digit string length.

100

Rec.

(%)

Angular step (°)

Primary segmentation

Secondary segmentation

100

Rec.

(%)

Angular step (°)

Primary segmentation

Secondary segmentation

100

Rec.

(%)

Angular step (°)

Primary segmentation

Secondary segmentation

Rec.Rate (%)

Angular step (°)

Primary

Secondary

Overall

99.05

93.12

96.45

99.05

94.29

96.91

99.05

94.29

96.91

99.05

94.24

96.89

99.05

94.18

96.87

99.05

93.83

96.73

99.05

94.24

96.89

99.05

94.24

96.89

99.05

94.24

96.89

99.05

94.18

96.87

Table 4.4. Recognition rates according to the angular step for each string length.

As we can see, adjusting the angular step from 1 to 10 (°) does not affect considerably the

performance of the proposed system whatever the length of the digit string (Figure 4.16).

However, the best improvement is obtained when the angular step is fixed to 2 or 3 providing

an average recognition rate 94.29% for the secondary segmentation and an overall recognition

rate of 96.91% (Table 4.4).

In order to show the impact of selecting the appropriate angular step, Figure 4.17 illustrates the

correct and incorrect segmentation when the angular step is fixed to 3 and 6°, respectively.

When the angular step is correctly adjusted (Figure 4.17a), the Upper BP and Lower BP are well

detected into the SWRT and therefore the separation of the two adjacent digits is well

performed. In contrast, when the angular step is incorrectly adjusted (Figure 4.17b), the Upper

BP and Lower BP are not well detected, which leads to an incorrect segmentation since the

SWRT is fixed around IP on the wrong orientation angle.

(a) (b)

Figure 4.17. Impact of selecting the angular step for detecting the cutting path: (a) Correct

segmentation when the angular step is fixed to 3°. (b) Incorrect segmentation when the angular step

is fixed to 6°.

Upper BP

Lower BP

SWRT

4.5.5. Comparative analysis

In order to compare our system with others systems, we also used the same string lengths

images database (NIST NSTRING SD19) for our experiments. This database is used in Refs.

[85, 59,88-89], so we recall these systems which will compare with our system in terms of

correct segmentation and recognition. Table 4.5 shows that the results of our method compare

to other methods in the literature.

String length

#Strings Oliveira et al.

method [59]

Britto et al.

[85]

Oliveira and

Sabourin [88]

Sadri et al.

[89]

Our

approach

2-Digit

2370

96.88

94.80

97.67

95.05

99.01

3-Digit

2385

95.38

91.60

96.26

91.43

97.30

4-Digit

2345

93.38

91.30

94.28

91.07

96.56

5-Digit

2316

92.40

88.30

94.00

88.05

95.95

6-Digit

2169

93.12

89.10

93.80

88.69

96.65

10-Digit

1217

90.24

86.90

91.38

86.13

96.01

Overall

93.57

90.33

94.57

90.07

96.91

Table 4.5. Comparative analysis of various segmentation systems for the unknown-length string

performed on NIST NSTRING SD19

In this table, we compared the results of our proposed system by considering unknown-length

digit strings with the best results of similar systems. Moreover, all parameters are fixed. The

proposed system achieved overall average rates of 96.91% for all string length. It attempted to

find the best segmentation on digit strings database. The correct recognition rate for string

lengths of 2,3,4,5,6 and 10 was 99.01%, 97.30%, 96.56%, 95.95%, 96.65%, and 96.01%

respectively. Relative results clearly show how our proposed system outperforms the state-of-

the-art results. The benefit of this technique is that it can properly segments the spaced,

sloped, overlapped and connected digits in digit string as well as unknown-length digit string.

However, apply successively CA, SCA and DRV for analyzing and verifying the SC allow

segmenting the digit string without any information about the length of the string.

The DRV on isolated digits is designed in order to recognize single digits one at a time, so they

do not use contextual knowledge in their decisions such as the comparison of connected or

overlapped components, or comparing the relative sizes (length digit strings) or the relative

positions of the various components.

Some examples of correctly and incorrectly recognized strings of our system on handwritten

digit strings from NSTRING SD19 indicate three possible results as output: correct segmentation

and correct recognition, wrong segmentation and/or wrong recognition. Table.4.6 presents

some examples of correctly and incorrectly recognized strings.

Digit string

Class labels

Segmented

digits

Assigned

classes

Correct

segmentation and

recognition

03283

75434

45537

5601

Incorrect

segmentation

9898

90898

84097

84297

Incorrect recognition

13211

13210

72937

12931

Table 4.6. Examples of the correct and incorrect segmentation-recognition produced by the proposed

system from the NSTRING SD19 Database.

4.5.6. Computation cost of the proposed system

The proposed digit string recognition system allows resolving different segmentation problems

in order to select the most appropriate one. Since this system can be deployed in real

environment, it is interesting to compute the cost required for treating all situations. Hence, the

computation cost is measured by considering three situations:

Spaced digit: the total Number of Segmentation Hypotheses using HVP (N

) is the same as

the number of the spaced digits.

Overlapped digit: Let N

be the Number of Grouped Broken Component provided by CA when

the broken component exist and N

the total Number of Isolated Digit provided by CA, then,

the total Number of Segmentation Hypotheses from CA (N ) is N

= N

+ N

Connected digit: Let N

be the Number of GC and N

the total Number of SC, then the total

Number of Segmentation Hypotheses from SWRT (N

) is calculated as N

= N

+ N .

Finally, the Total Number of Segmentation Hypotheses (N ) is the sum of all numbers of

segmentation hypotheses taking the following form:

= N (N

+ N

) (4.7)

Where N

is the number of digits contained into a string. By substituting N

, N

and N

Eq. 4.7 becomes:

= N . (N

+ N

+ N ) (4.8)

Eq. 4.8 clearly indicates that the computation cost depends of the digit string length and the

complexity of the connected digits for finding the best cutting path.

44.6. Summary

The goal of this paper was to present a full explicit segmentation system for handwritten digit

strings based on a combination of several segmentation methods depending on the

configuration link between digits. Hence, three segmentation methods have been combined,

which are HVP, CA and SWRT used conjointly with SCA and DRV for verification and

recognition.

The benefit of the proposed system is its ability to properly segment the spaced, sloped,

overlapped and connected digits without any information about the length of the string when

performing successively CA, SCA and DRV for analyzing and verifying the SC.

The obtained performance of the proposed segmentation-recognition system of digit strings

depends on three different factors: the accuracy of SVM-OAA and robustness of the DRV, the

used segmentation methods and, finally, the combination strategy with some heuristic rules.

In some cases, the unknown-length of the digit strings makes the task for classifiers more

difficult to recognize many outliers leading to treat, for example, digit strings as valid digits

[89]. However, by combining HVP, CA and SWRT, the proposed system allows reducing the

confusion of the segmentation-recognition.

ONCLUSION

The main focus of this thesis was the development of a Automatic Handwritten Digit String

Recognition (AHDSR), where the main efforts were focused towards the problems of

segmenting adjacent digits inherent to the variability of the digits, broken digits, overlapping

digits, connected digits, unknown length of the string, under-segmentation and over-

segmentation. The improvement of the performance of such a system is not a trivial problem

at all.

We have proposed a SVM-based segmentation-verification system for segmenting two

connected handwritten digits using the oriented sliding window and SVM classifiers in order to

find the best cut for isolating two adjacent digits. Hence, the design of the proposed

segmentation-verification system can be conducted into four main steps: segmentation of

connected digits by searching the interconnection points using the oriented sliding window,

generation of features on isolated digits, recognition of the segmented digits using the SVM-

OAA implementation and verification for accepting or rejecting the digit recognition. More

precisely, the segmentation of connected digits uses two complementary structural features

(contour and skeletal points) for detecting the interconnection and base points and finding the

optimal path. Experimental results showed that the performance of the system depends on the

selection of the appropriate orientation angle. Indeed, obtained results clearly show that the

recognition rate increases when selecting the appropriate angle. Therefore, choosing an optimal

angle is a crucial step for cutting optimally two connected digits. When comparing the proposed

approach to the state of the art, it can note that it constitutes a tradeoff between the correct

segmentation and the number of the segmentation cuts. However, the proposed system does

not provide satisfactory results in some cases caused by two main problems:

The first problem is caused by the robustness of the classifier. Indeed, although, the

segmentation is well performed, the classifier does not allow recognizing the segmented

digit due to its particular shape. To resolve this problem, it is possible to add this particular

digit into the training set for improving the performance of SVM classifier.

The second problem is caused by "no obvious segmentation point". In this case, the

assumption 3 (No IP) is often applied for segmenting connected digits. This means that the

segmentation path in the middle does not always provide acceptable results.

In order to extend the previous system to recognize the digit string for the unknown-length, a

new design is proposed on the combination of several explicit segmentation methods according

on the configuration link between digits. Three segmentation methods are combined based on

the histogram vertical projection, contour analysis and the Sliding Window Radon Transform

(SWRT) for recognizing unknown-length handwritten digit strings with the help of verification

strategy. The main idea of the proposed segmentation algorithm is to introduce a super cutting

path in order to avoid the over-segmentation and the under-segmentation. The verification

strategy uses the recognition module for evaluating each segmented component according to

the configuration link in order to provide optimal decisions in difficult situations.

Generally, the first results obtained are encouraging since the proposed system allows

managing all the possible ways of segmentation. This combination uses few rules and has the

advantage of providing a correct segmentation without using the contextual knowledge.

Future Works

Although this works proposes various techniques for improving the performance of the

segmentation, some segmentation problems remain difficult to resolve. Hence, some future

works can be outlined such as:

Include the investigation of other features as well as a combination of different classifiers to

further improve the recognition rates. A feature selection mechanism to identify the most

appropriate subset of features for this problem would also be interesting to explore.

Reduce the number of segmentation hypotheses in order to reduce the complexity of the

system and therefore becomes more faster.

Explore the way to use a filter for eliminating the unnecessary segmentation hypothesis in

order to reduce the computation time.

EFERENCES

[1]. G. Dimauro, S. Impedovo, G. Pirlo and A. Salzo, Automatic Bankcheck processing. A New Engineered

System, International Journal of Pattern Recognition and Artificial Intelligence 11(4) (1997) 467-504.

[2]. M. Cheriet, Y. Al-Ohali, N. E. Ayat and C. Y. Suen , Arabic Cheque Processing System: Issues and

Future Trend. Advances in Pattern Recognition, ed. B.B. Chaudhuri (Springer Verlag, 2007), pp. 213

232.

[3]. M. Suwa, Segmentation of connected handwritten numerals by graph representation, in Proc. Eighth

International Conference on Document Analysis and Recognition (ICDAR'05), Seoul, 2005, vol. 2, pp.

750754.

[4]. J. R. Ward and T. Kuklinski, A model for variability effects in handprinted with implication for the

design of handwritten character recognition system, IEEE Trans. Man Cybernetics 18(1988), pp.438

451.

[5]. M. Shridhar and A. Badreldin, Recognition of Isolated and Simply Connected Handwritten Numerals,

Pattern Recognition 19 (1) (1986) 112.

[6]. M. Shridhar and A.Badreldin, Context-directed segmentation algorithm for handwritten numeral

strings, Image and Vision Computing 5 (1) (1987) 39.

[7]. B. K. Jang and R. T. Chin, One-pass parallel thinning: Analysis, properties, and quantitative

evaluation, IEEE Transactions on Pattern Analysis and Machine Intelligence 14 (11) (1992) 1129

1140.

[8]. E. Vellasques, L. S. Oliveira, Jr. A. S. Britto, A. L. Koerich and R. Sabourin, Filtering segmentation

cuts for digit string recognition, Pattern Recognition 41(10) (2008) 30443053.

[9]. F. C. Ribas, L. S. Oliveira, Jr. A. S. Britto and R. Sabourin, Handwritten digit segmentation: a

comparative study, International Journal on Document Analysis and Recognition 16 (2) (2012) 127

137.

[10]. E. Lecolinet and O. Barret. Cursive word recognition: methods and survey. In S. Empedovo Editor,

Fundamentals in Handwriting Recognition, volume 24 of NATO_Advanced Institute, Series F

Springer-Verlag, pp.12-166.

[11]. P.M. Lallican, C. Viard-Gaudin, and S. Knerr, From Off-Line to On-Line Handwriting Recognition, Proc.

Int'l Workshop Frontiers in Handwriting Recognition, Amsterdam, 2000, pp. 303-312.

[12]. J. Kanai and A. D. Bagdanov, Projection profile based skew estimation algorithm for JPIG compressed

images, Int. J. Document Anal. Recognit., 1 (1) (1998) 4351.

[13]. B. Yu and A. K. Jain, A robust and fast skew detection algorithm for generic documents, Pattern

Recognition 29 (10) (1996) 1599 1630.

[14]. A. Hashizume, P. S. Yeh, and A. Rosenfeld, A method of detecting the orientation of aligned

components, Pattern Recognit. Lett. 4 (1986) 125132.

[15]. R. M. Bozinovic and S. N. Srihari, Off-line cursive script word recognition, IEEE Trans. Pattern Anal.

Machine Intell. 11 (1989) 6883.

[16]. D. Guillevic and C. Y. Suen, Cursive script recognition: A sentence level recognition scheme, in Proc.

Int. Workshop Frontiers in Handwriting Recognit., 1994, pp. 216223.

[17]. Kornai, K. M. Mohiuddin, and S. D. Connell, "Recognition of cursive writing on personal checks," in

Proc. of the 5th International Workshop on Frontiers in Handwriting Recognition, Essex, U.K., 1996,

pp.373378.

[18]. M. Cheriet, N. Kharma, C.L. Liu and C.Y. Suen, Character Recognition Systems, a Textbook for

Students and Practitioners, Edition John Wiley & Sons Inc. p.326. 2004.

[19]. N. W. Strathy. A method for segmentation of touching handwritten numerals.Master's thesis,

Concordia University, Montreal - Canada, Setember 1993.

[20]. G.Nemeth and K. Palagyi, ,Topology Preserving Parallel Thinning Algorithm, International Journal of

Imaging System and Technology, 21 (1) (2011) 37-44.

[21]. P.A. Devijver and J. Kittler,Pattern recognition, a statistical approach, Prentice Hall, London, pp. 480,

1982.

[22]. K. K.Kim, J. H. Kim, and C. Y. Suen, Segmentation-based recognition of handwritten touching pairs

of digits using structural features, Pattern Recognition 23 (1) (2002) 13-24.

[23]. D.Y.H. Yan, Separation of touching handwritten multi-numeral strings based on morphological

structural features, Pattern Recognition 34 (3) (2001) 587-599.

[24]. J. Mantas, Methodologies in pattern recognition and image analysis - a brief survey, Pattern

Recognition 20 (1) (1987) 16.

[25]. V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, London, UK, 1995.

[26]. R. Payam, T. Lei and L. Huan, Cross Validation. In Encyclopedia of Database Systems, ed. Springer,

(2009), pp. 532538.

[27]. J. Weston and C. Watkins, Multi-class support vector machines for multi-class pattern recognition, in

Proc. European Symposium on Artificial Neural Networks. Bruges, Belgium, 1999, pp. 219224.

[28]. C. Hsu and C. Lin, A comparison of methods for multi-class support vector machines, IEEE Trans. on

Neural Networks 13 (2) (2002) 415425.

[29]. Y. Guermeur, A. Elisee and H. PaugamMoisy, A New multiclass SVM based on a uniform convergence

result, in Proc. of International Joint Conference on Neural Networks. Como, Italy, 2000, vol.4, pp.

183-188.

[30]. V. N. Vapnik, An Overview of Statistical Learning Theory, IEEE Trans. on Neural Networks. 10 (5)

(1999) 988999.

[31]. J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin DAGs for multiclass classification. In S.

A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems,

The MIT Press, Cambridge, MA, 2000, pp. 547553.

[32]. Y. LeCun , L. Jackel , L. Bottou , A. Brunot , C. Cortes , J. Denker , H. Drucker , I. Guyon , U. Müller ,

E. Säckinger , P. Simard and V. Vapnik, Comparison of learning algorithms for handwritten digit

recognition, in: F. Fogelman-Soulié, P.Gallinari (Eds.), Proceedings of the International Conference on

Artificial Neural Networks, Nanterre, France, pp. 5360, 1995.

[33]. L. Heutte, T. Paquet, J.V. Moreau, Y. Lecourtier and C. Olivier, A structural/statistical feature based

vector for handwritten character recognition, Pattern Recognition Lett., vol. 19, n°7, pp. 629641,

1998.

[34]. J. Cai, and Z.Q. Liu, Integration of structural and statistical information for unconstrained

handwritten numeral recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence, 21 (3)

(1999) 263-270,.

[35]. J.X. Dong, A. Krzyzak and C.Y. Suen, A multi-net learning framework for pattern recognition,

Proceedings of the Sixth International Conference on Document Analysis and Recognition, Seattle,

pp. 328332, 2001.

[36]. L. N. Teow and K. F. Loe, Robust vision-based features and classification schemes for off-line

handwritten digit recognition, Pattern Recognition 35 (11) (2002) 2355 2364.

[37]. S. Belongie, J. Malik and J. Puzicha, Shape matching and object recognition using shape contexts,

IEEE Trans. Pattern Anal. Mach. Intell. 24 (4) (2002) 509522.

[38]. F. Lauer, C. Y. Suen and G. Bloch, A trainable feature extractor for handwritten digit recognition,

Pattern Recognition 40 (6) (2007) 18161824.

[39]. C. L. Liu, K. Nakashima, H. Sako, and H. Fujisawa, Handwritten Digit Recognition: Benchmarking of

State-of-the-Art Techniques, Pattern Recognition 36 (10) (2003) 2271-2285.

[40]. M. Diem, S. Fiel, A. Garz, M. Keglevic, F. Kleber and R. Sablatnig, ICDAR 2013 Competition on

Handwritten Digit Recognition (HDRC 2013), In Proc. of the 12th Int. Conference on Document

Analysis and Recognition (ICDAR) 2013, pp. 1454-1459, 2013.

[41]. A. Britto, R. Sabourin, F. Bortolozzi, C.Y.Suen. « Complementary features combined in an HMM-based

system to recognize handwritten digits ». In 12th International Conference on Image Analysis and

Processing(ICIAP), Mantova, Italy, pp. 670-675, 2003.

[42]. Y. Chen and J.F. Wang, Segmentation of Single- or Multiple-Touching Handwritten Numeral String

Using Background and Foreground Analysis, IEEE Transactions on Pattern Analysis Machine

Intelligence 22 (11) (2000) 1304-1317.

[43]. G. Aubertet, and P. Kornprobst, Traitement des images numériques, In J. Akoka and I. Comyn-

Wattiau, editors, Encyclopédie de l'informatique et des systèmes d'information, Vuibert, vol. 18, n°6,

pp. 861879,2006.

[44]. H. Nishida, Curve description based on directional features and quasi-convexity/concavity. Pattern

Recognition 28 (7) (1995) 1045-1051.

[45]. M.K. Hu, Visual problem recognition by moment invariant, IRE Trans. Inform. Theory, IT-8 (1962)

179-187.

[46]. M.K. Hu, Pattern recognition by moment invariants, proc. IRE, vol. 49, p.1428, 1961.

[47]. A.V.N. Manjunath, K. G. Hemantha and S. Noushath, Robust Unconstrained Handwritten Digit

Recognition using Radon Transform, in International Conference on Signal Processing,

Communications and Networking (ICSCN), 22-24 Feb, 2007. pp. 626-629.

[48]. O. R. Terrades and E. Valveny, Radon Transform for Linear Symbol Representation , Proc. of the

Seventh International Conference on Document Analysis and Recognition (ICDAR 2003) ,vol.1,

Edinburgh, Scotland, 3-6 August, pp. 195-199, 2003.

[49]. E.C. Kintner, On the mathematical properties of the Zernike polynomials, Opt. Acta. 23 (8) (1976)

679680.

[50]. L.S. Oliveira, Automatic recognition of handwritten numerical strings, Thesis of PhD, Ecole de

technologie superieure, University of Quebec. Canada, pp. 56-58,184, 2003.

[51]. E.J. Candès, and D.L. Donoho, Ridgelets: A Key to Higher-dimensional Intermittency, Philosophical

Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 357 (1760)

(1999) 24952509.

[52]. G. Y. Chen, , T. D. Bui, and A. Krzyzak, "Rotation invariant feature extraction using Ridgelet and

Fourier transforms," Journal of Pattern Analysis and Application, vol. 9, pp.83-93, 2006.

[53]. S. Arivazhagan, S.S. Priyadharshini and J.R. Sekar, Iris recognition using Ridgelet transform,

International Conference on Recent Advancements in Electrical, Electronics and Control Engineering

(ICONRAEeCE), pp. 286 290 ,2011.

[54]. J. Favata and G. Srikantan, A Multiple Feature/Resolution Approach To Handprinted Digit and

Character Recognition, International journal of imaging systems and technology 7 (4) (1996) 304

311.

[55]. A. Gattal and Y. Chibani, Combination of features generation methods for recognition of normalized

and not normalized isolated handwritten digits, Business Intelligence and Mobile Technology

Research: An Information Systems Engineering Perspective, ed. (Laouar and Eom), Cambridge

Scholars Publishing, (2013), pp. 118132.

[56]. P. J. Grother, NIST Special Database 19 -Handprinted Forms and Characters Database, National

Institute of Standards and Technology (NIST), 1995.

[57]. A. Gattal, Y. Chibani, C. Djeddi and I. Siddiqi, Improving Isolated Digit Recognition using a

Combination of Multiple Features, in Proc. of 14th International Conference on Frontiers in

Handwriting Recognition (ICFHR-2014), Crete Island, Greece ,2014, pp. 446451.

[58]. J. Kittler and J. Illingworth, Minimum Error Thresholding, Pattern Recognition 19 (1) (1986) 41-47.

[59]. L. S. Oliveira, R. Sabourin, F. Bortolozzi and C. Y. Suen, Automatic Recognition of Handwritten

Numerical Strings: A Recognition and Verification Strategy, IEEE Transactions on Pattern Analysis

and Machine Intelligence PAMI 24 (11) (2002) 14381454.

[60]. X. Wang, V. Govindaraju and S. Srihari, Holistic recognition of touching digits, in Proc. Sixth

International Workshop on Frontiers in Handwriting Recognition. Daejeon, 1998, pp. 295303.

[61]. G. Congedo, G. Dimauro, S. Impedovo and G. Pirlo, Segmentation of Numeric Strings, in Proc. of

Third Int. Conf. on Document Analysis and Recognition. Montreal, Canada, 1995, vol. 2, pp. 1038

1041.

[62]. H. Fujisawa, Y. Nakano and K. Kurino, Segmentation Methods for Character Recognition: From

Segmentation to Document Structure Analysis, Proceedings of the IEEE 80 (7) (1992) 10791091.

[63]. Z. Shi and V. Govindaraju, Segmentation and recognition of connected handwritten numeral strings,

Pattern Recognition 30 (9) (1997) 15011504.

[64]. A. Elnagar and R. Alhajj, Segmentation of connected handwritten numeral strings, Pattern

Recognition 36 (3) (2003) 625634.

[65]. U. Pal, A. Belaid and C. Choisy, Touching numeral segmentation using water reservoir concept,

Pattern Recognition Letters 24 (2003) 261272.

[66]. L. S. Oliveira, E. Lethelier, F. Bortolozzi, R. Sabourin, A new approach to segment handwritten digits,

in Proc. 7th International Workshop on Frontiers in Handwriting Recognition. Amsterdam,

Netherlands, 2000, pp. 577582.

[67]. R. Fenrich and S. Krishnamoorthy, Segmenting diverse quality handwritten digit strings in near real-

time, in Proc. of 5th USPS Advanced Technology Conference. Washington, D.C., 1990, pp. 523537.

[68]. K.K. Kyung, K. J. Ho and Y. S. Ching, Segmentation-based recognition of handwritten touching pairs

of digits using structural features, Pattern Recogn. Lett. 23 (13) (2002) 1324.

[69]. Y. Lei, C. S. Liu, X. Q. Ding and Q. Fu, A recognition based system for segmentation of touching

handwritten numeral strings. Proc. of 9th International Workshop on Frontiers in Handwriting

Recognition (IWFHR-09), Tokyo, Japan, 2004, pp.294-299.

[70]. R. Ma, Y. Zhao, Y. Xia and Y. Yan, A Touching Pattern-oriented Strategy for Handwritten Digits

Segmentation, Proc. of the Intl. Conf. on Computational Intelligence and Security (CIS), Suzhou,

China, vol.1, 2008, pp. 174179.

[71]. Y.J. Wang, X.B. Liu and Y.D. Jia, Statistical Modeling and Learning for Recognition-Based

Handwritten Numeral String Segmentation, Proc. of 10th International Conference on Document

Analysis and Recognition (ICDAR-10), Barcelona, Spain, 2009, pp. 421425.

[72]. B. L. Everton and A. B. M. Carlos, Segmentation of connected handwritten digits using Self-

Organizing Maps, Expert systems with applications 40 (2013) 5867-5877.

[73]. A. Gattal and Y. Chibani, Segmentation Strategy of Handwritten Connected Digits (SSHCD), in Proc.

of 16th International Conference on Image Analysis and Processing (ICIAP'11) Ravenna, Italy, 2011,

pp. 248254.

[74]. A. Gattal and Y. Chibani, Segmentation and recognition strategy of handwritten connected digits

based on the oriented sliding windows, in Proc. of International Conference on Frontiers in

Handwriting Recognition. Bari, Italy, 2012, pp. 297-301.

[75]. K. M. Hussein, A. Agarwal, A. Gupta and P. S. P. Wang, A Knowledge-based algorithm for enhanced

recognition of handwritten courtesy amounts, Pattern Recognition 32 (1999) 305316.

[76]. A. S. Britto, R. Sabourin, F. Bortolozzi and C. Y. Suen, Foreground and Background Information in an

HMM-based Method for Recognition of Isolated Characters and Numeral Strings, in Proc.

International Workshop on Frontiers in Handwriting Recognition. Tokyo, Japan, 2004, pp. 371376.

[77]. A. Ko, R. Sabourin and Jr. A. Britto, Ensemble of HMM classifiers based on the Clustering Validity

Index for a Handwritten Numeral Recognizer, Pattern Analysis and Applications 12 (01) (2009) 21

35.

[78]. A. Gattal and Y. Chibani, SVM-Based Segmentation-Verification of Handwritten Connected Digits

Using the Oriented Sliding Window, International Journal of Computational Intelligence and

Applications (IJCIA), 14 (1) (2015) 117

[79]. L. S. Oliveira, N. Benahmed, R. Sabourin, F. Bortolozzi and C. Y. Suen, Feature Subset Selection

Using Genetic Algorithm for Handwritten Digit Recognition, in Proc. XIV Brazilian Symposium on

Computer Graphics and Image Processing. Florianopolis, Brazil, 2001, pp. 362-369.

[80]. B. Al-Badr and S. A. Mahmoud, Survey and bibliography of Arabic optical text recognition, Signal

Processing. 41 (1) (1995) 4977.

[81]. A. S. Britto, R. Sabourin, F. Bortolozzi and C. Y. Suen, The recognition of handwritten numeral strings

using a two-stage HMM-based method, International Journal on Document Analysis and Recognition

5 (2003) 102117.

[82]. R. M. Suresh, S. Arumugam, Fuzzy technique based recognition of handwritten characters, Image

and Vision Computing 25 (2) (2007) 230239.

[83]. V.K. Madasu, M.H.M. Yusof, M. Hanmandlu, K. Kubik, Automatic extraction of signatures from bank

cheques and other documents, Proc. of the Seventh International Conference on Digital Image

Computing: Techniques and Applications (DICTA-07), Sydney, Australia, 2003, pp.591600.

[84]. L. Lam, C. Y. Suen, Structural classification and relaxation matching of totally unconstrained

handwritten zip-code numbers, Pattern Recognition, 21 (1) (1988) 19-31.

[85]. A.S. Britto, R. Sabourine, F. Bortolozzi, C.Y. Suen, Recognition of handwritten numeral strings using a

two-stage HMM-based method, Int. J. Doc. Anal. Recognition 5 (23) (2003) 102117.

[86]. V. Alessandro, J. Luettin, A new normalization technique for cursive handwritten words. Pattern

Recognition Letters. 22 (9) (2001) 1043-1050.

[87]. Z. Lu, Z. Chi, W.C. Siu, P. Shi, A background-thinning-based approach for separating and recognizing

connected handwritten digit strings, Pattern Recognition, 32 (6) (1999) 921933.

[88]. L.S. Oliveira, R. Sabourin, Support vector machines for handwritten numerical string recognition,

Proc. of the International Workshop on Frontiers in Handwriting Recognition (IWFHR-09), Tokyo,

Japan, 2004, pp. 3944.

[89]. J. Sadri, C.Y. Suen, T.D. Bui, A genetic framework using contextual knowledge for segmentation and

recognition of handwritten numeral strings. Pattern Recognition, 40 (3) (2007) 898919.

[90]. T. Kanungo, R. M. Haralick, Character recognition using mathematical morphology. Proc. of USPS

Fourth Advanced Technology Conference, Washington, D.C., 1990, Vol. 2, pp. 973-986.

[91]. T.M. Ha, M. Zimmermann, H. Bunke, Off-line handwritten numeral string recognition by combining

segmentation-based and segmentation-free methods. Pattern Recognition, 31 (3) (1998) 257272.

BOUT THE AUTHOR

IOGRAPHY

Abdeljalil Gattal was born in Algeria. He received his BS degree in

Computer Science from University of Skikda (Algeria) in 2004, and

MS degree in Computer Science "Information and knowledge

systems" from Abbes Laghrour University of Khenchela (Algeria) in

2009. Currently, He is a PhD student at the Ecole nationale

Supérieure d'Informatique (ESI-Algeria) in Computer Science. He is

working as an Assistant Professor at the Department of Computer

and Mathematics Science in University of Tebessa (Algeria) since

2009. He participated in the organization of several national and

international conferences, which were held at the University of

Tebessa. He collaborated, as member in several research projects

CNEPRU and PNR. He supervised many Master and License

students. He has published a number of papers. His research interests include Image Analysis,

Pattern Recognition and Recognition of handwriting. Currently, he is member of

Laboratoire de Mathématiques, d'Informatique et des Systèmes (LAMIS) of the University of

Tebessa.

SCIENTIFIC CONTRIBUTIONS

1. Journal Papers

Abdeljalil Gattal and Youcef Chibani," SVM-Based Segmentation-Verification of Handwritten

Connected Digits Using the Oriented Sliding Window", International Journal of

Computational Intelligence and Applications (IJCIA), Vol. 14, N° 01, March 2015.

Information about the journal IJCIA

ISSN: 1469-0268

Frequency of publication: 4 numbers per year.

Process reviewing: peer reviewed.

Published by: Brijesh Verma, in all World Scientific and Imperial College Press journals.

Web Site: http://www.worldscientific.com/worldscinet/ijcia

Indexed in: Compendex, CompuScience, DBLP Computer Science Bibliography, Ebsco

Discovery Service, ExLibris Primo Central , Google Scholar ,OCLC WorldCat® , Proquest

Summon , Scopus.

100

22. Book Chapters:

Abdeljalil Gattal, Youcef Chibani, "Combination of features generation methods for recognition of

normalized and not normalized isolated handwritten digits", Business Intelligence and Mobile

Technology Research: An Information Systems Engineering Perspective, ed (Laouar and Eom).

published by Cambridge Scholars Publishing, pp. 118-132.2013

3. Papers published in Lecture Notes

A.GATTAL, Y.CHIBANI, "Segmentation Strategy of Handwritten Connected Digits (SSHCD)", 16th

International Conference on Image Analysis and Processing, ICIAP 11, 14-16 September 2011,

Ravenna, Italy. Proceedings, Part II. LNCS 6979 Springer 2011, ISBN 978-3-642-24087-4.

4. Conference Papers

1. Abdeldjalil GATTAL, Youcef CHIBANI, Hassiba NEMOUR, Mohamed CHERIET: « Combinaison de

méthodes de segmentation automatique des chiffres manuscrits pour la reconnaissance du

montant numérique des cheques bancaires algériens », Journées sur la Gestion Electronique

de Documents et Réseaux Informatiques 2009, Annaba, 20-21 Mai 2009.

2. A. Gattal, Y. Chibani, H. Nemmour et M. Cheriet : « combinaison de méthodes de segmentation

automatique des chiffres manuscrits pour la reconnaissance du montant numérique des chèques

bancaires algériens », The 6th International School on Signal The 6th International Summer

School on Signal Processing & its Applications, ISSSPA 09, 4-8 October 2009, Oran, Algeria.

3. A.Gattal, Y. Chibani, H. Nemmour, « Combinaison de méthodes de segmentation explicite pour la

reconnaissance automatique des chiffres manuscrits », 1ére édition de la conférence Maghrébine

d'Extraction et de Gestion de Connaissances, EGC-M'2010, 13-14 décembre 2010, Alger.

4. A.GATTAL, Y.CHIBANI, « Combining Multiple Segmentation Methods for Handwritten Digit

Recognition of Algerian Bank Checks ». 2nd International Conference on Information Systems

and Technologies (ICIST'2012) March 24-26 2012, Sousse, Tunisia. ISBN 978-9938-9511-2-7

5. A.GATTAL, Y.CHIBANI, « Segmentation and recognition strategy of handwritten connected digits

based on the oriented sliding window ». 13th International Conference on Frontiers in Handwriting

Recognition (ICFHR-2012) September, 2012, Bari, Italy, IEEE Computer Society. ISBN 978-0-

7695-4774-9

6. A.GATTAL, Y.CHIBANI, « Combination of features generation methods to improve recognition of

isolated handwritten digits ». 3rd International Conference on Information Systems and

Technologies - ICIST'2013, March , 2013, Tangier, Morocco, awarded the best paper.

7. Abdeljalil Gattal, " Segmentation Algerian postal check: Automatic extraction of numeral

amounts", The 2nd International Conference on Software Engineering and New Technologies

"ICSENT - 2013", Hammamet, Tunisia, December 21-23, 2013.

8. Chawki Djeddi, Labiba-Souici Meslati, Imran Siddiqi, Abdelllatif Ennaji, Haikal El Abed, Abdeljalil

Gattal," Evaluation of Texture Features for Offline Arabic Writer Identification", 11th IAPR

International Workshop on Document Analysis Systems (DAS-2014) ,2014. pp. 106-110

9. Abdeljalil Gattal, Youcef Chibani, Chawki Djeddi and Imran Siddiqi, "Improving Isolated Digit

Recognition using a Combination of Multiple Features", 14th International Conference on Frontiers

in Handwriting Recognition (ICFHR-2014) September, 2014, Crete Island, Greece, IEEE

Computer Society. ISBN 978-1-4799-4335-7, pp. 446- 451

10. Chawki Djeddi, Abdeljalil Gattal, Labiba Souici-Meslati, Imran Siddiqi, Youcef Chibani and Haikal El

Abed, "LAMIS-MSHD: A Multi-Script offline Handwriting Database", 14th International Conference

on Frontiers in Handwriting Recognition (ICFHR-2014) September, 2014, Crete Island, Greece,

IEEE Computer Society. ISBN 978-1-4799-4335-7,pp. 93-97

101

11. Chawki Djeddi, Somaya Al-Máadeed, Abdeljalil Gattal, Imran Siddiqi, Labiba Souici-Meslati, Haikal

El Abed, "ICDAR2015 competition on Multi-script Writer Identification and Gender Classification

using 'QUWI' Database", 13th International Conference on Document Analysis and Recognition,

(ICDAR 2015) August, 2015, Tunis, Tunisia, pp. 1191-1195

12. Abdeljalil Gattal, Youcef Chibani, Bilal Hadjadji, Hassiba Nemmour, Imran Siddiqi, Chawki Djeddi,

"Segmentation-verification based on fuzzy integral for connected handwritten digit recognition",

International Conference on Image Processing Theory, Tools and Applications, (IPTA 2015),

Orleans, France, November, 2015, pp. 588-591

13. Abdeljalil Gattal, Chawki Djeddi, Youcef Chibani, Imran Siddiqi, "Isolated Handwritten Digit

Recognition Using oBIFs and Background Features", 12th IAPR International Workshop on

Document Analysis Systems (DAS-2016), Santorini, Greece, 2016. pp. 305-310

55. Research projects

2012-2014: PNR Project: Traitement Automatique des Chèques Postaux Algériens.

2011-2014: CNEPRU Project: Développement d'un système pour la reconnaissance de l'écriture

manuscrite dans les documents images. Project Code: J0200220100015

2015-2017: CNEPRU Project: Techniques d'analyse biométrique des documents manuscrits.

Project Code: J0200220140009

ARTICIPATION IN SPECIALIZED COMPETITION

2013

Eighth position for The ICDAR 2013"Competition on Handwritten Digit Recognition". The results of

the competition was presented in the following article:

M. Diem, S. Fiel, A. Garz, M. Keglevic, F. Kleber, R. Sablatnig. "ICDAR2013 Competition on

Handwritten Digit Recognition (HDRC 2013)". In Proceedings of the 12th International Conference

on Document Analysis and Recognition, Washington DC, USA, pp. 1454-1459, 2013.

2014

Fourth position for The ICFHR 2014(Second in CVL dataset, fifth in CAR A Database, fifth in CAR B

Database)"Competition on Handwritten Digit String Recognition in Challenging Datasets ". The

results of the competition was presented in the following article:

M.Diem, S. Fiel, F. Kleber, R. Sablatnig, J.M. Saavedra, D. Contreras,J. M. Barrios, and L.S.

Oliveira. "ICFHR 2014 Competition on Handwritten Digit String Recognition in Challenging Datasets

(HDSRC 2014)". 4th International Conference on Frontiers in Handwriting Recognition (ICFHR-

2014) September, Crete Island, Greece, pp. 779-784, 2014.

Foire aux questions

What is the focus of this thesis?

This thesis focuses on the development of an Automatic Handwritten Digit String Recognition (AHDSR) system.

What are the main challenges addressed in the thesis?

The main challenges include:

Segmenting adjacent digits (spaced, overlapped, connected)
Dealing with the unknown length of digit strings
Developing a recognition and verification module

What are the key techniques used in the thesis?

The thesis explores several techniques, including:

Histogram of Vertical Projection (HVP)
Contour analysis
Sliding Window Radon Transform (SWRT)
Support Vector Machines (SVM) for classification and verification
Oriented sliding window for segmenting connected handwritten digits

What is the proposed system based on?

The proposed system uses a combination of several explicit segmentation methods depending on the configuration link between digits.

What are the main modules of the proposed system?

The main modules are:

Primary segmentation
Secondary segmentation
Digit recognition-verification

What is the purpose of primary segmentation?

Primary segmentation utilizes HVP to initially separate spaced digits in the digit string.

What is the purpose of secondary segmentation?

Secondary segmentation addresses remaining problems after primary segmentation by analyzing potentially overlapping and connected components that were initially rejected. It incorporates contour analysis, SCA, SWRT, and GCA to find potential cuts.

What datasets were used for evaluating the system?

The NIST SD19 handwritten digits database and CVL Single Digit Database are used for evaluating the system.

What are the key findings of the thesis?

The key findings include:

Combination of features enhances recognition rates.
Selecting appropriate orientation angle is crucial in SWRT.
A trade-off should be adopted to balance between accuracy and segmentation complexity

What is the novelty of the research?

The thesis innovates through:

An explicit approach for segmenting and recognizing handwritten digit strings without prior knowledge of their length
A segmentation-verification technique based on an oriented sliding window and SVM

Fin de l'extrait de 114 pages - haut de page

Acheter maintenant

Titre: Segmentation-Verification for Handwritten Digit Recognition

Thèse de Doctorat , 2016 , 114 Pages

Autor:in: Abdeljalil Gattal (Auteur)

Informatique - Informatique appliquée

Lire l'ebook

Résumé des informations

Titre: Segmentation-Verification for Handwritten Digit Recognition
Université: National Higher School Of Computer Engineering
Auteur: Abdeljalil Gattal (Auteur)
Année de publication: 2016
Pages: 114
N° de catalogue: V374168
ISBN (ebook): 9783668518094
ISBN (Livre): 9783668518100
Langue: anglais
mots-clé: Handwritten Digit Recognition Segmentation SVM
Sécurité des produits: GRIN Publishing GmbH

Citation du texte: Abdeljalil Gattal (Auteur), 2016, Segmentation-Verification for Handwritten Digit Recognition, Munich, GRIN Verlag, https://www.grin.com/document/374168