The paper introduces a new method for discrimination of documents given in different scripts. The document is mapped into a uniformly coded text of numerical values. It is derived from the position of the letters in the text line, based on their typographical characteristics. Each code is considered as a gray level. Accordingly, the coded text determines a 1-D image, on which texture analysis by run-length statistics and local binary pattern are performed. It defines feature vectors representing the script content of the document. A modified clustering approach employed on the document feature vectors groups documents written in the same script. Experimentation performed on two custom oriented databases of historical documents in old Cyrillic, angular and round Glagolitic as well as Antiqua and Fraktur scripts demonstrates the superiority of the proposed method with respect to well-known methods in the state-of-the-art.

Document image coding and clustering for script discrimination

Amelio A.;
2016-01-01

Abstract

The paper introduces a new method for discrimination of documents given in different scripts. The document is mapped into a uniformly coded text of numerical values. It is derived from the position of the letters in the text line, based on their typographical characteristics. Each code is considered as a gray level. Accordingly, the coded text determines a 1-D image, on which texture analysis by run-length statistics and local binary pattern are performed. It defines feature vectors representing the script content of the document. A modified clustering approach employed on the document feature vectors groups documents written in the same script. Experimentation performed on two custom oriented databases of historical documents in old Cyrillic, angular and round Glagolitic as well as Antiqua and Fraktur scripts demonstrates the superiority of the proposed method with respect to well-known methods in the state-of-the-art.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11564/770204
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 11
social impact