Please use this identifier to cite or link to this item: http://tainguyenso.vnu.edu.vn/jspui/handle/123456789/13003

Title: Text extraction from graphical document images using sparse representation
Authors: Hoang T.V.; Tabbone S.
Keywords: Curvelet transform; Morphological component analysis; Redundant wavelet transform; Sparse representation; Text component grouping; Text/graphics separation
Issue Date: 2010
Publisher: ACM International Conference Proceeding Series
Citation: Pages 143-150
Abstract: This paper presents a novel method for extracting text from graphical document images. A graphical document image containing text and graphics components is treated as a two-dimensional signal in which text and graphics have different morphological characteristics. The proposed algorithm relies on a sparse representation framework with two appropriately chosen discriminative overcomplete dictionaries, each of which gives a sparse representation of one type of signal and a non-sparse representation of the other. Text and graphics components are separated by promoting a sparse representation of the input image in these two dictionaries. In post-processing, heuristic rules group the text components into text strings. The proposed method overcomes the problem of text touching graphics. Preliminary experiments show promising results on different types of documents. Copyright 2010 ACM.
URI: http://tainguyenso.vnu.edu.vn/jspui/handle/123456789/13003
Appears in Collections: New - Articles of Universities of Vietnam from Scopus
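
Illustrative note: the separation idea described in the abstract (two dictionaries, each sparse for one component and non-sparse for the other) can be sketched on a one-dimensional toy signal. This is not the authors' implementation. The keywords suggest the paper works on 2-D images with curvelet and redundant wavelet dictionaries; the sketch below swaps in an orthonormal DCT basis (for a smooth, "graphics-like" part) and the identity basis (for isolated, "text-like" spikes), and uses alternating hard thresholding with a linearly decreasing threshold, a common morphological component analysis scheme that is assumed here rather than taken from the paper. All function names, signal values, and parameters are hypothetical.

```python
import math

N = 64  # toy signal length (assumption)


def dct_matrix(n):
    """Return the orthonormal DCT-II basis as a list of n row vectors."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (i + 0.5) * k / n)
                     for i in range(n)])
    return rows


def analyze(basis, signal):
    """Coefficients of `signal` in the orthonormal `basis`."""
    return [sum(b * s for b, s in zip(row, signal)) for row in basis]


def synthesize(basis, coeffs):
    """Rebuild a signal from its coefficients in `basis`."""
    out = [0.0] * len(basis[0])
    for c, row in zip(coeffs, basis):
        if c != 0.0:
            for i, b in enumerate(row):
                out[i] += c * b
    return out


def hard_threshold(values, lam):
    """Keep entries whose magnitude exceeds lam, zero the rest."""
    return [v if abs(v) > lam else 0.0 for v in values]


def mca_separate(x, basis, lam_start=10.0, lam_end=3.0, n_iter=100):
    """Split x into a DCT-sparse part and a spike-sparse part."""
    smooth_part = [0.0] * len(x)
    spike_part = [0.0] * len(x)
    for it in range(n_iter):
        # Threshold decreases linearly from lam_start to lam_end.
        lam = lam_start + (lam_end - lam_start) * it / (n_iter - 1)
        # Update the smooth part: remove the current spikes, then keep
        # only the large DCT coefficients of what is left.
        resid = [xi - si for xi, si in zip(x, spike_part)]
        smooth_part = synthesize(
            basis, hard_threshold(analyze(basis, resid), lam))
        # Update the spike part: remove the current smooth part, then
        # keep only the large samples (identity dictionary).
        resid = [xi - si for xi, si in zip(x, smooth_part)]
        spike_part = hard_threshold(resid, lam)
    return smooth_part, spike_part


# Toy mixture: two DCT atoms plus two isolated spikes.
basis = dct_matrix(N)
true_coeffs = [0.0] * N
true_coeffs[3], true_coeffs[7] = 8.0, -6.0
true_smooth = synthesize(basis, true_coeffs)
true_spikes = [0.0] * N
true_spikes[10], true_spikes[40] = 7.0, -9.0
x = [a + b for a, b in zip(true_smooth, true_spikes)]

smooth, spikes = mca_separate(x, basis)
print("spike positions:", [i for i, v in enumerate(spikes) if v != 0.0])
print("max spike error:",
      max(abs(a - b) for a, b in zip(spikes, true_spikes)))
```

The final threshold is deliberately kept above the largest cross-talk that one component can produce in the other dictionary, so each dictionary only picks up its own component. On real document images that margin is not known in advance and would have to be tuned.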

Files in This Item:

File         Size      Format
HN_U194.pdf  51.35 kB  Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.