Abstract:
|
Vietnamese researchers have only very recently become involved in the domain of Natural
Language Processing (NLP). Because there is neither published work in formal linguistics nor any
recognized standard for Vietnamese word definition and word categories, fundamental tasks in
automatic Vietnamese language processing, such as part-of-speech tagging and parsing, are very difficult
for computer scientists. The fact that all necessary linguistic resources have to be built from scratch by
each research team is a real obstacle to the development of Vietnamese language processing. The aim of our
projects is thus to build a common linguistic database that is freely and easily exploitable for the automatic
processing of Vietnamese. In this paper, we present our work on creating a Vietnamese lexicon for NLP
applications. We emphasize the standardization aspect of the lexicon representation. In particular, we propose
an extensible set of Vietnamese syntactic descriptions that can be used for tagset definition and
morphosyntactic analysis. These descriptors are designed to serve as a proposed reference set
for Vietnamese in the context of ISO subcommittee TC 37/SC 4 (Language Resource Management). © 2007
Springer Science+Business Media B.V. |