Abstract:
|
Word segmentation is one of the most important tasks in NLP. For Vietnamese, the particular features of
the language pose some challenges for this task, especially in word boundary determination. To tackle
the task of Vietnamese word segmentation, we propose the WS4VN system, which uses a new
approach based on the maximum matching algorithm combined with stochastic models that use part-of-speech
information. The approach can resolve word ambiguity and choose the best segmentation for each input
sentence. Our system gives a promising result with an F-measure of 97%, higher than the results of existing
publicly available Vietnamese word segmentation systems. © 2009 IEEE. |