An Unknown-Word Indexing Method Using N-GRAM for Korean Information Retrieval (한국어 정보검색에서 N-GRAM 이용한 미등록어 색인 방법) [Korean paper]

Category: Free Korean papers | Editor: 金一助教 | Updated: 2017-04-27

When indexing Korean documents for information retrieval, the general practice is to extract nouns as index terms through phrase and morphological analysis. The difficulty lies in indexing unknown words, meaning words that are not registered in the dictionary that morphological analysis commonly relies on. Such unknown words include proper nouns, loanwords, and technical terms, and they can be key index terms for retrieval.

Because N-GRAM relies on no linguistic knowledge, it processes text quickly, can index unknown words that are missing from the morpheme dictionary, and is effective at separating compound nouns. On the other hand, it can extract unrelated index terms. These consume too much memory and can degrade retrieval efficiency.
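The sketch below shows character N-GRAM indexing. It assumes overlapping syllable bigrams (n = 2), because the abstract does not say which n the study uses, and `char_ngrams` is a hypothetical helper name. Splitting 정보검색 ("information retrieval") recovers both component nouns, 정보 and 검색, without a dictionary. It also produces 보검, which spans the boundary between the two nouns. That extra term is the kind of unrelated index term described above.

```python
def char_ngrams(word: str, n: int = 2) -> list[str]:
    """Return the overlapping character n-grams of a word.

    Each precomposed Hangul syllable is a single Python character,
    so slicing the string yields syllable n-grams directly.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(word) < n:
        # Assumption: a word shorter than n is indexed whole.
        return [word] if word else []
    return [word[i:i + n] for i in range(len(word) - n + 1)]


if __name__ == "__main__":
    # 정보검색 ("information retrieval") is the compound 정보 + 검색.
    print(char_ngrams("정보검색"))  # ['정보', '보검', '검색']
```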

To compensate for these weaknesses of N-GRAM, this study proposes extracting uninflected and conjugated words as index terms first, and applying N-GRAM only at the stage that processes unknown words. Experiments with the same retrieval system showed that the indexing algorithm applying N-GRAM to unknown words outperformed the other algorithms compared.
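The two-stage idea can be sketched in a toy setting. Words found in a dictionary are indexed as-is, and only the leftover unknown words are broken into N-GRAMs. The names `LEXICON`, `PARTICLES`, `strip_particle`, and `index_terms` are all hypothetical. The abstract describes neither the study's morphological analyzer, its handling of conjugated words, nor its choice of n. This sketch therefore covers only nouns followed by a particle, and it uses crude suffix stripping in place of real morphological analysis.

```python
# Hypothetical toy lexicon standing in for the morpheme dictionary.
LEXICON = {"정보", "검색", "색인", "시스템"}
# Hypothetical, minimal particle list; two-syllable particles come first
# so that 에서 is tried before 에.
PARTICLES = ("에서", "으로", "은", "는", "이", "가", "을", "를", "의", "에")


def char_ngrams(word: str, n: int = 2) -> list[str]:
    """Overlapping syllable n-grams; a word shorter than n is kept whole."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(word) < n:
        return [word] if word else []
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def strip_particle(eojeol: str) -> str:
    """Remove one trailing particle, if any (a crude stand-in for morphological analysis)."""
    for p in PARTICLES:
        if eojeol.endswith(p) and len(eojeol) > len(p):
            return eojeol[: -len(p)]
    return eojeol


def index_terms(text: str, n: int = 2) -> list[str]:
    """Index dictionary words directly; fall back to N-GRAM only for unknown words."""
    terms: list[str] = []
    for eojeol in text.split():
        stem = eojeol if eojeol in LEXICON else strip_particle(eojeol)
        if stem in LEXICON:
            terms.append(stem)                  # registered word: index as-is
        else:
            terms.extend(char_ngrams(stem, n))  # unknown word: N-GRAM fallback
    return terms


if __name__ == "__main__":
    # 블록체인 ("blockchain") is a loanword missing from the toy lexicon.
    print(index_terms("블록체인에서 정보를 검색"))
    # ['블록', '록체', '체인', '정보', '검색']
```

In this design, N-GRAM terms are generated only for the residue the dictionary cannot explain. The number of extra, possibly unrelated, index terms therefore grows with the number of unknown words rather than with the length of the whole document.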
