
CORPUS LINGUISTICS RESEARCH

pISSN: 2465-812X


CORPUS LINGUISTICS RESEARCH Vol.6 No.1 pp.45-63
A Study of Computational Tools for Large-Scale Corpora
김동성1†
1 Ewha Womans University
Key Words : Corpus; IMS Corpus Workbench; Sketch Engine; Big Data; Encoding

Abstract

Because a corpus is a large collection of everyday language use, computational tools are essential for collecting, filtering, mining, and exploiting meaningful data from massive amounts of text. In this paper, we introduce two tools for handling large-scale corpora: IMS Corpus Workbench (CWB) and Sketch Engine. Both tools are built on an inverted-index model, a type of reference database, which gives corpus users speed and extensibility. A limitation of CWB is its reliance on a Western-language character encoding (ISO-8859), which leads to unsatisfactory handling of Korean at full scale. For large-scale Korean corpora, a more suitable architecture for searching, storage, and a user-friendly interface needs to be considered.
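
The two technical points in the abstract can be illustrated with a minimal Python sketch. It is not the actual implementation of CWB or Sketch Engine; the function names `build_inverted_index` and `encodable` and the sample sentences are hypothetical, chosen only for illustration. The first function shows the general idea of an inverted index (each token maps to the corpus positions where it occurs, so a lookup does not need to scan the text), and the second shows why a single-byte Western encoding such as ISO-8859-1 cannot store Hangul.

```python
from collections import defaultdict


def build_inverted_index(sentences):
    """Map each whitespace-separated token to the corpus positions where it occurs.

    Illustrative only: real corpus tools use compressed, disk-based indexes.
    """
    index = defaultdict(list)
    position = 0
    for sentence in sentences:
        for token in sentence.split():
            index[token].append(position)
            position += 1
    return dict(index)


def encodable(text, encoding):
    """Return True if every character of `text` can be stored in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical toy corpus; token positions run across sentence boundaries.
corpus = ["the corpus is large", "the tool indexes the corpus"]
index = build_inverted_index(corpus)

print(index["corpus"])                    # positions of the token "corpus"
print(encodable("corpus", "iso-8859-1"))  # Latin text fits in ISO-8859-1
print(encodable("코퍼스", "iso-8859-1"))    # Hangul does not
print(encodable("코퍼스", "utf-8"))         # Hangul fits in UTF-8
```

A lookup such as `index["corpus"]` is a single dictionary access regardless of corpus size, which is the kind of speed advantage the abstract attributes to the inverted-index model.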