Overview
We focus on developing algorithms that process text and make its information accessible to a wide range of Natural Language Processing applications. We also specialize in Korean language processing and maintain a number of Korean language processing tools and resources. If you are interested, please contact us!
Large Language Models
We are developing Korean Large Language Models that closely reflect the characteristics of the Korean language. We do this by creating new tokenizers and other tools, and by fine-tuning, combining, and integrating layers of existing open models in various ways so that they can be applied to a range of fields.
Try out our LLM DaG and KATALOG!
Transformer-based Pre-trained Models
We are building and releasing various types of Korean pre-trained models based on Transformer language models such as BERT and GPT.
Try out our Korean BERT pre-trained model (KR-BERT), KR-KOSAC-BERT, and many other models.
Sentiment / Opinion Analysis
We have been working on (Korean) Sentiment/Opinion Analysis.
Korean Temporal Awareness and Reasoning Systems for Question Interpretation
We are working on a Korean version of the Temporal Awareness and Reasoning Systems for Question Interpretation, following the TARSQI work at Brandeis University. Currently, we are developing the Korean TimeML (Markup Language for Temporal and Event Expressions).
TimeML is a robust specification language for events and temporal expressions in natural language. It is designed to address four problems in event and temporal expression markup:
1. Time stamping of events (identifying an event and anchoring it in time)
2. Ordering events with respect to one another (lexical versus discourse properties of ordering)
3. Reasoning with contextually underspecified temporal expressions (temporal functions such as 'last week' and 'two weeks before')
4. Reasoning about the persistence of events (how long does an event or the outcome of an event last)
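The third problem, reasoning with underspecified temporal expressions, can be illustrated with a small Python sketch. It is not part of TimeML or TARSQI: the function name `resolve_relative`, the two supported expressions, and the choice of Monday as the first day of the week are all assumptions made for illustration. It only shows how an expression such as 'last week' becomes a concrete date interval once an anchor date is known.

```python
from datetime import date, timedelta
from typing import Tuple


def resolve_relative(expression: str, anchor: date) -> Tuple[date, date]:
    """Map a relative temporal expression to a (start, end) date interval.

    Hypothetical helper: only two expressions are handled, and weeks are
    assumed to run from Monday to Sunday.
    """
    if expression == "last week":
        # Monday of the week containing the anchor date.
        this_monday = anchor - timedelta(days=anchor.weekday())
        # Step back one full week to reach the previous Monday.
        start = this_monday - timedelta(weeks=1)
        return start, start + timedelta(days=6)
    if expression == "two weeks before":
        # A single day, exactly fourteen days before the anchor.
        day = anchor - timedelta(weeks=2)
        return day, day
    raise ValueError(f"unsupported expression: {expression!r}")


# The same expression resolves differently depending on the anchor,
# which is why the anchor (e.g. a document creation time) must be known.
anchor_date = date(2024, 5, 15)
last_week = resolve_relative("last week", anchor_date)
two_weeks_before = resolve_relative("two weeks before", anchor_date)
```

A real annotation system would also have to decide which time the expression is anchored to (the document date or another event in the text), which is the harder part of the problem.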
Korean Lexical Resources
We are developing Korean lexical resources for various NLP tasks.
Korean Language Processing
Our areas of interest related to the Korean language:
As part of the work on constructing the 21st Century Sejong Electronic Dictionary, we have been in charge of its "special words": abbreviations frequently found in texts, neologisms, proper nouns, and foreign words. In short, these are words that are not listed in dictionaries but are essential for research on Korean language processing.
We have also been working on mapping basic Korean verbs and nouns onto the Mikrokosmos Ontology, which is fundamental to Korean language processing.
Knowledge Base/Ontology
Research on ontologies in connection with the natural language processing of meaning is currently an active trend. These ontologies, as structures of concepts, form part of the knowledge base needed for lexical bases, lexical networks, semantic networks, and meta-NLP. In this field, we have been doing the following at our lab:
XML-related Work
Information Retrieval
We have created a database that makes use of collocations, morphology, and grammatical properties. We are now working on how to get higher performance from a lexical information retrieval system based on existing theoretical lexical information, and on how to improve the precision of the computational model used for statistical document classification. We are applying linguistic information (part of speech, meaning) to reduce the size of the vector space and thereby capture the character of a text, so that documents can be analyzed for tasks such as automatic question answering and automatic essay grading.
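As a rough illustration of how part-of-speech information can shrink a vector space, the Python sketch below builds bag-of-words vectors with and without a part-of-speech filter. It is not our actual system: the tag names, the set `CONTENT_TAGS`, the helper functions, and the pre-tagged English toy documents are all assumptions for illustration. In practice the tags would come from a Korean morphological analyzer.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

TaggedDoc = List[Tuple[str, str]]  # (token, part-of-speech tag) pairs

# Assumed set of "content" tags to keep; a real tag set would differ.
CONTENT_TAGS: Set[str] = {"NOUN", "VERB", "ADJ"}


def keep(tag: str, keep_tags: Optional[Set[str]]) -> bool:
    """Return True if a token with this tag should enter the vector space."""
    return keep_tags is None or tag in keep_tags


def build_vocabulary(docs: List[TaggedDoc],
                     keep_tags: Optional[Set[str]] = None) -> Dict[str, int]:
    """Assign one vector dimension to each distinct retained token."""
    tokens = sorted({tok for doc in docs for tok, tag in doc
                     if keep(tag, keep_tags)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(doc: TaggedDoc, vocab: Dict[str, int],
              keep_tags: Optional[Set[str]] = None) -> List[int]:
    """Turn one tagged document into a term-count vector over vocab."""
    counts = Counter(tok for tok, tag in doc
                     if keep(tag, keep_tags) and tok in vocab)
    vector = [0] * len(vocab)
    for tok, count in counts.items():
        vector[vocab[tok]] = count
    return vector


# Toy pre-tagged documents (hypothetical data).
docs: List[TaggedDoc] = [
    [("the", "DET"), ("model", "NOUN"), ("reads", "VERB"),
     ("the", "DET"), ("text", "NOUN")],
    [("a", "DET"), ("text", "NOUN"), ("is", "AUX"), ("graded", "VERB")],
]

full_vocab = build_vocabulary(docs)
filtered_vocab = build_vocabulary(docs, CONTENT_TAGS)
filtered_vector = vectorize(docs[0], filtered_vocab, CONTENT_TAGS)
```

Dropping function words removes dimensions that carry little topical information, so the filtered vectors are shorter while still separating documents by content.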