The KOSAC data in the CSV file contains all annotated tags: subjective, objective, and seed tags.
This README is a guide to the tags listed in the CSV file. Each column is explained under its number below.

***For detailed explanations of column values, see the guideline or Shin et al. (2012).

Shin, Hyopil, Munhyong Kim, Yu-Mi Jo, Hayeon Jang, and Andrew Cattle. 2012. Annotation Scheme for Constructing Sentiment Corpus in Korean. In Proceedings of the 26th Pacific Asia Conference on Language, Information and Computation, pages 181-190.

--------------------------------------------------------------------------------

1. tag_id
The tag_id is a unique ID for each tag in the whole KOSAC data. The IDs may not be continuous, since some tags were deleted during the annotation process.

2. sent_id
The sent_id identifies the sentence the tag belongs to. Because the sentence ID is unique for each sentence, it can be used to gather all tags that belong to that sentence.

3. tag type
The tag type shows whether the tag is a sentence-level tag (subjective or objective) or a sub-sentence-level tag (seed tag).

4. morphemes
This column contains the expressions on which tags are anchored. Only seed tags have anchored expressions in this column. Anchors of objective and subjective tags can be found in the sentence-morph column. The format of each morpheme is expression/part_of_speech#id, as in “사랑/NNG#57926”. The id can be used to match the expression within the whole sentence.

5. expressive-type
The expressive-type column describes how the writer of the sentence delivers the expression. The values of the column are direct-explicit, direct-speech, direct-action, indirect, and writing-device.

6. subjectivity-type
The subjectivity-type attribute indicates what kind of sentiment the anchored expression belongs to. The categories are Judgment, Argument, Intention, Agreement, Speculation, Emotion, and Others.

7. subjectivity-polarity
The subjectivity-polarity is one of positive, negative, complex, and neutral. This value combines with the subjectivity-type value into a single value, for instance judgment-pos or intention-pos. It does not refer to the usual polarity sense of expressions. For more details, refer to the documents above.

8. polarity
The polarity attribute indicates the polarity sense of the anchored expression.

9. intensity
The intensity column shows how strong the subjective expression is.

10. nested-source
This column shows the source of the expression and the path along which the subjectivity is delivered through sources. The leftmost source is always the writer, which is omitted. The format is source1-source2-source3. A subjective expression is assumed to rarely have more than four sources.

11. target
The target column contains the expressions toward which the sentiment is directed. The format is target1-target2-target3, indicating that a sentiment expression can have multiple targets.

12. comment
This column is for an annotator to leave a comment on a tag.

13. confident
This column holds a marker indicating how confident the annotator is about the tag.

14. raw-sentence
The raw-sentence column contains the original sentence that the tag belongs to.

15. sentence-morph
The sentence-morph column contains the original sentence, split into morphemes, that the tag belongs to.
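
--------------------------------------------------------------------------------

The column formats described above can be read with a few lines of Python. The following is only a minimal sketch, not an official KOSAC tool. It rests on several assumptions that this README does not confirm: that the CSV header uses the column names exactly as listed here, that several morphemes in one cell are separated by whitespace, and that source and target names contain no "-" of their own. All function names are hypothetical, and the sample rows are made up for illustration; only the morpheme “사랑/NNG#57926” comes from this README.

```python
import csv
import io
from collections import defaultdict


def parse_morpheme(token):
    """Split one 'expression/part_of_speech#id' token into its three parts."""
    # rpartition splits at the LAST separator, so an expression that itself
    # contains '/' or '#' is still kept in one piece.
    body, _, morph_id = token.rpartition("#")
    expression, _, pos = body.rpartition("/")
    return expression, pos, morph_id


def parse_morphemes(cell):
    # Assumption: multiple morphemes in one cell are whitespace-separated.
    return [parse_morpheme(token) for token in cell.split()]


def parse_sources(cell):
    # Format: source1-source2-source3. The writer is always the leftmost
    # source but is omitted in the data, so it is restored here.
    return ["writer"] + [s for s in cell.split("-") if s]


def parse_targets(cell):
    # Format: target1-target2-target3.
    return [t for t in cell.split("-") if t]


def group_by_sentence(rows):
    """Gather all tag rows that share the same sent_id."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["sent_id"]].append(row)
    return dict(grouped)


if __name__ == "__main__":
    # Made-up rows with a subset of the columns, for illustration only.
    sample = (
        "tag_id,sent_id,morphemes,nested-source,target\n"
        "1,10,사랑/NNG#57926,w1-w2,t1\n"
        "2,10,,,\n"
        "3,11,,,\n"
    )
    rows = list(csv.DictReader(io.StringIO(sample)))
    for sent_id, tags in group_by_sentence(rows).items():
        print(sent_id, [tag["tag_id"] for tag in tags])
    print(parse_morphemes(rows[0]["morphemes"]))
    print(parse_sources(rows[0]["nested-source"]))
```

To read the real file, replace the in-memory sample with open(path, newline="", encoding="utf-8") passed to csv.DictReader; the encoding is also an assumption and may need adjusting.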