108.535A: Studies in Computational Linguistics

Hyopil Shin (Dept. of Linguistics, Seoul National University)

hpshin@snu.ac.kr, http://knlp.snu.ac.kr
Tuesdays from 1:00 to 4:00 PM in Building 5, 107-2

T.A.: Youngsam Kim (youngsamy@gmail.com, kys079@snu.ac.kr)

(http://www.theverge.com/2016/3/11/11208078/lee-se-dol-go-google-kasparov-jennings-ai)

Updates

Course description

This course provides an introduction to the field of Computational Linguistics (CL) or Natural Language Processing (NLP). We will cover the subfields of CL/NLP from morphology to semantics. The course introduces the basic notions and algorithms widely used in the field. Knowledge-based and statistical approaches to CL/NLP will be the primary methodologies. In this class, you will learn the techniques that are the building blocks of NLP and how to assemble them to accomplish goals.

Textbook

 

Speech and Language Processing 2nd Edition, by Daniel Jurafsky and James H. Martin

 

                                                                                     

Natural Language Processing with Python, by Steven Bird, Ewan Klein, and Edward Loper, 2009. A version of the book is available online.

Linguistics and Statistical Models, by Hyopil Shin, 2009. The 2nd edition is available at bookstores.

Syllabus

Week, Date | Topics | What you have to do | Assignment
1 9/6

Intro to NLP

What is Natural Language Processing (Jurafsky's NLP Online Course Material)

Install NLTK (Python 2.7 recommended, installation guide for Windows users)

Read SLP Chap. 1 and 2

Study NPwithPython Chap 3 and be ready for the first assignment!

2 9/13

Regular Expressions and Automata (SLP) / Processing Raw Text (NPwithPython)

Ken Church's tutorial Unix for Poets

Words and Transducers

Edit Distance by Jurafsky

NPwithPython Chap 3

Assignment 1: Implement Exercises 18 and 25 in NPwithPython Chap 3. Use BROWN_A1 as the input source file.

(Filename: YourStudentIDnumber_hw1-1.py for Exercise 18, YourStudentIDnumber_hw1-2.py for Exercise 25)

Due: Sep. 20th, 1:00 p.m.

(Please submit it through ETL)

Python Regular Expression Page

3 9/20

N-grams (SLP) - Jurafsky's PPT

Accessing Text Corpora and Lexical Resources (NPwithPython)

Word Counting and Printing Python Sample

NPwithPython Chap 2

Assignment 2: Choose either the DFA assignment or a variation of Exercise 38 in NPwithPython Chap 3

(Filename: YourStudentIDnumber_hw2-1.py for the first assignment, YourStudentIDnumber_hw2-2.py for the second one)

Due: Sep. 27th, 1:00 p.m.

4 9/27 NPwithPython by Guest Lectures
Code and Unicode in Python
   
5 10/4

Naive Bayes (Jurafsky's online course PPT)

Entropy Intro

Entropy

Maximum Entropy Models (SLP) / Learning to Classify Text (NPwithPython)

NPwithPython Chap 6

A Brief Maxent Tutorial: a good online tutorial by Adam Berger

Assignment 3: Choose either the Shannon Method or Cross Entropy

(Filename: YourStudentIDnumber_hw3-1.py for the first question, YourStudentIDnumber_hw3-2.py for the second one)

Due: Oct. 11th, 1:00 p.m.

6 10/11

Hidden Markov Models (Forward-Backward/Viterbi) (SLP)

Forward, Backward and Viterbi Sample

 

 
7 10/18

Classification Tools: SVM, NumPy, scikit-learn by Guest Lecturer

  Midterm: Take-Home Programming
8 10/25 Mid-Term Exam    
9 11/1

Tagging (SLP) / Categorizing and Tagging Words (NPwithPython)

Collocations

t-distribution table

Chi-square table

Collocations (FSNLP Chap. 5)

NPwithPython Chap 5

 
10 11/8

Context-Free Grammars and Formal Grammars (SLP) / Analyzing Sentence Structure (NPwithPython)

 

 

11 11/15

Parsing with CFGs (SLP) / Analyzing Sentence Structure (NPwithPython)

NPwithPython Chap 8 

Assignment 4: Choose either

a. Grammars, Parsing Based on CKY (with grading weight), or

b. Exercises 16 and 17 of NPwithPython Chap 8

Due: Nov. 22nd, 1:00 p.m.

12 11/22

Statistical Parsing / Analyzing Sentence Structure (NPwithPython)

Language and Complexity

   
13 11/29

Vector Semantics from SLP 3rd Edition Draft

Semantics with Dense Vectors from SLP 3rd Edition Draft

Computing with Word Senses from SLP 3rd Edition Draft

Efficient Estimation of Word Representations in Vector Space by Mikolov et al. (2013)

Vector Semantics by Jurafsky  

Efficient Estimation of Word Representations in Vector Space - lecture material

word2vec on Google Code

gensim module for word2vec in Python

word2vec Tutorial

NPwithPython Chap 10

1. Install any Korean morphological analyzer. You can refer to KoNLPy

2. Install gensim as appropriate for your machine

3. Search for related sites, including this one

14 12/6

Vector Semantics from SLP 3rd Edition Draft

Semantics with Dense Vectors from SLP 3rd Edition Draft

Computing with Word Senses from SLP 3rd Edition Draft

Efficient Estimation of Word Representations in Vector Space by Mikolov et al. (2013)

Vector Semantics by Jurafsky  

Efficient Estimation of Word Representations in Vector Space - lecture material

word2vec on Google Code

gensim module for word2vec in Python

word2vec Tutorial

NPwithPython Chap 10

1. Download Korean News Article Data

2. Morphological analysis sample code using KoNLPy

3. Word2Vec modeling and testing sample code

15 12/13 Final Test    

Resources

 

  • Chapter-by-chapter links to resources on the Web (based on the first edition of SLP)
  • Visual Regular Expression tool
  • Depth-First and Breadth-First Search
  • Levenshtein Minimum Edit Distance DEMO
  • N-gram and Smoothing DEMO
  • HMM DEMO