Build Probability and Surprisal Model for English Data

This script builds language models to compute probability and surprisal based on n-gram counts.
1 min read
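
As a rough illustration of the idea (not the script itself), the sketch below estimates bigram probabilities from counts and converts them to surprisal in bits, i.e. -log2 P(word | previous word). The function names, the `<s>`/`</s>` boundary markers, and the add-alpha smoothing are assumptions made for this example.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count contexts and bigrams over tokenized sentences (hypothetical helper)."""
    contexts = Counter()
    bigrams = Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        contexts.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent token pairs
    return contexts, bigrams


def bigram_probability(prev, word, contexts, bigrams, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev); smoothing is an assumption."""
    return (bigrams[(prev, word)] + alpha) / (contexts[prev] + alpha * vocab_size)


def surprisal(prev, word, contexts, bigrams, vocab_size, alpha=1.0):
    """Surprisal in bits: the negative base-2 log of the conditional probability."""
    return -math.log2(
        bigram_probability(prev, word, contexts, bigrams, vocab_size, alpha)
    )


if __name__ == "__main__":
    corpus = [["the", "dog", "barks"], ["the", "cat", "sleeps"]]
    contexts, bigrams = train_bigram_counts(corpus)
    # Predictable items: the five word types plus the end marker.
    vocab_size = len({w for s in corpus for w in s} | {"</s>"})
    print(surprisal("the", "dog", contexts, bigrams, vocab_size))
```

With `alpha=0.0` the estimate reduces to the plain relative frequency, which is undefined for unseen bigrams; that is why the sketch smooths by default.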

Build Text Classification Models (LSTM, BERT, etc.) for English data

This script builds a couple of machine learning models that classify documents into predefined categories. We use Google Colab for the script, which allows us to use a GPU instead of a CPU.
0 min read

Create GloVe Model for Korean data

This script creates a GloVe model to analyze semantic similarities between Korean words. We use Google Colab for the script, which allows us to use a GPU instead of a CPU.
0 min read

Create Word2Vec Model for English data

This script creates a Word2Vec model to analyze semantic similarities between English words.
0 min read

Create Word2Vec Model for Korean data

This script creates a Word2Vec model to analyze semantic similarities between Korean words. We use Google Colab for the script, which allows us to use a GPU instead of a CPU.
0 min read

Create a Korean-to-English translator

This script creates a Korean-to-English translator using a Hugging Face resource.
0 min read

Fine-tune BERT Models for Korean data

This script creates fine-tuned BERT models for sentiment analysis, summarization, mask prediction, text generation, and question answering. We use Google Colab for the script, which allows us to use a GPU instead of a CPU.
0 min read

Identify English VP-ellipsis and Gapping instances

This Python script helps identify VP-ellipsis and Gapping instances in English data using benepar and spaCy.
5 min read

Measure English Proficiency

This script outputs final proficiency z-scores for the English production data based on three measures: (a) morpho-syntactic complexity (verbal density), (b) lexical complexity (Moving-Average Type-Token Ratio), and (c) morphological/syntactic/lexical accuracy (pre-coded by human an...
7 min read
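
For readers unfamiliar with measure (b), here is a minimal sketch of a Moving-Average Type-Token Ratio: the type-token ratio is computed in every window of a fixed number of tokens and the results are averaged. The function name, the default window of 50 tokens, and the fallback for short texts are assumptions for this example, not details taken from the script.

```python
def mattr(tokens, window=50):
    """Moving-Average Type-Token Ratio over a list of tokens.

    `window` is a hypothetical default; texts no longer than the window
    fall back to the plain type-token ratio (an assumption of this sketch).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if not tokens:
        return 0.0
    if len(tokens) <= window:
        return len(set(tokens)) / len(tokens)
    # One ratio per window position, sliding one token at a time.
    ratios = [
        len(set(tokens[i:i + window])) / window
        for i in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat".split()
    print(mattr(sample, window=5))
```

Because every window has the same length, the score is less sensitive to text length than a plain type-token ratio.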

Move Files from One Directory to Another

This script moves multiple files that match a certain criterion from one directory to another.
0 min read
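
A minimal sketch of this kind of task, assuming the criterion is a filename glob pattern; the function name, the default `*.txt` pattern, and the directory names are hypothetical.

```python
import shutil
from pathlib import Path


def move_matching_files(src_dir, dst_dir, pattern="*.txt"):
    """Move every file in src_dir whose name matches pattern into dst_dir.

    Returns the names of the moved files. The glob pattern stands in for
    "a certain criterion" and is an assumption of this sketch.
    """
    src = Path(src_dir)
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)  # create the target if needed
    moved = []
    for path in sorted(src.glob(pattern)):
        if path.is_file():  # skip directories that happen to match
            shutil.move(str(path), str(dst / path.name))
            moved.append(path.name)
    return moved


if __name__ == "__main__":
    # Hypothetical directories; adjust to your own layout.
    print(move_matching_files("raw_data", "selected", pattern="*_en.txt"))
```

Any other test on the path (size, modification time, a regular expression on the name) could replace the glob pattern inside the loop.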

Split CSV to Multiple Text Files

This script splits a large CSV dataset into multiple text files based on the first column. Each file is named after a value in the first column and contains all the second-column cells from the rows with that value.
0 min read
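
The grouping described above can be sketched with the standard library as follows. The function name, the header handling, the `.txt` extension, and the one-cell-per-line output are assumptions; the sketch also assumes that first-column values are safe to use as file names.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_csv_to_text_files(csv_path, out_dir, has_header=True):
    """Write one text file per distinct first-column value.

    Each file holds the second-column cells of the matching rows, one per
    line. Returns the sorted list of first-column values that were written.
    """
    groups = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row if there is one
        for row in reader:
            if len(row) < 2:
                continue  # ignore rows without a second column
            groups[row[0]].append(row[1])

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, cells in groups.items():
        # Assumes `name` contains no characters that are invalid in file names.
        (out / f"{name}.txt").write_text("\n".join(cells) + "\n", encoding="utf-8")
    return sorted(groups)


if __name__ == "__main__":
    # Hypothetical input and output paths.
    print(split_csv_to_text_files("dataset.csv", "split_output"))
```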

Tag English Words for Part-of-Speech

This script marks up each word in a text as corresponding to a particular part of speech.
0 min read

Topic Modeling for English Data

This script builds a semantic network and classifies documents by topic using the Latent Dirichlet Allocation (LDA) algorithm. We use Google Colab for the script.
0 min read

Train a GPT-2 text-generating model for English

This script creates a GPT-2 text-generating model. We use Google Colab for the script, which allows us to use a GPU instead of a CPU.
0 min read

Train a Sentence Generating Model for Korean Data

This script creates a fine-tuned BERT model to generate sentences using the Naver Sentiment Movie Corpus (NSMC). We use Google Colab for the script, which allows us to use a GPU instead of a CPU.
0 min read

Use a Pre-trained GloVe Model for English data

This script analyzes semantic similarities between English words by using a pre-trained GloVe model. We use Google Colab for the script, which allows us to use a GPU instead of a CPU.
0 min read