About me

Hi, my name is Eugene. I am a Ph.D. student at Northeastern University, advised by Terra Blevins. I’m interested in the many challenges of multilingual NLP.

Right now I’m especially looking at:

Tokenization: Many model issues can be traced to the tokenizer, a product of many design choices that impact the model in subtle ways. Can we create both better tokenizers and better model-tokenizer interactions?

Pre-Ph.D., I worked on AI for security as a research scientist at S2W Inc. I received my Bachelor’s and Master’s degrees at KAIST. You can find my CV here.

Recent News

2025 Nov - Had an absolute blast at EMNLP 2025. Very happy that many people found our work interesting (GitHub).

2025 Sep - I am now in Boston, starting my Ph.D. at Northeastern!

2025 Aug - Our paper on Byte-level BPE tokenizer vulnerabilities has been accepted to EMNLP 2025!

2024 Nov - Our paper on drug jargon detection was accepted to KDD 2025!

2024 Oct - New paper on Byte-level BPE tokenizer vulnerabilities is now on arXiv!

2024 Sep - Our paper on security event detection from Tweets was accepted to NDSS 2025!

2024 Jul - Was interviewed by KBS on the topic of ChatGPT jailbreaks. My first major TV appearance!

Trivia

My cat’s name is Squash. He has a stepbrother, Pumpkin.
I used to write articles for the school’s English newspaper, usually complaining about something in the Society section.
I enjoy playing electric guitar, but I like music that’s too technical for my own good. I eagerly await an AI system that can transcribe very fast solos from songs.

Eugene Jang
