Hyunji Amy Lee

Ph.D. Student at KAIST AI, Incoming Postdoc at UNC NLP

hyunji.amy.lee@kaist.ac.kr

About
Hello! I am a final-year Ph.D. student at the Language & Knowledge Lab at KAIST, advised by Minjoon Seo, and an incoming postdoc at UNC working with Mohit Bansal. I am interested in building robust semi-parametric models that align with external knowledge modules, capitalizing on the strengths of both the model's parametric knowledge space and the external nonparametric knowledge space to achieve optimal performance.

I've been focusing on three main research areas:
1. Retrieving the correct knowledge source: generalizable retrieval models robust to out-of-domain data (RouterRetriever) and generative retrieval models (GMR, DynamicGR).
2. Aligning knowledge modules with language models: training objectives to enhance grounding ability (CINGS), superposing the model's parametric and nonparametric spaces (Semiparametric Token-Sequence Co-Supervision), and developing decoding methods to effectively utilize the nonparametric space (Np Decoding).
3. Understanding the model's parametric knowledge: investigating how well models truly ground to given knowledge (Truly Ground), improving grounding performance in complex contexts (CORG), and studying knowledge utilization during pretraining (Knowledge Entropy).
Additionally, I've worked on extending these approaches to other modalities (VLM Safety, ZeroTA).

Previously, I received my M.S. from the Language & Knowledge Lab at KAIST, advised by Minjoon Seo, and my B.S. from KAIST in computer science and electrical engineering (double major), where I was advised by Edward Choi.

I am currently interning at Microsoft Research with Lucas Caccia, Marc-Alexandre Côté, Eric Xingdi Yuan, and Alessandro Sordoni. I've previously interned at Tencent AI (Winter 2024) with Wenhao Yu, Kaixin Ma, and Hongming Zhang; Adobe Research (Summer 2024) with David Seunghyun Yoon, Franck Dernoncourt, and Trung Bui; AI2 Semantic Scholar Team (Summer 2023) with Kyle Lo and Luca Soldaini. See my Vitæ page for details.
News
Publications

Preprint
2025

Context-Informed Grounding Supervision

Hyunji Lee, Seunghyun Yoon, Yunjae Won, Hanseok Oh, Geewook Kim, Trung Bui, Franck Dernoncourt, Elias Stengel-Eskin, Mohit Bansal, Minjoon Seo

Preprint

2023

Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders

Hyunji Lee, Luca Soldaini, Arman Cohan, Minjoon Seo, Kyle Lo

Preprint

Zero-Shot Dense Video Captioning by Jointly Optimizing Text and Moment

Yongrae Jo, Seongyun Lee, Aiden SJ Lee, Hyunji Lee, Hanseok Oh, Minjoon Seo

Preprint

Peer-Reviewed
2025

CORG: Generating Answers from Complex, Interrelated Contexts

Hyunji Lee, Franck Dernoncourt, Trung Bui, Seunghyun Yoon

NAACL 2025

How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?

Seongyun Lee*, Geewook Kim*, Jiyeon Kim*, Hyunji Lee, Hoyeon Chang, Sue Hyun Park, Minjoon Seo

ICLR 2025

Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition

Jiyeon Kim*, Hyunji Lee*, Hyowon Cho, Joel Jang, Hyeonbin Hwang, Seungpil Won, Youbin Ahn, Dohaeng Lee, Minjoon Seo

ICLR 2025 Oral

Best Paper, Towards Knowledgeable Foundation Models @AAAI 2025 Workshop

RouterRetriever: Exploring the Benefits of Routing over Multiple Expert Embedding Models

Hyunji Lee, Luca Soldaini, Arman Cohan, Minjoon Seo, Kyle Lo

AAAI 2025

2024

Exploring the Practicality of Generative Retrieval on Dynamic Corpora

Chaeeun Kim*, Soyoung Yoon*, Hyunji Lee, Joel Jang, Minjoon Seo

EMNLP 2024

Semiparametric Token-Sequence Co-Supervision

Hyunji Lee*, Doyoung Kim*, Jihoon Jun, Sejune Joo, Joel Jang, Kyung-Woon On, Minjoon Seo

ACL 2024

InstructIR: A Benchmark for Instruction Following of Information Retrieval Models

Hanseok Oh, Hyunji Lee, Seonghyeon Ye, Haebin Shin, Hansol Jang, Changwook Jun, Minjoon Seo

ACL KnowledgeNLP Workshop 2024 Oral

Improving Probability-based Prompt Selection Through Unified Evaluation and Analysis

Sohee Yang, Jonghyeon Kim, Joel Jang, Seonghyeon Ye, Hyunji Lee, Minjoon Seo

TACL 2024

How Well Do Large Language Models Truly Ground?

Hyunji Lee*, Sejune Joo*, Chaeeun Kim, Joel Jang, Doyoung Kim, Kyung-Woon On, Minjoon Seo

NAACL 2024 Oral

Best Student Paper, Knowledge-Augmented NLP Workshop @ACL 2024

KTRL+F: Knowledge-Augmented In-Document Search

Hanseok Oh*, Haebin Shin*, Miyoung Ko, Hyunji Lee, Minjoon Seo

NAACL 2024

2023

Local 3D Editing via 3D Distillation of CLIP Knowledge

Junha Hyung, Sungwon Hwang, Daejin Kim, Hyunji Lee, Jaegul Choo

CVPR 2023

Nonparametric Decoding for Generative Retrieval

Hyunji Lee, Jaeyoung Kim, Hoyeon Chang, Hanseok Oh, Sohee Yang, Vlad Karpukhin, Yi Lu, Minjoon Seo

Findings of ACL 2023

2022

Generative Multi-hop Retrieval

Hyunji Lee, Sohee Yang, Hanseok Oh, Minjoon Seo

EMNLP 2022 Oral

2021

Cost-effective End-to-end Information Extraction for Semi-structured Document Images

Wonseok Hwang, Hyunji Lee, Jinyeong Yim, Geewook Kim, Minjoon Seo

EMNLP 2021 (short)