I am a third-year PhD student in the Natural Language Processing Group at The Hong Kong Polytechnic University (PolyU), advised by Prof. Maggie Wenjie Li. Before that, I received my BEng degrees from the School of Computer Science, Wuhan University, in 2022.

My primary focus is on uncovering mechanistic insights to enhance the alignment of Large Language Models (LLMs). Closely related to this, I am also passionate about the mechanistic interpretability of LLMs’ general computational processes (check out Awesome-LLM-Interpretability!). Beyond these core areas, I have broad interests in LLM alignment, improving their reasoning capabilities, and developing more effective interactions between LLMs, humans, and the environment.

🔥 News

2025.06: 🎉 Two papers are accepted by EMNLP 2025!
2025.06: 🎉 Our ACL paper “Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region” was selected for an oral presentation (Top 8%)!
2025.05: 🎉 Three papers are accepted by ACL 2025!
2025.05: 🎉 One paper is accepted by ICML 2025!
2024.09: 🎉 Two papers are accepted by EMNLP 2024!
2024.05: 🎉 Two papers are accepted by ACL 2024!

📝 Publications

See full list in

ACL 2025 Oral🌟

Why Safeguarded Ships Run Aground? Aligned Large Language Models’ Safety Mechanisms Tend to Be Anchored in The Template Region
Chak Tou Leong, Qingyu Yin, Jian Wang, Wenjie Li

ICML 2025

Direct Preference Optimization Using Sparse Feature-Level Constraints
Qingyu Yin†, Chak Tou Leong†, Hongbo Zhang, Minjun Zhu, Hanqi Yan, Qiang Zhang, Yulan He, Wenjie Li, Jun Wang, Yue Zhang, Linyi Yang

EMNLP 2024 Findings

E^2CL: Exploration-based Error Correction Learning for Embodied Agents
Hanlin Wang†, Chak Tou Leong†, Jian Wang, Wenjie Li

Preprint

No Two Devils Alike: Unveiling Distinct Mechanisms of Fine-tuning Attacks
Chak Tou Leong, Yi Cheng, Kaishuai Xu, Jian Wang, Hanlin Wang, Wenjie Li

EMNLP 2023

Self-Detoxifying Language Models via Toxification Reversal
Chak Tou Leong†, Yi Cheng†, Jiashuo Wang, Jian Wang, Wenjie Li

EMNLP 2025 · TokenSkip: Controllable Chain-of-Thought Compression in LLMs, Heming Xia, Chak Tou Leong, Wenjie Wang, Yongqi Li, Wenjie Li
EMNLP 2025 · Expanding before Inferring: Enhancing Factuality in Large Language Models through Premature Layers Interpolation, Dingwei Chen, Ziqiang Liu, Feiteng Fang, Chak Tou Leong, Shiwen Ni, Ahmadreza Argha, Hamid Alinejad-Rokny, Min Yang, Chengming Li
ACL 2025 Findings · STeCa: Step-level Trajectory Calibration for LLM Agent Learning, Hanlin Wang, Jian Wang, Chak Tou Leong, Wenjie Li
ACL 2025 · Subtle Errors Matter: Preference Learning via Error-injected Self-editing, Kaishuai Xu, Tiezheng Yu, Wenjun Hou, Yi Cheng, Chak Tou Leong, Liangyou Li, Xin Jiang, Lifeng Shang, Qun Liu, Wenjie Li
EMNLP 2024 Findings · Deeper Insights Without Updates: The Power of In-Context Learning Over Fine-Tuning, Qingyu Yin, Xuzheng He, Luoao Deng, Chak Tou Leong, Fan Wang, Yanzhao Yan, Xiaoyu Shen, Qiang Zhang
ACL 2024 · Instruct Once, Chat Consistently in Multiple Rounds: An Efficient Tuning Framework for Dialogue, Jian Wang, Chak Tou Leong, Jiashuo Wang, Dongding Lin, Wenjie Li, Xiaoyong Wei
ACL 2024 Findings · Muffin: Mitigating Unhelpfulness in Emotional Support Conversations with Multifaceted AI Feedback, Jiashuo Wang, Chunpu Xu, Chak Tou Leong, Wenjie Li, Jing Li
ACMMM 2024 · SCREEN: A Benchmark for Situated Conversational Recommendation, Dongding Lin, Jian Wang, Chak Tou Leong, Wenjie Li
AAAI 2024 · COOPER: Coordinating Specialized Agents towards a Complex Dialogue Goal, Yi Cheng, Wenge Liu, Jian Wang, Chak Tou Leong, Yi Ouyang, Wenjie Li, Xian Wu, Yefeng Zheng

🎖 Honors and Awards

2021.12 Outstanding Prize, Scholarships for Hong Kong, Macau and Overseas Chinese Students (awarded to 9 students school-wide)

Academic and Professional Activities

Open-Source Projects

Awesome-LLM-Interpretability
- A curated list of LLM interpretability-related materials: tutorials, libraries, surveys, papers, blogs, etc.

Academic Services

Conference Reviewer: NeurIPS 2024, ICLR 2025, ICML 2025, ARR 2025

Teaching Assistant

COMP 6709: Advanced Natural Language Processing, Spring 2024 and Spring 2025, PolyU
COMP 4133: Information Retrieval, Fall 2023, PolyU
COMP 5423: Natural Language Processing, Spring 2023, PolyU