About Me

I am Kaixuan Huang (黄凯旋), a final-year Ph.D. student in the Department of Electrical and Computer Engineering at Princeton University. I am fortunate to be advised by Professor Mengdi Wang. Before that, I received a B.S. in Mathematics and a B.S. in Computer Science from Peking University. My research is partially supported by a Google PhD Fellowship.

Contact me via email: kaixuanh@princeton.edu; X Account; WeChat: [QR Code].

Research Overview

My research focuses on (1) understanding how language models may fail when inputs are out-of-distribution or adversarial, and (2) improving the capabilities and robustness of language models via scalable human/synthetic data pipelines and algorithmic innovations. My recent topics include:

Robustness of RLHF and RLVR:

  • The safety alignment of LLMs can be easily broken by exploiting parameter-efficient finetuning (PEFT) methods. We used prefix-tuning methods to construct adversarial prefixes that undo the safety alignment of LLMs [1]. We also developed pruning and low-rank adaptation methods to identify and isolate safety-critical regions of LLMs [2].
  • Reasoning models may memorize problem-solving techniques from the training set and blindly apply them when the input problems are slightly perturbed [3].

Inference-time Algorithms & Agents: Scaling up inference-time compute and augmenting LLMs with tools are principled approaches to boost capabilities and improve robustness.

  • To accelerate high-quality data generation, we developed an efficient decoding algorithm [4] and an inference-time alignment technique [5], borrowing techniques from traditional RL.
  • Besides handcrafting agents [6], I have been thinking about distilling agents into foundation models and improving them through reinforcement learning.

I am also interested in long-term research that can lead to a paradigm shift for current AI systems.

News

  • 05/2025: I will give a talk at Google DeepMind about MATH-Perturb.
  • 05/2025: MATH-Perturb is accepted at ICML 2025.
  • 11/2024: Thrilled to receive the Google PhD Fellowship 2024.
  • 10/2024: I will give a talk at INFORMS 2024 about CRISPR-GPT.
  • 03/2024: I started my internship at Google DeepMind, working with Zheng Wen and Csaba Szepesvari.

Selected Works

MATH-Perturb: Benchmarking LLMs' Math Reasoning Abilities against Hard Perturbations
ICML 2025 [link] [Website]
Kaixuan Huang, Jiacheng Guo, Zihao Li, Xiang Ji, Jiawei Ge, Wenzhe Li, Yingqing Guo, Tianle Cai, Hui Yuan, Runzhe Wang, Yue Wu, Ming Yin, Shange Tang, Yangsibo Huang, Chi Jin, Xinyun Chen, Chiyuan Zhang, Mengdi Wang
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
ICML 2024 workshop on Efficient Systems for Foundation Models (ES-FoMo) [link] [Code]
Kaixuan Huang, Xudong Guo, Mengdi Wang
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments
to appear in Nature Biomedical Engineering [link]
Kaixuan Huang*, Yuanhao Qu*, Henry Cousins, William A. Johnson, Di Yin, Mihir Shah, Denny Zhou, Russ Altman, Mengdi Wang, Le Cong

Embodied LLM Agents Learn to Cooperate in Organized Teams
arXiv preprint [link]
Xudong Guo, Kaixuan Huang, Jiale Liu, Wenhui Fan, Natalia Vélez, Qingyun Wu, Huazheng Wang, Thomas L. Griffiths, Mengdi Wang

Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
ICML 2024, ICLR Secure and Trustworthy LLMs 2024 (Best Paper) [link] [Code]
Boyi Wei*, Kaixuan Huang*, Yangsibo Huang*, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson

Visual Adversarial Examples Jailbreak Large Language Models
AAAI 2024 (Oral), ICML 2023 Adv ML Workshop (Oral) [link] [Code]
Xiangyu Qi*, Kaixuan Huang*, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal Behaviors
ICLR 2025 [link] [Website]
Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal

Scaling In-Context Demonstrations with Structured Attention
ICML 2023 Workshop on Efficient Systems for Foundation Models [link]
Tianle Cai*, Kaixuan Huang*, Jason D. Lee, Mengdi Wang