My name is Xiaogeng Liu. I am a second-year Ph.D. student in Information Science at the University of Wisconsin-Madison, where I am honored to conduct my research under the guidance of Professor Chaowei Xiao, who specializes in security, privacy, and machine learning, with the goal of building socially responsible machine learning systems. I obtained my Master’s degree from Huazhong University of Science and Technology in 2023, where I was fortunate to be a member of the TAI group mentored by Professor Shengshan Hu. I am also honored to have been awarded the NVIDIA 2025-2026 Graduate Fellowship.

My research interests lie in trustworthy AI, especially the robustness of machine learning models, which emphasizes a model’s ability to maintain performance and resist attacks and unexpected inputs. I have published several papers at top international AI conferences. I am always open to collaboration and the exchange of ideas; if you’d like to discuss potential research opportunities or simply connect, please don’t hesitate to reach out to me at xiaogeng.liu@wisc.edu.

🔥 News

(* represents equal contribution)


💥 Preprints

Preprint

AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

Xiaogeng Liu*, Peiran Li*, Edward Suh, Yevgeniy Vorobeychik, Zhuoqing Mao, Somesh Jha, Patrick McDaniel, Huan Sun, Bo Li, Chaowei Xiao

Project Page

  • We propose AutoDAN-Turbo, a black-box jailbreak method that automatically discovers as many jailbreak strategies as possible from scratch, without any human intervention or predefined scopes (e.g., specified candidate strategies), and uses them for red-teaming. Notably, AutoDAN-Turbo achieves an 88.5% attack success rate on GPT-4-1106-turbo. In addition, AutoDAN-Turbo is a unified framework that can incorporate existing human-designed jailbreak strategies in a plug-and-play manner; by integrating such strategies, it achieves an even higher attack success rate of 93.4% on GPT-4-1106-turbo.
  • It is the strongest jailbreak attack on HarmBench.
Preprint

InjecGuard: Benchmarking and Mitigating Over-defense in Prompt Injection Guardrail Models

Hao Li*, Xiaogeng Liu*, Chaowei Xiao

Project Page | Dataset

  • We propose InjecGuard, a lightweight model designed to defend against prompt injection attacks. It delivers strong performance across benign, malicious, and over-defense accuracy metrics, surpassing existing guard models such as PromptGuard, ProtectAIv2, and LakeraAI. Despite its compact size of only 184MB, InjecGuard achieves performance comparable to advanced commercial large language models like GPT-4.
  • We also introduce NotInject, an evaluation dataset that systematically measures over-defense across various prompt guard models.

📝 Selected Publications

ICLR 2024

AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models

Xiaogeng Liu, Nan Xu, Muhao Chen, Chaowei Xiao

Project Page

  • This pioneering work focuses on the adversarial robustness of the safety alignment of LLMs, and introduces AutoDAN, a novel hierarchical genetic algorithm that automatically generates stealthy jailbreak prompts for LLMs, preserving semantic meaningfulness while bypassing existing defenses like perplexity detection.
  • It is one of the strongest jailbreak attacks on public benchmarks (HarmBench, EasyJailbreak).
USENIX Security 2024 Distinguished Paper Award

Don’t Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models

Zhiyuan Yu, Xiaogeng Liu, Shunning Liang, Zach Cameron, Chaowei Xiao, Ning Zhang

Project Page

  • This work is a comprehensive systematization of jailbreak prompts in LLMs, categorizing them into five types and analyzing their effectiveness based on 448 prompts collected from online forums.
  • We also introduce a human-AI cooperative framework for automating jailbreak prompt generation, successfully transforming 766 failed prompts into ones that elicit harmful outputs and demonstrating the feasibility of automating such attacks.
ECCV 2024

AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting

Yu Wang*, Xiaogeng Liu*, Yu Li, Muhao Chen, Chaowei Xiao

Project Page

  • This paper presents AdaShield, a novel adaptive defense mechanism designed to safeguard MLLMs against structure-based jailbreak attacks by using defense prompts without requiring fine-tuning or additional training.
  • AdaShield achieves state-of-the-art performance, significantly improving defense robustness across multiple MLLMs while maintaining general performance on benign tasks, through its adaptive auto-refinement framework that customizes defense prompts to various attack scenarios.
COLM 2024

JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks

Weidi Luo*, Siyuan Ma*, Xiaogeng Liu*, Xiaoyu Guo, Chaowei Xiao

Project Page

  • This work introduces JailBreakV-28K, a comprehensive benchmark for evaluating the robustness of MLLMs against both text-based and image-based jailbreak attacks, and RedTeam-2K, a dataset of 2,000 malicious queries covering 16 safety policies aimed at testing the vulnerabilities of LLMs and MLLMs.
  • The benchmark highlights the transferability of jailbreak techniques from LLMs to MLLMs, revealing significant vulnerabilities in MLLMs’ ability to handle malicious inputs across text and visual modalities.
CVPR 2023

Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency

Xiaogeng Liu, Minghui Li, Haoyu Wang, Shengshan Hu, Dengpan Ye, Hai Jin, Libing Wu, Chaowei Xiao

Project Page

  • This paper introduces TeCo, a novel test-time trigger sample detection method that leverages the anomaly in corruption robustness consistency between clean and trigger samples, requiring only hard-label outputs and no additional data or assumptions.
  • TeCo significantly outperforms state-of-the-art methods on various backdoor attacks and benchmarks, improving the AUROC by 10% and achieving 5 times the stability of existing methods.

🎖 Honors and Awards

  • 2024.12 NVIDIA 2025-2026 Graduate Fellowship
  • 2024.08 Distinguished Paper Award at the 33rd USENIX Security Symposium (USENIX Security’24)
  • 2022.10 Chinese National Scholarship (Top 1%)

📖 Education

  • 2023.09 - present, Ph.D. in Information Science, University of Wisconsin-Madison, Madison, Wisconsin, USA.
  • 2020.09 - 2023.06, Master’s in Cybersecurity, Huazhong University of Science and Technology, Wuhan, Hubei, China.