Steffi Chern

🎓 Hi everyone! I’m Steffi, an undergraduate senior studying Statistics and Machine Learning at Carnegie Mellon University. I’m seeking opportunities to pursue a Ph.D. in NLP/Computer Science starting Fall 2025.

🧠 My research interests lie at the intersection of natural language processing (NLP) and machine learning (ML). I’m particularly interested in developing reliable methods for aligning and evaluating large language models (LLMs).

🚀 In this rapidly evolving field, I strive to contribute to the advancement of techniques that ensure the trustworthiness and robustness of LLMs.

✨ I’m fortunate to have received an honorable mention for the NSF Graduate Research Fellowship.

📩 Feel free to contact me about any research/job opportunities or questions you have!

⬇️ Below are some of my recent publications:

BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern, Zhulin Hu, Yuqing Yang, Ethan Chern, Yuan Guo, Jiahe Jin, Binjie Wang, Pengfei Liu
Preprint. [paper] [github] [website]

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu
Accepted to the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS D&B Track). [paper] [github] [website]

Halu-J: Critique-Based Hallucination Judge
Binjie Wang, Steffi Chern, Ethan Chern, Pengfei Liu
Preprint.

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate
Steffi Chern, Ethan Chern, Graham Neubig, Pengfei Liu
Preprint. [paper] [github]

FacTool: Factuality Detection in Generative AI - A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
I-Chun Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, Pengfei Liu
Preprint. [paper] [github] [website]

Combating Adversarial Attacks with Multi‑Agent Debate
Steffi Chern *, Zhen Fan *, Andy Liu *
Preprint. [paper]

Align on the Fly: Adapting Chatbot Behavior to Established Norms
Chunpu Xu, Steffi Chern, Ethan Chern, Ge Zhang, Zekun Wang, Ruibo Liu, Jing Li, Jie Fu, Pengfei Liu
Preprint. [paper] [github]

Voice Direction‑of-Arrival Conversion
I‑Chun Chern, Steffi Chern, Heng‑Cheng Kuo, Huan‑Hsin Tseng, Kuo‑Hsuan Hung, Yu Tsao
Accepted to 33rd IEEE Machine Learning for Signal Processing (MLSP) 2023. [paper]

Automated Analysis of Fluency Behaviors in Aphasia
Davida Fromm, Steffi Chern, Zihan Geng, Mason Kim, Brian MacWhinney, Joel Greenhouse
Accepted to the 52nd Clinical Aphasiology Conference (CAC) as a poster and to the Journal of Speech, Language, and Hearing Research (JSLHR) as a paper. [poster] [paper]