Publications

BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern, Zhulin Hu, Yuqing Yang, Ethan Chern, Yuan Guo, Jiahe Jin, Binjie Wang, Pengfei Liu
Preprint. [paper] [github] [website]

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu
Accepted to the Thirty-eighth Annual Conference on Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks (D&B) Track. [paper] [github] [website]

Halu-J: Critique-Based Hallucination Judge
Binjie Wang, Steffi Chern, Ethan Chern, Pengfei Liu
Preprint.

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate
Steffi Chern, Ethan Chern, Graham Neubig, Pengfei Liu
Accepted to the Proceedings of Communications in Computer and Information Science (CCIS), AAAI 2025 AI4Research (Oral). [paper] [github]

FacTool: Factuality Detection in Generative AI - A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
I-Chun Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, Pengfei Liu
Preprint. [paper] [github] [website]

Align on the Fly: Adapting Chatbot Behavior to Established Norms
Chunpu Xu, Steffi Chern, Ethan Chern, Ge Zhang, Zekun Wang, Ruibo Liu, Jing Li, Jie Fu, Pengfei Liu
Preprint. [paper] [github] [website]

Combating Adversarial Attacks with Multi-Agent Debate
Steffi Chern*, Zhen Fan*, Andy Liu*
Preprint. [paper]

Voice Direction-of-Arrival Conversion
I-Chun Chern, Steffi Chern, Heng-Cheng Kuo, Huan-Hsin Tseng, Kuo-Hsuan Hung, Yu Tsao
Accepted to the 33rd IEEE Machine Learning for Signal Processing (MLSP) 2023. [paper]

Automated Analysis of Fluency Behaviors in Aphasia
Davida Fromm, Steffi Chern, Zihan Geng, Mason Kim, Brian MacWhinney, Joel Greenhouse
Accepted to the 52nd Clinical Aphasiology Conference (CAC) as a poster and to the Journal of Speech, Language, and Hearing Research (JSLHR) as a paper. [poster] [paper]