Publications

Thinking with Generated Images
Ethan Chern+, Zhulin Hu+, Steffi Chern+, Siqi Kou, Jiadi Su, Yan Ma, Zhijie Deng, Pengfei Liu
Preprint. [paper] [github]

BeHonest: Benchmarking Honesty in Large Language Models
Steffi Chern+, Zhulin Hu+, Yuqing Yang+, Ethan Chern, Yuan Guo, Jiahe Jin, Binjie Wang, Pengfei Liu
Preprint. [paper] [github] [website]

Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
Yan Ma, Steffi Chern, Xuyang Shen, Yiran Zhong, Pengfei Liu
Preprint. [paper] [github]

Generative AI Act II: Test Time Scaling Drives Cognition Engineering
Shijie Xia, Yiwei Qin, Xuefeng Li, Yan Ma, Run-Ze Fan, Steffi Chern, Haoyang Zou, Fan Zhou, Xiangkun Hu, Jiahe Jin, Yanheng He, Yixin Ye, Yixiu Liu, Pengfei Liu
Preprint. [paper] [github] [website]

OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang, Zengzhi Wang, Shijie Xia, Xuefeng Li, Haoyang Zou, Ruijie Xu, Run-Ze Fan, Lyumanshan Ye, Ethan Chern, Yixin Ye, Yikai Zhang, Yuqing Yang, Ting Wu, Binjie Wang, Shichao Sun, Yang Xiao, Yiyuan Li, Fan Zhou, Steffi Chern, Yiwei Qin, Yan Ma, Jiadi Su, Yixiu Liu, Yuxiang Zheng, Shaoting Zhang, Dahua Lin, Yu Qiao, Pengfei Liu
Accepted to the NeurIPS D&B Track, 2024. [paper] [github] [website]

Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate
Steffi Chern, Ethan Chern, Graham Neubig, Pengfei Liu
Accepted to AAAI 2025 AI4Research (Oral). [paper] [github]

FacTool: Factuality Detection in Generative AI - A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
I-Chun Chern, Steffi Chern, Shiqi Chen, Weizhe Yuan, Kehua Feng, Chunting Zhou, Junxian He, Graham Neubig, Pengfei Liu
Accepted to COLM 2025. [paper] [github] [website]

Halu-J: Critique-Based Hallucination Judge
Binjie Wang, Steffi Chern, Ethan Chern, Pengfei Liu
Accepted to the AAAI 2025 Workshop on Preventing and Detecting LLM Misinformation (Oral). [paper]

Align on the Fly: Adapting Chatbot Behavior to Established Norms
Chunpu Xu, Steffi Chern, Ethan Chern, Ge Zhang, Zekun Wang, Ruibo Liu, Jing Li, Jie Fu, Pengfei Liu
Preprint. [paper] [github] [website]

Combating Adversarial Attacks with Multi-Agent Debate
Steffi Chern*, Zhen Fan*, Andy Liu*
Preprint. [paper]

Voice Direction-of-Arrival Conversion
I-Chun Chern, Steffi Chern, Heng-Cheng Kuo, Huan-Hsin Tseng, Kuo-Hsuan Hung, Yu Tsao
Accepted to the 33rd IEEE Machine Learning for Signal Processing (MLSP) 2023. [paper]

Automated Analysis of Fluency Behaviors in Aphasia
Davida Fromm, Steffi Chern, Zihan Geng, Mason Kim, Brian MacWhinney, Joel Greenhouse
Poster accepted to the 52nd Clinical Aphasiology Conference (CAC); paper accepted to the Journal of Speech, Language, and Hearing Research (JSLHR). [poster] [paper]