Xin Liu

劉昕

Senior Applied Scientist

Store Foundational AI, Amazon

 

Office: SFO22, Palo Alto, CA 94301

Email: xliucr [at] amazon [dot] com; seanliu96 [at] outlook [dot] com

 

Bio

I am a Senior Applied Scientist at Store Foundational AI - Rufus, Amazon, where I work on end-to-end foundation model development for generative AI shopping, spanning pretraining, midtraining, post-training, and agentic reinforcement learning. I led data recipes and curriculum strategies for in-house MoE model pretraining and midtraining in 2024–2025, and served as the primary tech lead for instruct, reasoning, and agent model releases in 2023–2026. My current research focuses on efficient RL and agentic RL, including on-policy data augmentation, multi-step tool orchestration, and multimodal agents.

I obtained my Ph.D. in Computer Science from the Hong Kong University of Science and Technology in July 2023, advised by Prof. Yangqiu Song, and my B.E. in Computer Science from Sun Yat-sen University in June 2018, advised by Prof. Rong Pan.

My research interests lie at the intersection of large language models and agentic AI. I am broadly interested in how models can be trained to reason, plan, and act over long horizons, from efficient RL training algorithms and reward design to scalable data synthesis and multi-step tool use. Previously, I worked on graph representation learning, knowledge graphs, commonsense reasoning, and session-based recommendation.

Working Experience

Senior Applied Scientist (Full-time)
August 2023 - Present

Store Foundational AI

Foundational Models for E-commerce

Manager: Qingyu Yin

Pre-training & Mid-training: Babysat and delivered two generations of Rufus MoE LLMs from scratch; the latest ultra-sparse MoE (3.25% active parameters) matches or exceeds DeepSeek-V3-Base and Kimi-K2-Base on MMLU, MMLU-Pro, Math, and internal shopping benchmarks. Set the roadmap for data mixture and curriculum strategy across pre-training and mid-training [2024-2025]. Pioneered agentic mid-training with Hephaestus [NAACL’25], a 103B-token mid-training corpus covering 76k+ APIs, built from API documents and available tool trajectories; it outperforms open-source base checkpoints at a similar scale and boosts subsequent post-training.

Post-training & Model Release: Led the initial release of four in-house instruct/reasoning/agent models (Dense, MoE, ultra-sparse MoE), improving performance in retrieval augmentation, instruction following, and consistency [2023-2024], and in reasoning, helpfulness, and shopping-agent tasks [2025-2026], through systematic exploration of data recipes and training signals (rejection sampling, verifiable rewards, reward models, etc.). Built HeaPA [ArXiv: 2601.22448], a difficulty-aware heap sampling and on-policy query augmentation framework for efficient RLVR: heap sampling provides a more flexible difficulty schedule, while query augmentation ensures diverse queries and difficulty levels, managed through tree-based reward aggregation.

Agentic Systems & Reinforcement Learning: Architected the first agentic multimodal shopping environment with visual search and image generation/editing tools for end-to-end RL, with support for context management [2026]. Designed multi-step tool orchestration rewards after analyzing compositional patterns [ArXiv: 2603.24709]; verified them in a 100k+ real-API cache environment, achieving gains of 19.9% in turn accuracy and 34.2% in call accuracy on ComplexFuncBench. Worked on DeepPlanner [Findings of ACL’26], which trains deep research agents via advantage shaping and upweighting and uses 10× fewer training queries than prior SOTA.

Evaluation & Prompting Infrastructure: Designed the first Rufus shopping prompting system using a finite-state machine (A/B testing, routing, task planning, multilingual support) [2023–2024]. Built MultiTurnInstruct [EMNLP’25], a 1.1K-sample multi-turn benchmark across nine instruction-following categories to stress-test LLMs on entangled and conflicting instructions in complex conversations. Worked on several in-house shopping benchmarks for instruction following, consistency, helpfulness, product search agent, and multimodal agent.

Training Infrastructure: Directed a mid-training scaling-law package to identify optimal data mixture and curriculum schedules [2024–2025]. Extended the pre-training and post-training infrastructure (NeMo, NeMo-Aligner, verl, and slime) with data loading and processing, curriculum, reward services, and agent environments [2023–2026].

Research Intern (Internship)
June 2022 - Dec. 2022

Search Query Understanding, Amazon Search (A9)

Commonsense Knowledge Graph and Pattern Mining for E-commerce

Mentors: Zheng Li, Yifan Gao, Jingfeng Yang, Tianyu Cao

Pioneered graph learning for natural language understanding and parsing: implemented the most effective solution to mine important user intent patterns and parse millions of action-item-intention triples for constructing commonsense knowledge graphs [ACL’2023]. Built the commonsense knowledge graph at Amazon (COSMO) to improve ranking relevance and recommendation quality in Amazon Search and Navigation [SIGMOD’2024].

Architected session-based recommendation solutions by integrating graph learning – developed the state-of-the-art solution using pattern mining algorithms and memory augmentation to significantly enhance the item-item collaborative filtering graph [NeurIPS’2023].

Research Intern (Internship)
June 2017 - Dec. 2017

Cloud & Mobile, Microsoft Research Asia

Distributed Graph Database

Mentor: Liang Jeff Chen

Collaborated on the development of a new graph database built on top of relational and non-relational databases, contributing to the open-source project GraphView.

Focused on translation, compilation, and optimization, leading to the integration of GraphView as a key component of Microsoft Azure.

Publications

Most recent publications on Google Scholar.

Learning To Optimize Multi-Objective Alignment Through Dynamic Reward Weighting ArXiv Code

Yining Lu, Zilong Wang, Shiyang Li, Xin Liu, Changlong Yu, Qingyu Yin, Zhan Shi, Zixuan Zhang, Meng Jiang

Transactions of the Association for Computational Linguistics, 2026

Leveraging Historical Information To Boost Retrieval-Augmented Generation In Conversations PDF

Fengran Mo, Yifan Gao, Zhuofeng Wu, Xin Liu, Pei Chen, Zheng Li, Zhengyang Wang, Xian Li, Meng Jiang, Jian-Yun Nie

Information Processing & Management, 2026

WebCoach: Self-Evolving Web Agents With Cross-session Memory Guidance ArXiv

Genglin Liu, Shijie Geng, Sha Li, Hejie Cui, Sarah Zhang, Xin Liu, Tianyi Liu

ICLR MemAgents Workshop, 2026

HeaPA: Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning ArXiv Code

Weiqi Wang, Xin Liu, Binxuan Huang, Hejie Cui, Rongzhi Zhang, Changlong Yu, Shuowei Jin, Jingfeng Yang, Qingyu Yin, Zhengyang Wang, others

ArXiv preprint, 2026

Training LLMs for Multi-Step Tool Orchestration with Constrained Data Synthesis and Graduated Rewards ArXiv Code

Jiayang Cheng, Xin Liu, Zhihan Zhang, Haoyang Wen, Zixuan Zhang, Qingyu Yin, Shiyang Li, Priyanka Nigam, Bing Yin, Chao Zhang, Yangqiu Song

ArXiv preprint, 2026

DeepPlanner: Scaling Planning Capability for Deep Research Agents via Advantage Shaping ArXiv Code

Wei Fan, Wenlin Yao, Zheng Li, Feng Yao, Xin Liu, Liang Qiu, Qingyu Yin, Yangqiu Song, Bing Yin

Findings of ACL, 2026

Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning ArXiv

Fengran Mo, Yifan Gao, Sha Li, Hansi Zeng, Xin Liu, Zhaoxuan Tan, Xian Li, Jianshu Chen, Dakuo Wang, Meng Jiang

Findings of ACL, 2026

SessionIntentBench: A Multi-task Inter-session Intention-shift Modeling Benchmark for E-commerce Customer Behavior Understanding ArXiv Code

Yuqi Yang, Weiqi Wang, Baixuan Xu, Wei Fan, Qing Zong, Chunkit Chan, Zheye Deng, Xin Liu, Yifan Gao, Changlong Yu, others

Findings of ACL, 2026

Teach Diffusion Language Models to Learn from Their Own Mistakes ArXiv

Liming Liu, Binxuan Huang, Xin Liu, Bing Yin, Tuo Zhao

ArXiv preprint, 2026

Beyond Test-Time Training: Learning to Reason via Hardware-Efficient Optimal Control ArXiv Code

Peihao Wang, Shan Yang, Xijun Wang, Tesi Xiao, Xin Liu, Changlong Yu, Yu Lou, Pan Li, Zhangyang Wang, Ming Lin, others

ArXiv preprint, 2026

POPI: Personalizing LLMs via Optimized Natural Language Preference Inference ArXiv

Yizhuo Chen, Xin Liu, Ruijie Wang, Zheng Li, Pei Chen, Changlong Yu, Priyanka Nigam, Meng Jiang, Bing Yin

ArXiv preprint, 2025

Think-RM: Enabling long-horizon reasoning in generative reward models ArXiv

Ilgee Hong, Changlong Yu, Liang Qiu, Weixiang Yan, Zhenghao Xu, Haoming Jiang, Qingru Zhang, Qin Lu, Xin Liu, Chao Zhang, others

NeurIPS, 2025

Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data ArXiv Code

Siqi Guo, Ilgee Hong, Vicente Balmaseda, Changlong Yu, Liang Qiu, Xin Liu, Haoming Jiang, Tuo Zhao, Tianbao Yang

ICML, 2025

Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training ArXiv

Yuchen Zhuang, Jingfeng Yang, Haoming Jiang, Xin Liu, Kewei Cheng, Sanket Lokegaonkar, Yifan Gao, Qing Ping, Tianyi Liu, Binxuan Huang, others

NAACL, 2025

Uniconv: Unifying Retrieval And Response Generation For Large Language Models In Conversations ArXiv

Fengran Mo, Yifan Gao, Chuan Meng, Xin Liu, Zhuofeng Wu, Kelong Mao, Zhengyang Wang, Pei Chen, Zheng Li, Xian Li, others

ACL, 2025

EcomScriptBench: A multi-task benchmark for e-commerce script planning via step-wise intention-driven product association ArXiv

Weiqi Wang, Limeng Cui, Xin Liu, Sreyashi Nag, Wenju Xu, Chen Luo, Sheikh Muhammad Sarwar, Yang Li, Hansu Gu, Hui Liu, others

ACL, 2025

IHEval: Evaluating language models on following the instruction hierarchy ArXiv Code

Zhihan Zhang, Shiyang Li, Zixuan Zhang, Xin Liu, Haoming Jiang, Xianfeng Tang, Yifan Gao, Zheng Li, Haodong Wang, Zhaoxuan Tan, others

NAACL, 2025

Can Language Models Follow Multiple Turns of Entangled Instructions? ArXiv Code

Chi Han, Xin Liu, Haodong Wang, Shiyang Li, Jingfeng Yang, Haoming Jiang, Zhengyang Wang, Qingyu Yin, Liang Qiu, Changlong Yu, others

Findings of EMNLP, 2025

DrAgent: Empowering Large Language Models as Medical Agents for Multi-hop Medical Reasoning PDF

Fenglin Liu, Zheng Li, Hongjian Zhou, Qingyu Yin, Jingfeng Yang, Xin Liu, Zhengyang Wang, Xianfeng Tang, Shiyang Li, Xiang He, others

Findings of EMNLP, 2025

InteGround: On the Evaluation of Verification and Retrieval Planning in Integrative Grounding ArXiv Code

Jiayang Cheng, Qianqian Zhuang, Haoran Li, Chunkit Chan, Xin Liu, Lin Qiu, Yangqiu Song

Findings of EMNLP, 2025

Towards Subgraph Isomorphism Counting with Graph Kernels ArXiv

Xin Liu, Weiqi Wang, Jiaxin Bai, Yangqiu Song

ArXiv preprint, 2024

On the role of entity and event level conceptualization in generalizable reasoning: A survey of tasks, methods, applications, and future directions ArXiv

Weiqi Wang, Tianqing Fang, Haochen Shi, Baixuan Xu, Wenxuan Ding, Liyu Zhang, Wei Fan, Jiaxin Bai, Haoran Li, Xin Liu, others

Findings of EMNLP, 2025

MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding ArXiv

Baixuan Xu*, Weiqi Wang*, Haochen Shi, Wenxuan Ding, Huihao JING, Tianqing Fang, Jiaxin Bai, Xin Liu, Changlong Yu, Zheng Li, Chen Luo, Qingyu Yin, Bing Yin, Long Chen, Yangqiu Song

EMNLP, 2024

IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce ArXiv

Wenxuan Ding*, Weiqi Wang*, Sze Heng Douglas Kwok, Minghao LIU, Tianqing Fang, Jiaxin Bai, Xin Liu, Changlong Yu, Zheng Li, Chen Luo, Qingyu Yin, Bing Yin, Junxian He, Yangqiu Song

Findings of EMNLP, 2024

NegotiationToM: A Benchmark for Stress-testing Machine Theory of Mind on Negotiation Surrounding ArXiv

Chunkit Chan, Cheng Jiayang, Yauwai Yim, Zheye Deng, Wei Fan, Haoran Li, Xin Liu, Hongming Zhang, Weiqi Wang, Yangqiu Song

Findings of EMNLP, 2024

Advancing Abductive Reasoning in Knowledge Graphs through Complex Logical Hypothesis Generation ArXiv

Jiaxin Bai, Yicheng Wang, Tianshi Zheng, Yue Guo, Xin Liu, Yangqiu Song

ACL, 2024

AbsInstruct: Eliciting Abstraction Ability from LLMs through Explanation Tuning with Plausibility Estimation ArXiv Code

Zhaowei Wang, Wei Fan, Qing Zong, Hongming Zhang, Sehyun Choi, Tianqing Fang, Xin Liu, Yangqiu Song, Ginny Y. Wong, Simon See

ACL, 2024

PrivLM-Bench: A Multi-level Privacy Evaluation Benchmark for Language Models ArXiv

Haoran Li, Dadi Guo, Donghao Li, Wei Fan, Qi Hu, Xin Liu, Chunkit Chan, Duanyi Yao, Yangqiu Song

ACL, 2024

CANDLE: Iterative Conceptualization and Instantiation Distillation from Large Language Models for Commonsense Reasoning ArXiv Code

Weiqi Wang, Tianqing Fang, Chunyang Li, Haochen Shi, Wenxuan Ding, Baixuan Xu, Zhaowei Wang, Jiaxin Bai, Xin Liu, Jiayang Cheng, Chunkit Chan, Yangqiu Song

ACL, 2024

AbsPyramid: Benchmarking the Abstraction Ability of Language Models with a Unified Entailment Graph ArXiv Code

Zhaowei Wang, Haochen Shi, Weiqi Wang, Tianqing Fang, Hongming Zhang, Sehyun Choi, Xin Liu, Yangqiu Song

Findings of NAACL, 2024

EventGround: Narrative Reasoning by Grounding to Eventuality-centric Knowledge Graphs ArXiv Code

Cheng Jiayang, Lin Qiu, Chunkit Chan, Xin Liu, Yangqiu Song, Zheng Zhang

COLING, 2024

ChatGPT Evaluation on Sentence Level Relations: A Focus on Temporal, Causal, and Discourse Relations ArXiv

Chunkit Chan, Jiayang Cheng, Weiqi Wang, Yuxin Jiang, Tianqing Fang, Xin Liu, Yangqiu Song

Findings of EACL, 2024

COSMO: A Large-Scale E-commerce Common Sense Knowledge Generation and Serving System at Amazon

Changlong Yu*, Xin Liu*, Jefferson Maia, Tianyu Cao, Laurence (Yang) Li, Yifan Gao, Yangqiu Song, Rahul Goutam, Haiyang Zhang, Bing Yin, Zheng Li*

SIGMOD, 2024

🚗CAR: Conceptualization-Augmented Reasoner for Zero-Shot Commonsense Question Answering ArXiv Code

Weiqi Wang*, Tianqing Fang*, Wenxuan Ding, Baixuan Xu, Xin Liu, Yangqiu Song, Antoine Bosselut

Findings of EMNLP, 2023

QaDynamics: Training Dynamics-Driven Synthetic QA Diagnostic for Zero-Shot Commonsense Question Answering ArXiv Code

Haochen Shi*, Weiqi Wang*, Tianqing Fang, Baixuan Xu, Wenxuan Ding, Xin Liu, Yangqiu Song

Findings of EMNLP, 2023

Gold: A Global and Local-aware Denoising Framework for Commonsense Knowledge Graph Noise Detection ArXiv Code

Zheye Deng, Weiqi Wang, Zhaowei Wang, Xin Liu, Yangqiu Song

Findings of EMNLP, 2023

Enhancing User Intent Capture in Session-Based Recommendation with Attribute Patterns PDF Code

Xin Liu, Zheng Li, Yifan Gao, Jingfeng Yang, Tianyu Cao, Zhengyang Wang, Bing Yin, Yangqiu Song

NeurIPS, 2023

Complex Eventuality Reasoning with Implicit Logical Constraints ArXiv Code

Jiaxin Bai, Xin Liu, Weiqi Wang, Chen Luo, Yangqiu Song

NeurIPS, 2023

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting ArXiv

Hejie Cui, Xinyu Fang, Zihan Zhang, Ran Xu, Xuan Kan, Xin Liu, Yue Yu, Manling Li, Yangqiu Song, Carl Yang

NeurIPS, 2023

Spatio-Temporal Contrastive Learning-enhanced GNNs for Session-based Recommendation ArXiv Code

Zhongwei Wan, Xin Liu, Benyou Wang, Jiezhong Qiu, Boyu Li, Ting Guo, Guangyong Chen, Yang Wang

ACM Transactions on Information Systems, 2023

Self-Consistency Narrative Prompts on Abductive Natural Language Inference ArXiv Code

Chunkit Chan, Xin Liu, Tsz Ho Chan, Jiayang Cheng, Yangqiu Song, Ginny Y. Wong, Simon See

AACL, 2023

DiscoPrompt: Path Prediction Prompt Tuning for Implicit Discourse Relation Recognition ArXiv Code

Chunkit Chan*, Xin Liu*, Jiayang Cheng, Zihan Li, Yangqiu Song, Ginny Y. Wong, Simon See

Findings of ACL, 2023

FolkScope: Intention Knowledge Graph Construction for E-commerce Commonsense Discovery ArXiv Code

Changlong Yu, Weiqi Wang, Xin Liu, Jiaxin Bai, Yangqiu Song, Zheng Li, Yifan Gao, Tianyu Cao, Bing Yin

Findings of ACL, 2023

ASER: Towards Large-scale Commonsense Knowledge Acquisition via Higher-order Selectional Preference over Eventualities ArXiv Code

Hongming Zhang*, Xin Liu*, Haojie Pan*, Haowen Ke, Jiefu Ou, Tianqing Fang, Yangqiu Song

Artificial Intelligence, 2022

Complex Hyperbolic Knowledge Graph Embeddings with Fast Fourier Transform ArXiv Code

Huiru Xiao, Xin Liu, Yangqiu Song, Ginny Y. Wong, Simon See

EMNLP, 2022

Boosting Graph Structure Learning with Dummy Nodes ArXiv Code

Xin Liu, Jiayang Cheng, Yangqiu Song, Xin Jiang

ICML, 2022

Graph Convolutional Networks with Dual Message Passing for Subgraph Isomorphism Counting and Matching ArXiv Code

Xin Liu, Yangqiu Song

AAAI, 2022

Exploring Discourse Structures for Argument Impact Classification ArXiv Code

Xin Liu, Jiefu Ou, Yangqiu Song, Xin Jiang

ACL, 2021

On the Importance of Word and Sentence Representation Learning in Implicit Discourse Relation Classification ArXiv Code

Xin Liu, Jiefu Ou, Yangqiu Song, Xin Jiang

IJCAI, 2020

Neural Subgraph Isomorphism Counting ArXiv Code

Xin Liu, Haojie Pan, Mutian He, Yangqiu Song, Xin Jiang, Lifeng Shang

SIGKDD, 2020

ASER: A Large-scale Eventuality Knowledge Graph ArXiv Code

Hongming Zhang*, Xin Liu*, Haojie Pan*, Yangqiu Song, Cane Wing-Ki Leung

WWW, 2020

Hyper-Path-Based Representation Learning for Hyper-Networks ArXiv Code

Jie Huang, Xin Liu, Yangqiu Song

CIKM, 2019

Relation Discovery with Out-of-Relation Knowledge Base as Supervision ArXiv Code

Yan Liang, Xin Liu, Jianwen Zhang, Yangqiu Song

NAACL, 2019

A Variational Approach to Weakly Supervised Document-Level Multi-Aspect Sentiment Classification ArXiv Code

Ziqian Zeng, Wenxuan Zhou, Xin Liu, Yangqiu Song

NAACL, 2019

Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification ArXiv Code

Huiru Xiao, Xin Liu, Yangqiu Song

WWW, 2019

Biased RandomWalk based Social Regularization for Word Embeddings PDF Code

Ziqian Zeng*, Xin Liu*, Yangqiu Song

IJCAI, 2018


IJCAI, 2020

ASER: A Large-scale Eventuality Knowledge Graph ArXiv Code

Hongming Zhang*, Xin Liu*, Haojie Pan*, Yangqiu Song, Cane Wing-Ki Leung

WWW, 2020

Relation Discovery with Out-of-Relation Knowledge Base as Supervision ArXiv Code

Yan Liang, Xin Liu, Jianwen Zhang, Yangqiu Song

NAACL, 2019

A Variational Approach to Weakly Supervised Document-Level Multi-Aspect Sentiment Classification ArXiv Code

Ziqian Zeng, Wenxuan Zhou, Xin Liu, Yangqiu Song

NAACL, 2019

Efficient Path Prediction for Semi-Supervised and Weakly Supervised Hierarchical Text Classification ArXiv Code

Huiru Xiao, Xin Liu, Yangqiu Song

WWW, 2019

Biased Random Walk based Social Regularization for Word Embeddings PDF Code

Ziqian Zeng*, Xin Liu*, Yangqiu Song

IJCAI, 2018

Awards

HKUST RedBird Academic Excellence Award (Hong Kong University of Science and Technology, 2022)

HKUST SENG Academic Award (Hong Kong University of Science and Technology, 2020)

HKUST CSE Professor Samuel Chanson Best Teaching Assistant Award (Hong Kong University of Science and Technology, 2020)

SYSU Outstanding Graduate (Sun Yat-sen University, 2018)

SYSU Outstanding Graduate Thesis (Sun Yat-sen University, 2018)

First Prize, Guangdong Collegiate Programming Contest (Computer Academy of Guangdong, 2018)

Top 10, Tencent Social Advertising College Algorithm Competition (Tencent, 2018)

Top 10, Tencent Social Advertising College Algorithm Competition (Tencent, 2017)

Meritorious Winner, Mathematical Contest in Modeling / Interdisciplinary Contest in Modeling (Consortium for Mathematics and its Applications, 2017)

National Scholarship (Ministry of Education of the People's Republic of China, 2017)

National Scholarship (Ministry of Education of the People's Republic of China, 2016)

National Scholarship (Ministry of Education of the People's Republic of China, 2015)

Teaching Experience

Teaching Assistant (Postgraduate) at HKUST (Spring 2022)

MSBD5018H: Natural Language Processing

Teaching Assistant (Postgraduate) at HKUST (Spring 2021)

MSBD6000H: Natural Language Processing

Teaching Assistant (Undergraduate) at HKUST (Spring 2020)

COMP4332/RMBI4310: Big Data Mining

Teaching Assistant (Undergraduate) at HKUST (Spring 2019)

COMP4332/RMBI4310: Big Data Mining

Teaching Assistant (Undergraduate) at HKUST (Fall 2018)

COMP4901K/MATH4824B: Machine Learning for Natural Language Processing

Mentoring

Weiqi Wang (Fall 2024, 2025)

Intern (Amazon) [E-commerce Planning and Efficient RL with On-policy Data Augmentation], now Researcher at Tencent Hunyuan Team

Jiayang Cheng (Fall 2025)

Intern (Amazon) [Agentic RL with Orchestration Rewards], now Researcher at Alibaba Qwen Team

Yizhuo Chen (Fall 2024)

Intern (Amazon) [Joint Optimization on Summary and Generation Models], now PhD@UIUC

Chi Han (Fall 2024)

Intern (Amazon) [Comprehensive Multi-turn Evaluation], now PhD@UIUC

Zi-Yuan Hu (Spring 2021)

Undergraduate Research Assistant (HKUST) [Graph Representation Learning], now PhD@CUHK

Jie Huang (Winter 2020 - Summer 2021)

Undergraduate Research Assistant (HKUST) [Hyper-network Representation Learning], now MTSF@xAI

Jiefu Ou (Summer 2019 - Spring 2021)

Undergraduate Research Assistant (HKUST) [Implicit Discourse Representation Learning and Application], now PhD@JHU

Xinran Zhao (Summer 2019 - Fall 2020)

Undergraduate Research Assistant (HKUST) [Eventuality Knowledge Graph Representation Learning], now PhD@CMU

Acknowledgement

This website uses the design and template by Martin Saveski.