Xin Liu

劉昕

Senior Applied Scientist

Store Foundational AI, Amazon

Office: SFO22, Palo Alto, CA 94301

Email: xliucr [at] amazon [dot] com; seanliu96 [at] outlook [dot] com

Senior Applied Scientist (Full-time)

August 2023 - Present

Store Foundational AI

Foundational Models for E-commerce

Manager: Qingyu Yin

Pre-training & Mid-training: Babysat and delivered two generations of Rufus MoE LLMs trained from scratch: the latest ultra-sparse MoE (3.25% active parameters) matches or exceeds DeepSeek-V3-Base and Kimi-K2-Base on MMLU, MMLU-Pro, Math, and internal shopping benchmarks. Roadmapped the data mixture and curriculum strategy across pre-training and mid-training [2024-2025]. Pioneered agentic mid-training with Hephaestus [NAACL’25], a 103B-token mid-training corpus covering 76k+ APIs, built from API documents and available tool trajectories, which outperforms open-source base checkpoints at a similar scale and boosts subsequent post-training.

Post-training & Model Release: Led the initial release of four in-house instruct/reasoning/agent models (Dense, MoE, ultra-sparse MoE), improving performance in retrieval augmentation, instruction following, and consistency [2023-2024], and in reasoning, helpfulness, and shopping-agent tasks [2025-2026], through systematic exploration of data recipes and training signals (rejection sampling, verifiable rewards, reward models, etc.). Built HeaPA [arXiv: 2601.22448], a difficulty-aware heap sampling and on-policy query augmentation framework for efficient RLVR: heap sampling provides a more flexible difficulty schedule, and query augmentation maintains diversity in queries and difficulty, managed by tree-based reward aggregation.

Agentic Systems & Reinforcement Learning: Architected the first agentic multimodal shopping environment with visual search and image generation/editing tools for end-to-end RL, with context-management support [2026]. Designed multi-step tool orchestration rewards after analyzing compositional patterns [arXiv: 2603.24709]; verified them in a 100k+ real-API cache environment, achieving gains of 19.9% in turn accuracy and 34.2% in call accuracy on ComplexFuncBench. Worked on DeepPlanner [Findings of ACL’26] for deep research agents via advantage shaping and upweighting (10× fewer training queries than prior SOTA).

Evaluation & Prompting Infrastructure: Designed the first Rufus shopping prompting system using a finite-state machine (A/B testing, routing, task planning, multilingual support) [2023–2024]. Built MultiTurnInstruct [EMNLP’25], a 1.1K-sample multi-turn benchmark across nine instruction-following categories to stress-test LLMs on entangled and conflicting instructions in complex conversations. Worked on several in-house shopping benchmarks for instruction following, consistency, helpfulness, product search agent, and multimodal agent.

Training Infrastructure: Directed development of a mid-training scaling-law package to identify optimal data mixtures and curriculum schedules [2024–2025]. Extended pre-training and post-training infrastructure (NeMo, NeMo-Aligner, verl, and slime) with data loading and processing, curriculum, reward services, and agent environments [2023–2026].

Research Intern (Internship)

June 2022 - Dec. 2022

Search Query Understanding, Amazon Search (A9)

Commonsense Knowledge Graph and Pattern Mining for E-commerce

Mentors: Zheng Li, Yifan Gao, Jingfeng Yang, Tianyu Cao

Pioneered graph learning for natural language understanding and parsing: implemented the most effective solution for mining important user intent patterns and parsing millions of action-item-intention triples to construct commonsense knowledge graphs [ACL’2023], and built the commonsense knowledge graph at Amazon (COSMO) to improve ranking relevance and recommendation quality in Amazon Search and Navigation [SIGMOD’2024].

Architected session-based recommendation solutions by integrating graph learning: developed a state-of-the-art solution that uses pattern mining algorithms and memory augmentation to significantly enhance the item-item collaborative filtering graph [NeurIPS’2023].

Research Intern (Internship)

June 2017 - Dec. 2017

Cloud & Mobile, Microsoft Research Asia

Distributed Graph Database

Mentor: Liang Jeff Chen

Collaborated on the development of a new graph database built on top of relational and non-relational databases, contributing to the open-source project GraphView.

Focused on translation, compilation, and optimization, leading to the integration of GraphView as a key component of Microsoft Azure.
