publications

2026

  1. KDD
    Online Learning to Rank under Corruption: A Robust Cascading Bandits Approach
    Fatemeh Ghaffari, Siddarth Sitaraman, Xutong Liu, Xuchuang Wang, and Mohammad Hajiesmaili
    In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2026
  2. Online Optimal Probe Allocation for Quantum Network Tomography
    Xuchuang Wang, Yu-Zhen Janice Chen, Matheus Andrade, Mohammad Hajiesmaili, John C.S. Lui, Ting He, and Don Towsley
    In International Conference on Quantum Communications, Networking, and Computing, 2026
  3. ToN
    Cooperative Bandit Algorithms with Optimal Regret and Communication Costs
    Yang Lin, Xuchuang Wang, Haoxu Chen, Mohammad Hajiesmaili, Lijun Zhang, and John C.S. Lui
    In IEEE/ACM Transactions on Networking, 2026
  4. ToN
    Combinatorial Logistic Online Learning and Its Applications in Nonlinear Networked Systems
    Xutong Liu, Xiangxiang Dai, Xuchuang Wang, Mohammad Hajiesmaili, and John C.S. Lui
    In IEEE/ACM Transactions on Networking, 2026
  5. A Multi-Agent Conversational Bandit Approach to Online Evaluation and Selection of User-Aligned LLM Responses
    Xiangxiang Dai, Yuejin Xie, Maoli Liu, Xuchuang Wang, Zhuohua Li, Huanyu Wang, and John C.S. Lui
    In The 40th Annual AAAI Conference on Artificial Intelligence (AI Alignment Track), 2026
  6. Heterogeneous Multi-Agent Multi-Armed Bandits on Stochastic Block Models
    Mengfan Xu, Liran Shan, Fatemeh Ghaffari, Xuchuang Wang, Xutong Liu, and Mohammad Hajiesmaili
    In ACM International Conference on Measurement and Modeling of Computer Systems, 2026

2025

  1. Federated Multi-armed Bandits with Efficient Bit-Level Communications
    Haoran Zhang, Xu Yang, Xuchuang Wang, Hao-Xu Chen, Hao Qiu, Lin Yang, and Yang Gao
    In Advances in Neural Information Processing Systems, 2025
  2. Fusing Reward and Dueling Feedback in Stochastic Bandits
    Xuchuang Wang, Qirun Zeng, Jinhang Zuo, Xutong Liu, Mohammad Hajiesmaili, John C.S. Lui, and Adam Wierman
    In Forty-second International Conference on Machine Learning, 2025
    Why it matters: Develops bandit algorithms that simultaneously consume absolute reward feedback and pairwise dueling comparisons – a step toward learners that handle the heterogeneous human feedback found in modern RLHF-style applications.
  3. UAI
    Optimal Regret Bounds for Federated Multi-armed Bandits with Fully Distributed Communication
    Haoran Zhang, Xuchuang Wang, Hao-Xu Chen, Hao Qiu, Lin Yang, and Yang Gao
    In The 41st Conference on Uncertainty in Artificial Intelligence, 2025
  4. Multi-Agent Stochastic Bandits Robust to Adversarial Corruptions
    Fatemeh Ghaffari, Xuchuang Wang, Jinhang Zuo, and Mohammad Hajiesmaili
    In 7th Annual Learning for Dynamics & Control Conference, 2025
  5. Stochastic Bandits Robust to Adversarial Attacks
    Xuchuang Wang, Maoli Liu, Jinhang Zuo, Xutong Liu, John C.S. Lui, and Mohammad Hajiesmaili
    In International Conference on Learning Representations, 2025
    Why it matters: Provides regret guarantees for stochastic bandits against a strong adversary that can corrupt rewards after observing the learner’s actions, making bandit learning usable in settings where some feedback is manipulated.
  6. Asynchronous Multi-Agent Bandits: Fully Distributed vs. Leader-Coordinated Algorithms
    Xuchuang Wang*, Yu-Zhen Janice Chen*, Lin Yang, Xutong Liu, Mohammad Hajiesmaili, Don Towsley, and John C.S. Lui
    In ACM International Conference on Measurement and Modeling of Computer Systems, 2025
    Why it matters: Drops the lock-step assumption that pervades multi-agent bandit analyses and compares fully distributed against leader-coordinated protocols, clarifying the communication-versus-regret tradeoff that governs how cooperating learners should share information.
  7. Combinatorial Logistic Bandits
    Xutong Liu, Xiangxiang Dai, Xuchuang Wang, Mohammad Hajiesmaili, and John C.S. Lui
    In ACM International Conference on Measurement and Modeling of Computer Systems, 2025
  8. Best Arm Identification with Quantum Oracles
    Xuchuang Wang, Yu-Zhen Janice Chen, Matheus Andrade, Jonathan Allcock, Mohammad Hajiesmaili, John C.S. Lui, and Don Towsley
    In The 39th Annual AAAI Conference on Artificial Intelligence, 2025
    Why it matters: Studies best-arm identification when the learner can issue quantum queries to the reward oracle, quantifying the speedup that quantum-enhanced sequential decision-making offers over classical bandit algorithms.
  9. Heterogeneous Multi-Agent Bandits with Parsimonious Hints
    Amirmahdi Mirfakhar, Xuchuang Wang, Jinhang Zuo, Yair Zick, and Mohammad Hajiesmaili
    In The 39th Annual AAAI Conference on Artificial Intelligence, 2025
  10. Learning Best Paths in Quantum Networks
    Xuchuang Wang, Maoli Liu, Xutong Liu, Zhuohua Li, Mohammad Hajiesmaili, John C.S. Lui, and Don Towsley
    In Proceedings of the IEEE Conference on Computer Communications, 2025
    Why it matters: Casts entanglement path selection in quantum networks as an online learning problem and gives a routing algorithm that adapts to unknown, noisy link fidelities – a building block for routing on the quantum internet.

2024

  1. Combinatorial Multivariant Multi-Armed Bandits with Applications to Episodic Reinforcement Learning and Beyond
    Xutong Liu, Siwei Wang, Jinhang Zuo, Han Zhong, Xuchuang Wang, Zhiyong Wang, Shuai Li, Mohammad Hajiesmaili, John C.S. Lui, and Wei Chen
    In Forty-first International Conference on Machine Learning, 2024

2023

  1. Multi-Fidelity Multi-Armed Bandits Revisited
    Xuchuang Wang, Qingyun Wu, Wei Chen, and John C.S. Lui
    In Advances in Neural Information Processing Systems, 2023
  2. TMC
    Analyzing Queueing Problems via Bandits with Linear Reward and Nonlinear Workload Fairness
    Xuchuang Wang, Hong Xie, and John C.S. Lui
    In IEEE Transactions on Mobile Computing, 2023
  3. Optimizing Recommendations under Abandonment Risks: Models and Algorithms
    Xuchuang Wang, Hong Xie, Pinghui Wang, and John C.S. Lui
    Performance Evaluation, 2023
  4. UAI
    Exploration for Free: How Does Reward Heterogeneity Improve Regret in Cooperative Multi-agent Bandits?
    Xuchuang Wang, Lin Yang, Yu-Zhen Janice Chen, Xutong Liu, Mohammad Hajiesmaili, Don Towsley, and John C.S. Lui
    In The 39th Conference on Uncertainty in Artificial Intelligence, 2023
  5. Achieve Near-Optimal Individual Regret & Low Communications in Multi-Agent Bandits
    Xuchuang Wang, Lin Yang, Yu-Zhen Janice Chen, Xutong Liu, Mohammad Hajiesmaili, Don Towsley, and John C.S. Lui
    In International Conference on Learning Representations, 2023
  6. On-Demand Communication for Asynchronous Multi-Agent Bandits
    Yu-Zhen Janice Chen, Lin Yang, Xuchuang Wang, Xutong Liu, Mohammad Hajiesmaili, John C.S. Lui, and Don Towsley
    In The 26th International Conference on Artificial Intelligence and Statistics, 2023

2022

  1. Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications
    Xuchuang Wang, Hong Xie, and John C.S. Lui
    In Proceedings of the 31st International Joint Conference on Artificial Intelligence, IJCAI-22, 2022
  2. Multiple-Play Stochastic Bandits with Shareable Finite-Capacity Arms
    Xuchuang Wang, Hong Xie, and John C.S. Lui
    In Proceedings of the 39th International Conference on Machine Learning, 2022