Fusing Reward and Dueling Feedback in Stochastic Bandits
Xuchuang Wang, Qirun Zeng, Jinhang Zuo, Xutong Liu, Mohammad Hajiesmaili, John C.S. Lui, and Adam Wierman
In Forty-second International Conference on Machine Learning, 2025
Why it matters: Develops bandit algorithms that simultaneously consume absolute reward feedback and pairwise dueling comparisons, a step toward learners that handle the heterogeneous human feedback found in modern RLHF-style applications.
@inproceedings{wang2025fusing,
  title={Fusing Reward and Dueling Feedback in Stochastic Bandits},
  author={Wang, Xuchuang and Zeng, Qirun and Zuo, Jinhang and Liu, Xutong and Hajiesmaili, Mohammad and Lui, John C.S. and Wierman, Adam},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025},
  significance={Develops bandit algorithms that simultaneously consume absolute reward feedback and pairwise dueling comparisons -- a step toward learners that handle the heterogeneous human feedback found in modern RLHF-style applications.}
}
ICLR
Stochastic Bandits Robust to Adversarial Attacks
Xuchuang Wang, Maoli Liu, Jinhang Zuo, Xutong Liu, John C.S. Lui, and Mohammad Hajiesmaili
In International Conference on Learning Representations, 2025
Why it matters: Provides regret guarantees for stochastic bandits against a strong adversary that can corrupt rewards after observing the learner’s actions, making bandit learning usable in settings where some feedback is manipulated.
@inproceedings{wang2025robust,
  title={Stochastic Bandits Robust to Adversarial Attacks},
  author={Wang, Xuchuang and Liu, Maoli and Zuo, Jinhang and Liu, Xutong and Lui, John C.S. and Hajiesmaili, Mohammad},
  booktitle={International Conference on Learning Representations},
  year={2025},
  significance={Provides regret guarantees for stochastic bandits against a strong adversary that can corrupt rewards after observing the learner's actions, making bandit learning usable in settings where some feedback is manipulated.}
}
SIGMETRICS
Asynchronous Multi-Agent Bandits: Fully Distributed vs. Leader-Coordinated Algorithms
Xuchuang Wang*, Yu-Zhen Janice Chen*, Lin Yang, Xutong Liu, Mohammad Hajiesmaili, Don Towsley, and John C.S. Lui
In ACM International Conference on Measurement and Modeling of Computer Systems, 2025
Why it matters: Drops the lock-step assumption that pervades multi-agent bandit analyses and compares fully distributed against leader-coordinated protocols, clarifying the communication-versus-regret tradeoff that governs how cooperating learners should share information.
@inproceedings{wang2025asynchronous,
  title={Asynchronous Multi-Agent Bandits: Fully Distributed vs. Leader-Coordinated Algorithms},
  author={Wang*, Xuchuang and Chen*, Yu-Zhen Janice and Yang, Lin and Liu, Xutong and Hajiesmaili, Mohammad and Towsley, Don and Lui, John C.S.},
  booktitle={ACM International Conference on Measurement and Modeling of Computer Systems},
  year={2025},
  significance={Drops the lock-step assumption that pervades multi-agent bandit analyses and compares fully distributed against leader-coordinated protocols, clarifying the communication-versus-regret tradeoff that governs how cooperating learners should share information.}
}
AAAI
Best Arm Identification with Quantum Oracles
Xuchuang Wang, Yu-Zhen Janice Chen, Matheus Andrade, Jonathan Allcock, Mohammad Hajiesmaili, John C.S. Lui, and Don Towsley
In The 39th Annual AAAI Conference on Artificial Intelligence, 2025
Why it matters: Studies best-arm identification when the learner can issue quantum queries to the reward oracle, quantifying the speedup that quantum-enhanced sequential decision-making offers over classical bandit algorithms.
@inproceedings{wang2025best,
  title={Best Arm Identification with Quantum Oracles},
  author={Wang, Xuchuang and Chen, Yu-Zhen Janice and Guedes de Andrade, Matheus and Allcock, Jonathan and Hajiesmaili, Mohammad and Lui, John C.S. and Towsley, Don},
  booktitle={The 39th Annual AAAI Conference on Artificial Intelligence},
  year={2025},
  significance={Studies best-arm identification when the learner can issue quantum queries to the reward oracle, quantifying the speedup that quantum-enhanced sequential decision-making offers over classical bandit algorithms.}
}
INFOCOM
Learning Best Paths in Quantum Networks
Xuchuang Wang, Maoli Liu, Xutong Liu, Zhuohua Li, Mohammad Hajiesmaili, John C.S. Lui, and Don Towsley
In Proceedings of the IEEE Conference on Computer Communications, 2025
Why it matters: Casts entanglement path selection in quantum networks as an online learning problem and gives a routing algorithm that adapts to unknown, noisy link fidelities, a building block for routing on the quantum internet.
@inproceedings{wang2025learn,
  title={Learning Best Paths in Quantum Networks},
  author={Wang, Xuchuang and Liu, Maoli and Liu, Xutong and Li, Zhuohua and Hajiesmaili, Mohammad and Lui, John C.S. and Towsley, Don},
  booktitle={Proceedings of the IEEE Conference on Computer Communications},
  year={2025},
  significance={Casts entanglement path selection in quantum networks as an online learning problem and gives a routing algorithm that adapts to unknown, noisy link fidelities -- a building block for routing on the quantum internet.}
}