QSpec: Speculative decoding with complementary quantization schemes

Jan 1, 2025 · Juntao Zhao (equal contribution), Wenhao Lu (equal contribution), Sheng Wang (equal contribution), Lingpeng Kong, Chuan Wu
Abstract
QSpec proposes speculative decoding with complementary quantization schemes to achieve lossless inference acceleration in low-resource scenarios.
Type
Publication
In The 2025 Conference on Empirical Methods in Natural Language Processing
Authors
Sheng Wang (Forence)
PhD Graduate in Computer Science
Sheng Wang is a PhD graduate of The University of Hong Kong, where he was supervised by Prof. Chuan Wu and Prof. Lingpeng Kong. His research focuses on agents, LLM super-alignment, and data synthesis. He has published 14+ papers at top-tier conferences, including NeurIPS 2025 (Spotlight), ICLR 2025, ACL 2024/2025, and EMNLP 2025.