QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Yuhui Xu, Lingxi Xie, Xiaotao Gu, Xin Chen, Heng Chang, Hengheng Zhang, Zhengsu Chen, Xiaopeng Zhang, Qi Tian
accepted by
International Conference on Learning Representations (ICLR 2024) [PDF][Code]
Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks
Xin Chen, Lingxi Xie, Jun Wu, Longhui Wei, Yuhui Xu, Qi Tian
accepted by
35th Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI21) [PDF]
TRP: Trained Rank Pruning for Efficient Deep Neural Networks
Yuhui Xu, Yuxi Li, Shuai Zhang, Wei Wen, Botao Wang, Yingyong Qi, Yiran Chen, Weiyao Lin, Hongkai Xiong
accepted by
International Joint Conference on Artificial Intelligence (IJCAI 2020), Yokohama, Japan, July 2020 [PDF][Code]
PC-DARTS: Partial Channel Connections for Memory-Efficient Differentiable Architecture Search
Trained Rank Pruning for Efficient Deep Neural Networks
DNQ: Dynamic Network Quantization
Deep Neural Network Compression with Single and Multiple Level Quantization