Publications

An up-to-date list is available on Google Scholar.

2024

  1. MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs
    Lei Wang, Shan Dong, Yuhui Xu, and 6 more authors
    arXiv preprint arXiv:2410.04698, 2024
  2. ThinK: Thinner Key Cache by Query-Driven Pruning
    Yuhui Xu, Zhanming Jie, Hanze Dong, and 6 more authors
    arXiv preprint arXiv:2407.21018, 2024
  3. One QuantLLM for ALL: Fine-tuning Quantized LLMs Once for Efficient Deployments
    Ke Yi*, Yuhui Xu*, Heng Chang, and 4 more authors
    arXiv preprint arXiv:2405.20202, 2024
    * = equal contribution
  4. TerDiT: Ternary Diffusion Models with Transformers
    Xudong Lu, Aojun Zhou, Ziyi Lin, and 8 more authors
    arXiv preprint arXiv:2405.14854, 2024
  5. Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
    Xudong Lu, Qi Liu, Yuhui Xu, and 5 more authors
    The 62nd Annual Meeting of the Association for Computational Linguistics, 2024
  6. SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models
    Xudong Lu*, Aojun Zhou*, Yuhui Xu*, and 3 more authors
    International Conference on Machine Learning, 2024
    * = equal contribution
  7. QA-LoRA: Quantization-aware low-rank adaptation of large language models
    Yuhui Xu, Lingxi Xie, Xiaotao Gu, and 6 more authors
    International Conference on Learning Representations, 2024

2023

  1. BNET: Batch Normalization With Enhanced Linear Transformation
    Yuhui Xu, Lingxi Xie, Cihang Xie, and 6 more authors
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023

2021

  1. Fitting the search space of weight-sharing NAS with graph convolutional networks
    Xin Chen, Lingxi Xie, Jun Wu, and 3 more authors
    2021
  2. Partially-Connected Neural Architecture Search for Reduced Computational Redundancy
    Yuhui Xu, Lingxi Xie, Wenrui Dai, and 5 more authors
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021

2020

  1. Latency-aware differentiable neural architecture search
    Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, and 4 more authors
    arXiv preprint arXiv:2001.06392, 2020
  2. TRP: Trained rank pruning for efficient deep neural networks
    Yuhui Xu, Yuxi Li, Shuai Zhang, and 6 more authors
    International Joint Conference on Artificial Intelligence, 2020
  3. PC-DARTS: Partial Channel Connections for Memory-Efficient Architecture Search
    Yuhui Xu, Lingxi Xie, Xiaopeng Zhang, and 4 more authors
    International Conference on Learning Representations, 2020
  4. Iterative Deep Neural Network Quantization With Lipschitz Constraint
    Yuhui Xu, Wenrui Dai, Yingyong Qi, and 2 more authors
    IEEE Transactions on Multimedia, 2020

2019

  1. DNQ: Dynamic network quantization
    Yuhui Xu, Shuai Zhang, Yingyong Qi, and 3 more authors
    IEEE DCC, 2019

2018

  1. Deep neural network compression with single and multiple level quantization
    Yuhui Xu, Yongzhuang Wang, Aojun Zhou, and 2 more authors
    Proceedings of the AAAI Conference on Artificial Intelligence, 2018