Yue Cao is currently a researcher at the Beijing Academy of Artificial Intelligence (BAAI), focusing on foundation models, self-supervised learning, and multimodal learning.
Prior to that, he was a senior researcher at Microsoft Research Asia (MSRA) between 2019 and 2022, where he worked under Baining Guo and collaborated closely with Han Hu, Zheng Zhang and Steve Lin. His work on Swin Transformer won the Best Paper Award (Marr Prize) of ICCV 2021. He received his B.S. and Ph.D. degrees from the School of Software, Tsinghua University, in 2014 and 2019, respectively, with highest honors, under the supervision of Prof. Jianmin Wang and Prof. Mingsheng Long. During his Ph.D. study, he was a research intern at MSRA between 2018 and 2019, mentored by Jifeng Dai.
We are hiring at BAAI!!! If you are interested in a full-time position, please drop me an email. Internship positions are only available for 3D/2D generation.
2022.12 Invited to serve as an Area Chair for ICCV 2023.
2022.10 Our study on frozen vision foundation models was accepted by NeurIPS 2022 as a spotlight, congrats! [pdf]
2022.06 Gave a talk on MIM pre-training at BAAI 2022. [slides]
2022.03 Our SimMIM, Video Swin, and Swin V2 were accepted by CVPR 2022, congrats!
2021.10 Our Swin Transformer (a general-purpose vision backbone) won the Best Paper Award (Marr Prize) of ICCV 2021!!!
2021.10 Gave a tutorial on Vision Transformer at VALSE 2021. [slides]
2020.12 The extension of GCNet (Best Paper Award at ICCV 2019 Neural Architects Workshop) got accepted by TPAMI.
2019.11 Our work on multi-modality pre-training (VL-BERT) was reviewed by Bill Gates and accepted by ICLR 2020.
(†Interns or Students, *Equal Contribution)
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang†, Wen Wang†, Binhui Xie†, Quan Sun, Ledell Wu, Xinggang Wang, Tiejun Huang, Xinlong Wang, Yue Cao
arXiv, 2022 [PDF] [Code]
SimMIM: A Simple Framework for Masked Image Modeling
Zhenda Xie*†, Zheng Zhang*, Yue Cao*, Yutong Lin†, Jianmin Bao, Zhuliang Yao†, Qi Dai, Han Hu*
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022 [PDF] [Code] [Understanding] [Data Scaling]
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu*†, Yutong Lin*†, Yue Cao*, Han Hu*, Yixuan Wei, Zheng Zhang, Steve Lin, Baining Guo
International Conference on Computer Vision (ICCV), 2021 Best Paper Award (Marr Prize)
[PDF] [Code@Cls] [Code@Det] [Code@Seg] [Code@MoBY] [Code@Video]
Parametric Instance Classification for Unsupervised Visual Feature Learning
Yue Cao*, Zhenda Xie*†, Bin Liu*†, Yutong Lin†, Zheng Zhang, Han Hu
Neural Information Processing Systems (NeurIPS), 2020 [PDF] [Code@Github] [Post@Synced]
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su*†, Xizhou Zhu*†, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
International Conference on Learning Representations (ICLR), 2020
[PDF] [Code@Github] [Post@Synced] PaperDigest Most Influential Papers
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond
Yue Cao*, Jiarui Xu*, Steve Lin, Fangyun Wei, Han Hu
International Conference on Computer Vision Workshop (ICCVW), 2019
[PDF] [Code@Github] [Code@mmdet] [Post@Synced] Best Paper Award
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2020 [PDF]
Deep Hashing Network for Efficient Similarity Retrieval
Han Zhu, Mingsheng Long, Jianmin Wang, Yue Cao
AAAI Conference on Artificial Intelligence (AAAI), 2016
[PDF] [Code@Github] PaperDigest Most Influential Papers
Learning Transferable Features with Deep Adaptation Networks
Mingsheng Long, Yue Cao, Jianmin Wang, Michael I. Jordan
International Conference on Machine Learning (ICML), 2015
[PDF] [Code@Github] [Code@Github] PaperDigest Most Influential Papers
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2018 [PDF]
PC Member | Reviewer