Taojiannan Yang

I am an applied scientist at AWS AI. Before joining Amazon, I received my PhD from the Center for Research in Computer Vision (CRCV) at the University of Central Florida (UCF), under the supervision of Prof. Chen Chen. Before that, I received my bachelor's degree from the University of Science and Technology of China (USTC).

I have a broad interest in deep learning and computer vision. My current research mainly focuses on multimodal representation learning and multimodal generative models.

My work has been selected as a CVPR'22 Best Paper Finalist.

Email  /  Google Scholar  /  GitHub  /  LinkedIn

profile photo
Intern Experience
Applied Scientist Intern
AWS AI Labs, Santa Clara, USA. Summer 2022
Hosts: Yi Zhu, Yusheng Xie, Aston Zhang, Mu Li

Adapted image models for efficient video understanding.

Research Intern
ByteDance Inc., Mountain View, USA. Summer 2021
Hosts: Linjie Yang, Xiaojie Jin

Efficient neural architecture search.

Publications
Dense Connector for MLLMs
Huanjin Yao*, Wenhao Wu*, Taojiannan Yang, Yuxin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang.
Neural Information Processing Systems (NeurIPS), 2024
paper  /  code

A universal plug-and-play module to enhance multimodal LLMs.

ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen.
European Conference on Computer Vision (ECCV), 2024
paper  /  code  /  demo

Improving the controllability of generative models through discriminative models.

A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition
Andong Deng*, Taojiannan Yang*, Chen Chen.
International Conference on Computer Vision (ICCV), 2023
paper  /  code and data

A new comprehensive action recognition benchmark to evaluate spatiotemporal representation learning from various perspectives.

AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li.
International Conference on Learning Representations (ICLR), 2023
paper  /  project  /  code

How to efficiently and effectively adapt image models for video understanding.

Revisiting Training-free NAS Metrics: An Efficient Training-based Method
Taojiannan Yang, Linjie Yang, Xiaojie Jin, Chen Chen.
Winter Conference on Applications of Computer Vision (WACV), 2023
paper  /  code

Training-free NAS metrics are highly correlated with the number of parameters; we propose an efficient training-based method to address this problem.

Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
Matias Mendieta, Taojiannan Yang, Pu Wang, Minwoo Lee, Zhengming Ding, Chen Chen.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022
(Best Paper Finalist, 33 out of 8161)
paper  /  code

GradAug alleviates data heterogeneity in federated learning by smoothing the loss landscape. We further improve its efficiency with a new method, FedAlign.

MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations
Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
paper  /  code

We extend MutualNet to learn adaptive video models and provide further analysis.

GradAug: A New Regularization Method for Deep Neural Networks
Taojiannan Yang, Sijie Zhu, Chen Chen.
Neural Information Processing Systems (NeurIPS), 2020
paper  /  code

A well-generalized network should make predictions consistent with its subnetworks given differently augmented samples.

MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution
Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Willis.
European Conference on Computer Vision (ECCV), 2020
(Oral Presentation, 104 out of 5205)
paper  /  code

We learn networks that can run at different widths and resolutions to meet different resource budgets at runtime.

FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER
Ce Zheng, Matias Mendieta, Taojiannan Yang, Guojun Qi, Chen Chen.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
paper  /  code

An efficient method for human pose estimation and mesh reconstruction.

3D Human Pose Estimation with Spatial and Temporal Transformers
Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding.
International Conference on Computer Vision (ICCV), 2021
paper  /  code

A spatial-temporal transformer structure for 3D human pose estimation.

Visual Explanation for Deep Metric Learning
Sijie Zhu, Taojiannan Yang, Chen Chen.
IEEE Transactions on Image Processing (TIP), 2021
paper  /  code

We visualize point-to-point activation intensity between two images.

VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval
Sijie Zhu, Taojiannan Yang, Chen Chen.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
paper  /  code

A new benchmark for more realistic cross-view image geo-localization.

Towards Resolving the Challenge of Long-tail Distribution in UAV Images for Object Detection
Weiping Yu*, Taojiannan Yang*, Chen Chen.
Winter Conference on Applications of Computer Vision (WACV), 2021
paper  /  code

We point out the long-tail distribution problem in UAV images and propose a new method to address it.

Service
Reviewer, IEEE Transactions on Pattern Analysis and Machine Intelligence
Reviewer, IEEE Transactions on Image Processing
Reviewer, CVPR 2022, 2023
Reviewer, ECCV 2022
Reviewer, ICCV 2021, 2023
Reviewer, NeurIPS 2021, 2022
Reviewer, ICLR 2022, 2023
Reviewer, ICML 2022
Volunteer, NeurIPS 2020

Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website; use the GitHub code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.