Taojiannan Yang

I am an applied scientist at Amazon AGI. Before joining Amazon, I received my PhD degree from the Center for Research in Computer Vision (CRCV), University of Central Florida (UCF), under the supervision of Prof. Chen Chen. Before that, I received my bachelor's degree from University of Science and Technology of China (USTC).

My current research mainly focuses on diffusion models and multimodal large language models.

My work has been selected as CVPR'22 Best Paper Finalist.

Email / Google Scholar / Github / LinkedIn /

Work Experience

Applied Scientist
Amazon AGI/AWS AI, Aug 2023

Building Amazon Nova foundation models.

Applied Scientist Intern
AWS AI Labs, Santa Clara, USA. Summer 2022
Host: Yi Zhu, Yusheng Xie, Aston Zhang, Mu Li

Adapt image models for efficient video understanding.

Research Intern
ByteDance Inc., Mountain View, USA. Summer 2021
Host: Linjie Yang, Xiaojie Jin

Efficient neural architecture search.

Publications

	The Amazon Nova Family of Models: Technical Report and Model Card Core Contributor Amazon Science, 2024 paper We present Amazon Nova, a new generation of state-of-the-art foundation models that deliver frontier intelligence and industry-leading price performance.
	Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level Andong Den, Tongjia Chen, Shoubin Yu, Taojiannan Yang, Lincoln Spencer, Yapeng Tian, Ajmal Saeed Mian, Mohit Bansal, Chen Chen IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025 paper / code A new Motion-Grounded Video Reasoning benchmark, which evaluates multimodal models’ reasoning and perception capabilities for motion understanding.
	Dense Connector for MLLMs Huanjin Yao, Wenhao Wu, Taojiannan Yang, Yuxin Song, Mengxi Zhang, Haocheng Feng, Yifan Sun, Zhiheng Li, Wanli Ouyang, Jingdong Wang Neural Information Processing Systems (NeurIPS), 2024 paper / code A universal plug-and-play module to enhance Multimodal-LLM.
	ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen. European Conference on Computer Vision (ECCV), 2024 paper / code / demo Improving the controllability of generative models through discriminative models.
	A Large-scale Study of Spatiotemporal Representation Learning with a New Benchmark on Action Recognition Andong Deng, Taojiannan Yang, Chen Chen. International Conference on Computer Vision (ICCV), 2023 paper / code and data A new comprehensive action recognition benchmark to evaluate spatiotemporal representation learning from various perspectives.
	AIM: Adapting Image Models for Efficient Video Action Recognition Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li. International Conference on Learning Representations (ICLR), 2023 paper / project / code How to efficiently and effectively adapt image models for video understanding.
	Revisiting Training-free NAS Metrics: An Efficient Training-based Method Taojiannan Yang, Linjie Yang, Xiaojie Jin, Chen Chen. Winter Conference on Applications of Computer Vision (WACV), 2023 paper / code Training-free metrics are highly correlated with #params and we propose a new efficient training-based method to address the problem.
	Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning Matias Mendieta, Taojiannan Yang, Pu Wang, Minwoo Lee, Zhengming Ding, Chen Chen. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Best Paper Finalist, 33 out of 8161) paper / code GradAug alleviates data heterogeneity in federated learning by smoothing loss landscape. We further improve its efficiency by proposing a new method FedAlign.
	MutualNet: Adaptive ConvNet via Mutual Learning from Different Model Configurations Taojiannan Yang, Sijie Zhu, Matias Mendieta, Pu Wang, Ravikumar Balakrishnan, Minwoo Lee, Tao Han, Mubarak Shah, Chen Chen. IEEE Transaction on Pattern Analysis and Machine Intelligence (TPAMI), 2021 paper / code We extend MutualNet to learn adaptive video models and conduct more analyses.
	GradAug: A New Regularization Method for Deep Neural Networks Taojiannan Yang, Sijie Zhu, Chen Chen. Neural Information Processing Systems (NeurIPS), 2020 paper / code A well-generalized network should make predictions consistent with its subnetworks given differently augmented samples.
	MutualNet: Adaptive ConvNet via Mutual Learning from Network Width and Resolution Taojiannan Yang, Sijie Zhu, Chen Chen, Shen Yan, Mi Zhang, Andrew Willis. European Conference on Computer Vision (ECCV), 2020 (Oral Presentation, 104 out of 5205) paper / code We learn networks that can run at different widths and resolutions to meet different resource budegets during runtime.
	FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER Ce Zheng, Matias, Mendieta, Taojiannan Yang, Guojun Qi, Chen Chen. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023 paper / code An efficent method for human pose estimation and mesh reconstruction.
	3D Human Pose Estimation with Spatial and Temporal Transformers Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding. International Conference on Computer Vision (ICCV), 2021 paper / code A spatial-temporal transformer structure for 3D human pose estimation.
	Visual Explanation for Deep Metric Learning Sijie Zhu, Taojiannan Yang, Chen Chen. IEEE Transactions on Image Processing (TIP), 2021 paper / code We visualize point-to-point activation intensity between two images.
	VIGOR: Cross-View Image Geo-localization beyond One-to-one Retrieval Sijie Zhu, Taojiannan Yang, Chen Chen. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021 paper / code A new benchmark for more realistic cross-view image geo-localization.
	Towards Resolving the Challenge of Long-tail Distribution in UAV Images for Object Detection Weiping Yu, Taojiannan Yang, Chen Chen. Winter Conference on Applications of Computer Vision (WACV), 2021 paper / code We point out the long-tail distribution problem in UAV images and propose a new method to address it.

Service

Reviewer, IEEE Transaction on Pattern Analysis and Machine Intelligence
Reviewer, IEEE Transaction on Image Processing
Reviewer, CVPR 2022, 2023
Reviewer, ECCV 2022
Reviewer, ICCV 2021, 2023
Reviewer, NeurIPS 2021, 2022
Reviewer, ICLR 2022, 2023
Reviewer, ICML 2022
Volunteer, NeurIPS 2020

Feel free to steal this website's source code. Do not scrape the HTML from this page itself, as it includes analytics tags that you do not want on your own website — use the github code instead. Also, consider using Leonid Keselman's Jekyll fork of this page.