Ce Zheng

I am currently a postdoctoral fellow at the Robotics Institute, Carnegie Mellon University, working under the guidance of Prof. László A. Jeni. I obtained my Ph.D. degree from the Center for Research in Computer Vision (CRCV) at the University of Central Florida (UCF) under the supervision of Prof. Chen Chen.

Before joining UCF, I obtained my Master's degree from Tufts University in August 2019, advised by Prof. Shuchin Aeron and Prof. Eric Miller. I received my Bachelor's degree from the University of Bridgeport and Wuhan University of Science and Technology in June 2016.

Email  /  Google Scholar  /  GitHub  /  Resume

Research

My research interests are Computer Vision, AIGC, and Vision Language Models. Specifically, I focus on:

  • 3D Vision, Human Pose Estimation and Mesh Recovery
  • Efficient Networks
  • Generative AI (Diffusion-based Generation/Synthesis)
  • Vision Language Models for Human Understanding
  • ...

Below is a selected list of my works. The full publication list can be found on my Google Scholar page. If you are interested in these topics and want to work with me, please don't hesitate to reach out to me via email.

Work/Internship Experience
Research Intern
Innopeak Tech, Seattle, USA. Summer 2022
Mentor: Guo-Jun Qi

Human mesh recovery from single images.

Recent News
  • 2024-02: Two papers (MvACon and Domain Generalization 3D HPE) are accepted by CVPR 2024!
  • 2024-02: My PhD dissertation won the university-wide Outstanding Dissertation Award!
  • 2023-12: I passed my PhD dissertation defense!
  • 2023-09: One paper (Context-aware PoseFormer) is accepted by NeurIPS 2023!
  • 2023-07: Two papers (MonoXiver and Source-free DA HPE) are accepted by ICCV 2023!
  • 2023-05: Our human pose estimation survey paper is accepted by ACM Computing Surveys (IF=14.3)!
  • 2023-02: Three papers (FeatER, POTTER, PoseFormerV2) are accepted by CVPR 2023!

Published papers
POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery
Ce Zheng, Xianpeng Liu, Guo-Jun Qi, Chen Chen.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
paper  /  project page

A lightweight pure transformer architecture named POoling aTtention TransformER (POTTER) for human mesh recovery (HMR) from single images.

FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER
Ce Zheng, Matias Mendieta, Taojiannan Yang, Guo-Jun Qi, Chen Chen.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
paper  /  project page

An efficient transformer-based method for human pose estimation and mesh reconstruction.

Context-Aware PoseFormer: Single Image Beats Hundreds for 3D Human Pose Estimation
Qitao Zhao, Ce Zheng, Mengyuan Liu, Chen Chen.
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023
paper  /  project page

We revisit the 2D-to-3D lifting pipeline, leveraging readily available intermediate visual representations.

PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation
Qitao Zhao, Ce Zheng, Mengyuan Liu, Pichao Wang, Chen Chen.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
paper  /  project page

An extension of our PoseFormer paper that exploits the frequency domain to improve efficiency and robustness.

Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver
Xianpeng Liu, Ce Zheng, Kelvin Cheng, Nan Xue, Guo-Jun Qi, Tianfu Wu.
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
paper  /  code

MonoXiver leverages the self-attention mechanism for proposal verification and ultimately delivers high-quality 3D box predictions.

Source-free Domain Adaptive Human Pose Estimation
Qucheng Peng, Ce Zheng, Chen Chen.
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
paper  /  code

We propose source-free domain adaptive HPE, which addresses the challenges of cross-domain learning for HPE without access to source data during the adaptation process.

POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression Recognition
Ce Zheng, Matias Mendieta, Chen Chen.
ICCV Workshop, 2023
paper  /  code

We propose a two-stream Pyramid crOss-fuSion TransformER network (POSTER) for facial expression recognition.

A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose
Ce Zheng, Matias Mendieta, Pu Wang, Aidong Lu, Chen Chen.
ACM International Conference on Multimedia (ACM MM), 2022
paper  /  code

A lightweight pose-based method that reconstructs human mesh from 2D human pose.

3D Human Pose Estimation with Spatial and Temporal Transformers
Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen, Zhengming Ding.
IEEE/CVF International Conference on Computer Vision (ICCV), 2021
paper  /  code

PoseFormer: a spatial-temporal transformer structure for 3D human pose estimation in videos, and the first transformer-based approach to this task.

Deep Learning-Based Human Pose Estimation: A Survey
Ce Zheng*, Wenhan Wu*, Chen Chen, Taojiannan Yang, Sijie Zhu, Ju Shen, Nasser Kehtarnavaz, Mubarak Shah.
ACM Computing Surveys (IF=14.32), 2023
paper  /  project page

A comprehensive survey of 2D and 3D Human Pose Estimation.

LodoNet: A Deep Neural Network with 2D Keypoint Matching for 3D LiDAR Odometry Estimation
Ce Zheng, Yecheng Lyu, Ming Li, Ziming Zhang.
ACM International Conference on Multimedia (ACM MM), 2020
paper

A new approach that extracts matched 2D keypoint pairs (MKPs) for 3D LiDAR odometry estimation.

Under-review and other collaborative publications
Part Aware Contrastive Learning for Self-Supervised Action Recognition
Yilei Hua, Wenhan Wu, Ce Zheng, Aidong Lu, Mengyuan Liu, Chen Chen, Shiqian Wu.
The International Joint Conference on Artificial Intelligence (IJCAI), 2023
paper  /  code

SkeAttnCLR: An attention-based contrastive learning framework for skeleton representation learning.

LAMP: Leveraging Language Prompts for Multi-person Pose Estimation
Shengnan Hu, Ce Zheng, Zixiang Zhou, Chen Chen, Gita Sukthankar.
The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
paper  /  code

An end-to-end pipeline that leverages both instance and joint cues from the language model for occluded pose estimation.

DDT: A Diffusion-Driven Transformer-based Framework for Human Mesh Recovery from a Video
Ce Zheng, Guo-Jun Qi, Chen Chen.
arXiv, 2023
paper  /  project page

A diffusion-driven transformer-based framework that decodes specific motion patterns from the input sequence.

Exploiting Multi-view Part-wise Correlation via an Efficient Transformer for Vehicle Re-Identification
Ming Li, Jun Liu, Ce Zheng, Xinming Huang, Ziming Zhang.
IEEE Transactions on Multimedia (TMM), 2021
paper

The first transformer-driven framework to capture comprehensive instance codes from multiple view images for vehicle ReID.

Service
Reviewer: TPAMI, IJCV, TIP, TCSVT, CVIU, TNNLS, Neurocomputing, Neural Networks
Reviewer: CVPR, ICCV, NeurIPS, ICLR, SIGGRAPH Asia, ACM MM
