Research
My research interests are Computer Vision, AIGC, and Vision Language Models. Specifically, I focus on:
- 3D Vision, Human Pose Estimation and Mesh Recovery,
- Efficient Networks,
- Generative AI (Diffusion-based Generation/Synthesis),
- Vision Language Models for Human Understanding,
- ...
Below is a selected list of my works. The full publication list can be found on my Google Scholar page. If you are interested in these topics and want to work with me, please reach out via email.
|
Work/Internship Experience
|
|
Research Intern
Innopeak Tech, Seattle, USA. Summer 2022
Mentor: Guo-Jun Qi
Human mesh recovery from single images.
|
Recent News
- 2024-04: I joined the Robotics Institute, Carnegie Mellon University, as a Postdoctoral Fellow!
- 2024-02: Two papers (MvACon and Domain Generalization 3D HPE) were accepted to CVPR 2024!
- 2024-02: My PhD dissertation won the Outstanding Dissertation Award (university-wide)!
- 2023-12: I passed my PhD dissertation defense!
- 2023-09: One paper (Context-aware PoseFormer) was accepted to NeurIPS 2023!
- 2023-07: Two papers (MonoXiver and Source-free DA HPE) were accepted to ICCV 2023!
- 2023-05: Our human pose estimation survey paper was accepted by ACM Computing Surveys (IF=14.3)!
- 2023-02: Three papers (FeatER, POTTER, PoseFormerV2) were accepted to CVPR 2023!
|
|
VITA: ViT Acceleration for Efficient 3D Human Mesh Recovery via Hardware-Algorithm Co-Design
Shilin Tian,
Chase Szafranski,
Ce Zheng,
Fan Yao,
Ahmed Louri,
Chen Chen,
Hao Zheng.
Design Automation Conference (DAC), 2024
paper
VITA, a hardware and algorithm co-design framework for ViT-based HMR with improved performance and energy efficiency.
|
|
A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation
Qucheng Peng,
Ce Zheng,
Chen Chen.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
paper  / 
code
We propose a novel dual-augmentor framework designed to enhance domain generalization in 3D human pose estimation.
|
|
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
Xianpeng Liu,
Ce Zheng,
Ming Qian,
Nan Xue,
Chen Chen,
Zhebin Zhang,
Chen Li,
Tianfu Wu.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024
paper  / 
project page
Multi-View Attentive Contextualization (MvACon), a simple yet effective method for improving 2D-to-3D feature lifting in query-based multi-view 3D (MV3D) object detection.
|
|
Context-Aware PoseFormer: Single Image Beats Hundreds for 3D Human Pose Estimation
Qitao Zhao,
Ce Zheng,
Mengyuan Liu,
Chen Chen.
Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023
paper  / 
project page
We revisit the 2D-3D lifting pipeline, leveraging the readily available intermediate visual representations.
|
|
Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver
Xianpeng Liu,
Ce Zheng,
Kelvin Cheng,
Nan Xue,
Guo-Jun Qi,
Tianfu Wu.
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
paper  / 
Code
MonoXiver leverages the self-attention mechanism for proposal verification and ultimately delivers high-quality 3D box predictions.
|
|
Source-free Domain Adaptive Human Pose Estimation
Qucheng Peng,
Ce Zheng,
Chen Chen.
IEEE/CVF International Conference on Computer Vision (ICCV), 2023
paper  / 
Code
We propose source-free domain adaptive HPE, which aims to address the challenges of cross-domain learning of HPE without access to source data during the adaptation process.
|
|
POSTER: A Pyramid Cross-Fusion Transformer Network for Facial Expression Recognition
Ce Zheng,
Matias Mendieta,
Chen Chen.
ICCV Workshop, 2023
paper  / 
code
We propose a two-stream Pyramid crOss-fuSion TransformER network (POSTER) for facial expression recognition.
|
|
POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery
Ce Zheng,
Xianpeng Liu,
Guo-Jun Qi,
Chen Chen.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
paper  / 
project page
A lightweight pure transformer architecture named POoling aTtention TransformER (POTTER) for the HMR task from single images.
|
|
FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER
Ce Zheng,
Matias Mendieta,
Taojiannan Yang,
Guo-Jun Qi,
Chen Chen.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
paper  / 
project page
An efficient transformer-based method for human pose estimation and mesh reconstruction.
|
|
PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation
Qitao Zhao,
Ce Zheng,
Mengyuan Liu,
Pichao Wang,
Chen Chen.
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
paper  / 
project page
An extension of our PoseFormer paper to improve robustness.
|
|
Part Aware Contrastive Learning for Self-Supervised Action Recognition
Yilei Hua,
Wenhan Wu,
Ce Zheng,
Aidong Lu,
Mengyuan Liu,
Chen Chen,
Shiqian Wu.
The International Joint Conference on Artificial Intelligence (IJCAI), 2023
paper  / 
code
SkeAttnCLR: An attention-based contrastive learning framework for skeleton representation learning.
|
|
LAMP: Leveraging Language Prompts for Multi-person Pose Estimation
Shengnan Hu,
Ce Zheng,
Zixiang Zhou,
Chen Chen,
Gita Sukthankar.
The IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
paper  / 
code
An end-to-end pipeline that leverages both instance and joint cues from the language model for occluded pose estimation.
|
|
A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose
Ce Zheng,
Matias Mendieta,
Pu Wang,
Aidong Lu,
Chen Chen.
ACM International Conference on Multimedia (ACM MM), 2022
paper  / 
code
A lightweight pose-based method that can reconstruct human mesh from 2D human pose.
|
|
3D Human Pose Estimation with Spatial and Temporal Transformers
Ce Zheng,
Sijie Zhu,
Matias Mendieta,
Taojiannan Yang,
Chen Chen,
Zhengming Ding.
IEEE/CVF International Conference on Computer Vision (ICCV), 2021
paper  / 
code
PoseFormer: a spatial-temporal transformer architecture, the first transformer-based approach for 3D human pose estimation in videos.
|
|
Exploiting Multi-view Part-wise Correlation via an Efficient Transformer for Vehicle Re-Identification
Ming Li,
Jun Liu,
Ce Zheng,
Xinming Huang,
Ziming Zhang.
IEEE Transactions on Multimedia (TMM), 2021
paper
The first transformer-driven framework to capture comprehensive instance codes from multiple view images for vehicle ReID.
|
|
Deep Learning-Based Human Pose Estimation: A Survey
Ce Zheng*,
Wenhan Wu*,
Chen Chen,
Taojiannan Yang,
Sijie Zhu,
Ju Shen,
Nasser Kehtarnavaz,
Mubarak Shah.
ACM Computing Surveys (IF=14.32), 2023
paper  / 
project page
A comprehensive survey for 2D and 3D Human Pose Estimation.
|
|
LodoNet: A Deep Neural Network with 2D Keypoint Matching for 3D LiDAR Odometry Estimation
Ce Zheng,
Yecheng Lyu,
Ming Li,
Ziming Zhang.
ACM International Conference on Multimedia (ACM MM), 2020
paper
A new approach that extracts the matched 2D keypoint pairs (MKPs) for 3D LiDAR Odometry.
|
|
SignLLM: Sign Languages Production Large Language Models
Sen Fang,
Lei Wang,
Ce Zheng,
Yapeng Tian,
Chen Chen.
arXiv, 2024
paper  / 
project page
We propose SignLLM, the first multilingual Sign Language Production (SLP) model, which includes two novel multilingual SLP modes that allow for the generation of sign language gestures from input text or prompt.
|
|
DDT: A Diffusion-Driven Transformer-based Framework for Human Mesh Recovery from a Video
Ce Zheng,
Guo-Jun Qi,
Chen Chen.
arXiv, 2023
paper  / 
project page
A Diffusion-Driven Transformer-based Framework to decode specific motion patterns from the input sequence.
|
|
Reviewer: TPAMI, IJCV, TIP, TCSVT, CVIU, TNNLS, Neurocomputing, Neural Networks
Reviewer: CVPR, ICCV, NeurIPS, ICLR, SIGGRAPH Asia, ACM MM
|