POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery

1 Center for Research in Computer Vision, University of Central Florida
2 North Carolina State University
3 OPPO Seattle Research Center, USA
4 Westlake University
CVPR 2023



Abstract

Transformer architectures have achieved state-of-the-art (SOTA) performance on human mesh recovery (HMR) from monocular images. However, this performance gain comes at the cost of substantial memory and computational overhead, whereas real-world applications call for a lightweight and efficient model that still reconstructs accurate human meshes. In this paper, we propose a pure transformer architecture named POoling aTtention TransformER (POTTER) for the HMR task from single images. Observing that the conventional attention module is expensive in both memory and computation, we propose an efficient pooling attention module that significantly reduces this cost without sacrificing performance. Furthermore, we design a new transformer architecture that integrates a High-Resolution (HR) stream for the HMR task; the high-resolution local and global features from the HR stream are exploited to recover a more accurate human mesh. POTTER outperforms the SOTA method METRO on the Human3.6M and 3DPW datasets while requiring only 7% of its total parameters and 14% of its Multiply-Accumulate Operations (MACs).
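
For intuition, below is a minimal PyTorch sketch of the general idea behind replacing quadratic softmax self-attention with a pooling-based token mixer. It is an illustrative approximation in the spirit of PoolFormer-style pooling, not the released POTTER implementation; the module and parameter names (PoolingAttention, PoolingBlock, pool_size) are our own and the paper's pooling attention module differs in its details.

import torch
import torch.nn as nn

class PoolingAttention(nn.Module):
    """Token mixing via average pooling over the patch-token grid instead of
    pairwise attention, avoiding the O(N^2) memory/compute of softmax attention."""
    def __init__(self, pool_size: int = 3):
        super().__init__()
        # Stride-1 average pooling keeps the spatial resolution unchanged.
        self.pool = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2,
                                 count_include_pad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map of patch tokens.
        # Subtracting the input keeps only the mixing residual (as in PoolFormer).
        return self.pool(x) - x

class PoolingBlock(nn.Module):
    """Transformer-style block with the attention module swapped for pooling."""
    def __init__(self, dim: int, mlp_ratio: float = 4.0, pool_size: int = 3):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)   # channel-wise LayerNorm for (B, C, H, W)
        self.token_mixer = PoolingAttention(pool_size)
        self.norm2 = nn.GroupNorm(1, dim)
        hidden = int(dim * mlp_ratio)
        self.mlp = nn.Sequential(           # 1x1 convs act as a per-token MLP
            nn.Conv2d(dim, hidden, 1), nn.GELU(), nn.Conv2d(hidden, dim, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x + self.token_mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

if __name__ == "__main__":
    block = PoolingBlock(dim=64)
    tokens = torch.randn(1, 64, 56, 56)     # (B, C, H, W) patch-token grid
    print(block(tokens).shape)              # torch.Size([1, 64, 56, 56])

Because the pooling mixer has no query/key/value projections and no attention matrix, its cost grows linearly with the number of tokens, which is the kind of saving that makes a lightweight HMR model practical.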


POTTER overview

Results of the image classification task

Results of human mesh recovery

Mesh visualization

Frame-by-frame reconstruction for video input

Qualitative comparison with the SOTA method METRO

Hand mesh visualization

Video


Bibtex


            @InProceedings{zheng2023potter,
                title={POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery},
                author={Zheng, Ce and Liu, Xianpeng and Qi, Guo-Jun and Chen, Chen},
                booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
                year={2023}
            }
        

This webpage template was adapted from here.