Abstract
Human Pose Estimation (HPE) remains challenging due to scale variation, occlusion, and high computational costs. Standard methods often struggle to capture detailed spatial information when keypoints are obscured, and they typically rely on computationally expensive deconvolution layers for upsampling, making them ill-suited to real-time or resource-constrained scenarios. We propose AMFACPose (Attentive Multi-scale Features with Adaptive Context PoseResNet) to address these limitations. Specifically, our architecture incorporates Coordinate Convolution 2D (CoordConv2d) to retain explicit spatial context, alleviating the loss of coordinate information in conventional convolutions. To reduce computational overhead while maintaining accuracy, we employ Depthwise Separable Convolutions (DSCs), which separate spatial and pointwise operations. At the core of our approach is an Adaptive Feature Pyramid Network (AFPN), which replaces costly deconvolution-based upsampling by efficiently aggregating multi-scale features to handle diverse human poses and body sizes. We further introduce Dual-Gate Context Blocks (DGCBs) that refine global context to manage partial occlusions and cluttered backgrounds. The model integrates Squeeze-and-Excitation (SE) blocks and a Spatial–Channel Refinement Module (SCRM) to emphasize the most informative feature channels and spatial regions, which is particularly beneficial for occluded or overlapping keypoints. For precise keypoint localization, we replace dense heatmap predictions with coordinate classification using Multi-Layer Perceptron (MLP) heads. Experiments on the COCO and CrowdPose datasets demonstrate that AMFACPose surpasses existing 2D HPE methods in both accuracy and computational efficiency. Moreover, our implementation on edge devices achieves real-time performance while preserving high accuracy, confirming the suitability of AMFACPose for resource-constrained pose estimation in both benchmark and real-world environments.
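To make two of the named building blocks concrete, the following is a minimal PyTorch sketch of a CoordConv2d layer (appending normalized coordinate channels so the convolution sees explicit position) and a depthwise separable convolution (a per-channel spatial convolution followed by a 1x1 pointwise mixer). It follows the standard formulations of these operations and is not the authors' implementation; the class names, kernel sizes, and padding are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CoordConv2d(nn.Module):
    """Conv2d that appends normalized x/y coordinate channels to its input,
    giving the convolution explicit access to spatial position."""
    def __init__(self, in_channels, out_channels, **kwargs):
        super().__init__()
        # +2 input channels for the appended x and y coordinate maps
        self.conv = nn.Conv2d(in_channels + 2, out_channels, **kwargs)

    def forward(self, x):
        b, _, h, w = x.shape
        # Coordinate grids normalized to [-1, 1], broadcast to the batch
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

class DepthwiseSeparableConv(nn.Module):
    """Factorizes a standard convolution into a depthwise (per-channel spatial)
    convolution and a pointwise (1x1 channel-mixing) convolution, reducing
    parameters and FLOPs relative to a dense convolution."""
    def __init__(self, in_channels, out_channels, kernel_size=3, padding=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=padding, groups=in_channels)
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Usage sketch: a CoordConv stem followed by a depthwise separable block
feat = CoordConv2d(3, 32, kernel_size=3, padding=1)(torch.randn(1, 3, 64, 48))
out = DepthwiseSeparableConv(32, 64)(feat)  # shape: (1, 64, 64, 48)
```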