Human Pose Estimation with Deep Learning

Images are cropped around the predicted joint and fed to the next stage. In this way, the subsequent pose regressors see higher-resolution images and thus learn features at finer scales, which ultimately leads to higher precision. The model is trained by minimizing the Mean Squared Error (MSE) between the predicted heat-map and a target heat-map. The target is a 2D Gaussian of constant variance (σ ≈ 1.5 pixels) centered at the ground-truth \((x, y)\) joint location.
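To make the training target concrete, here is a minimal sketch in plain Python of a Gaussian target heat-map and the MSE loss against it. The function names `gaussian_heatmap` and `mse`, the 8×8 grid size, and the joint location are hypothetical and chosen only for illustration. The sketch also assumes that σ is the standard deviation in pixels and that the Gaussian is left unnormalized (peak value 1); the original model may differ on both points.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.5):
    """Build a height x width grid (list of rows) holding an unnormalized
    2D Gaussian whose peak sits at the joint location (cx, cy).

    Assumption: sigma is the standard deviation in pixels.
    """
    two_sigma_sq = 2.0 * sigma * sigma
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / two_sigma_sq)
            for x in range(width)
        ]
        for y in range(height)
    ]


def mse(pred, target):
    """Mean squared error between two heat-maps, averaged over all pixels."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# Hypothetical example: an 8x8 heat-map with the joint at x=3, y=4.
target = gaussian_heatmap(width=8, height=8, cx=3, cy=4)

# An all-zero prediction stands in for an untrained network's output.
blank = [[0.0] * 8 for _ in range(8)]

print(target[4][3])         # value at the joint location (row y=4, column x=3)
print(mse(target, target))  # loss of a perfect prediction
print(mse(blank, target))   # loss of the all-zero prediction
```

In a real training loop the same loss would be computed on tensors by a deep learning framework, with one heat-map per joint, but the quantity being minimized is the one shown here.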

g1() and g2() predict heatmaps (called belief maps in the paper). At the time of writing, the HRNet (High-Resolution Network) model has outperformed all existing methods on the Keypoint Detection, Multi-Person Pose Estimation and Pose Estimation tasks on the COCO dataset, and it is the most recent model.

Source: blog.nanonets.com