Human Pose Estimation and Algorithms for Alignment and Registration Problems: Applications in Robotics, Computer Vision, and Stroke Rehabilitation
Abstract
Estimating human motion and inter-frame orientation reliably, efficiently, and with minimal instrumentation is central to robotics, computer vision, and rehabilitation. This dissertation develops a set of learning- and geometry-driven methods spanning (i) sparse-IMU upper-body 3D pose estimation with deep sequence models, and (ii) fast, correspondence-free alignment on the sphere (S2) and on the rotation group (SO(3)), with applications to sensor calibration and robot-world/hand-eye (RWHE) problems.

Chapter 2: Sparse-IMU upper-body pose estimation with deep sequence models. We study the problem of reconstructing upper-body kinematics from three wearable IMUs (sternum and bilateral forearms) for stroke rehabilitation. Using a synchronized XSens MVN system as reference, we introduce a cross-sensor mapping that transfers standalone XSens DOT measurements into the MVN coordinate system. Two mapping regimes are investigated: a variable (session-specific) mapping and a fixed mapping obtained by quaternion averaging across sessions. On top of these mapped signals, we train and evaluate a family of deep sequence models (Seq2Seq, Seq2Seq with BiRNN and attention, a Transformer encoder, and a full Transformer) to infer multi-joint upper-body orientations (15 segments) from the three IMU streams. Transformers, particularly the full encoder-decoder variant, achieve state-of-the-art accuracy and robustness across participants and tasks, outperforming recurrent baselines while maintaining deployment-friendly throughput. The results quantify trade-offs between model class, IMU placement (two configurations), and mapping regime, and demonstrate that a fixed mapping retains accuracy while enabling practical re-use across recording sessions.

Chapters 3-4: Fast, correspondence-free alignment on S2 and SO(3).
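The fixed mapping in Chapter 2 above is obtained by quaternion averaging across sessions. A minimal sketch of one standard way to do this (the eigenvector method; the function name and the choice of method are illustrative assumptions, not the dissertation's code):

```python
import numpy as np

def average_quaternions(quats):
    """Average a set of unit quaternions (rows of an (N, 4) array).

    Accumulates the sum of outer products q q^T; the principal
    eigenvector of that matrix minimizes the total squared chordal
    distance to the inputs. Because q and -q contribute the same
    outer product, the average is invariant to quaternion sign flips.
    """
    M = np.zeros((4, 4))
    for q in quats:
        q = q / np.linalg.norm(q)   # guard against drift from unit norm
        M += np.outer(q, q)
    eigvals, eigvecs = np.linalg.eigh(M)
    return eigvecs[:, -1]           # eigh sorts eigenvalues ascending
```

The result is itself a unit quaternion, defined up to a global sign, which is the natural ambiguity for an orientation mapping.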
We recast set alignment of orientations as alignment of Transformed Basis Vectors (TBVs): each rotation yields three unit vectors on S2, producing three spherical point clouds per dataset. Building on this representation, we develop and rigorously analyze three spherical matchers: SPMC (Spherical Pattern Matching by Correlation), FRS (Fast Rotation Search), and a hybrid SPMC+FRS. These estimate the relative rotation in linear time O(n) and without pointwise correspondences, are robust under severe contamination (empirically up to 90% outliers), and avoid the cubic-log scaling of FFT-based spherical/SO(3) cross-correlation. We lift these spherical matchers to SO(3) by aligning TBV triplets per axis and projecting the fused estimate back to the group, yielding SO3_SPMC, SO3_FRS, and SO3_SPMC_FRS for axis-consistent settings (e.g., homogeneous IMUs from the same vendor). To handle axis-ambiguous scenarios (cross-vendor frames, robot-world/hand-eye), we make the pipeline Permutation-and-Sign Invariant (PASI) by enumerating signed permutations L = P S of the axes and selecting the maximizer over the 24 proper hypotheses (det L = +1). This preserves the O(n) profile: the heavy spherical matches (18 axis/sign pairs) are computed once and re-used during hypothesis scoring. We demonstrate PASI-SO(3) alignment as a drop-in rotational initializer (or stand-alone estimator) for RWHE calibration on real data (ETH robot_arm_real), using raw, unpaired trajectories without time alignment.

Chapter 5: Motion-driven, axis-consistent automatic orientation calibration for wearable IMUs. We target session- and subject-specific orientation drift that arises in wearable IMUs due to don/doff variability and strap placement, even when devices share the same vendor axis convention. The key idea is to calibrate from motion: we maintain a library of ground-truth SO(3) motion signatures (walking, sit-to-stand, reach, etc.)
collected previously in a canonical frame; at the start of a new session, the user performs a brief activity sequence, producing an unpaired SO(3) set. Assuming axis consistency, we apply SO3_SPMC to align the session's TBV distributions to the canonical signatures and recover the session's global rotational offset in a single, correspondence-free step. We show that (i) calibration is accurate and repeatable across participants and days; (ii) multi-activity signatures increase identifiability and reduce bias; (iii) runtime is linear in sequence length (tens of milliseconds for typical windows); and (iv) the calibrated orientations improve downstream pose estimation and cross-session comparability without reliance on magnetometers or time alignment.

Empirical validation and impact. Across synthetic and real evaluations, the proposed methods deliver (i) accurate upper-body pose reconstruction from minimal IMU instrumentation with Transformer models and a reusable cross-sensor mapping; (ii) fast, robust S2/SO(3) alignment that scales linearly and tolerates extreme outlier contamination; and (iii) a practical, motion-driven, axis-consistent calibration procedure for wearable IMUs that removes session- and subject-specific orientation bias from brief activity snippets. Together, these contributions provide a cohesive toolkit, pairing learning for motion inference with geometry for fast, correspondence-free alignment, that enables calibration-light monitoring and cross-device fusion in wearable sensing, robotics, and vision.
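The two geometric primitives underlying Chapters 3-5, extracting TBV point clouds from a rotation set and enumerating the 24 proper signed-permutation hypotheses for the PASI variant, can be sketched as follows (function names are illustrative assumptions, not the dissertation's code):

```python
import itertools
import numpy as np

def tbv_point_clouds(rotations):
    """Split N rotation matrices into three spherical point clouds.

    Each rotation R maps the basis vectors e_x, e_y, e_z to the columns
    of R, so a rotation set yields three point sets on the unit sphere.
    A global offset Q acts on each cloud by left multiplication,
    (Q R) e_i = Q (R e_i), which is why per-axis spherical alignment
    can recover Q without pointwise correspondences.
    """
    R = np.asarray(rotations)                 # shape (N, 3, 3)
    return R[:, :, 0], R[:, :, 1], R[:, :, 2]

def proper_signed_permutations():
    """Enumerate the 24 signed permutations L = P S with det L = +1.

    The 6 permutation matrices P combined with the 8 diagonal sign
    matrices S give 48 candidates; exactly half are proper rotations.
    """
    hypotheses = []
    for perm in itertools.permutations(range(3)):
        P = np.eye(3)[list(perm)]             # permutation matrix
        for signs in itertools.product((1.0, -1.0), repeat=3):
            L = np.diag(signs) @ P            # signed permutation
            if np.isclose(np.linalg.det(L), 1.0):
                hypotheses.append(L)
    return hypotheses
```

Scoring all 24 hypotheses reduces to re-combining a fixed pool of per-axis spherical match scores, which is what keeps the PASI pipeline linear in the number of rotations.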