Human Pose Estimation and Algorithms for Alignment and Registration Problems: Applications in Robotics, Computer Vision, and Stroke Rehabilitation
| dc.contributor.author | Sarker, Anik | en |
| dc.contributor.committeechair | Asbeck, Alan Thomas | en |
| dc.contributor.committeemember | Komendera, Erik | en |
| dc.contributor.committeemember | Buehrer, Richard M. | en |
| dc.contributor.committeemember | Losey, Dylan Patrick | en |
| dc.contributor.department | Mechanical Engineering | en |
| dc.date.accessioned | 2025-10-28T08:00:12Z | en |
| dc.date.available | 2025-10-28T08:00:12Z | en |
| dc.date.issued | 2025-10-27 | en |
| dc.description.abstract | Estimating human motion and inter-frame orientation reliably, efficiently, and with minimal instrumentation is central to robotics, computer vision, and rehabilitation. This dissertation develops a set of learning- and geometry-driven methods spanning (i) sparse-IMU upper-body 3D pose estimation with deep sequence models, and (ii) fast, correspondence-free alignment on the sphere (S2) and on the rotation group (SO(3)) with applications to sensor calibration and robot world–hand–eye (RWHE) problems. Chapter 2: Sparse-IMU upper-body pose estimation with deep sequence models. We study the problem of reconstructing upper-body kinematics from three wearable IMUs (sternum and bilateral forearms) for stroke rehabilitation. Using a synchronized XSens MVN system as reference, we introduce a cross-sensor mapping that transfers standalone XSens DOT measurements into the MVN coordinate system. Two mapping regimes are investigated: a variable (session-specific) mapping and a fixed mapping obtained by quaternion averaging across sessions. On top of these mapped signals, we train and evaluate a family of deep sequence models—Seq2Seq, Seq2Seq with BiRNN and attention, a Transformer encoder, and a full Transformer—to infer multi-joint upper-body orientations (15 segments) from the three-IMU streams. Transformers, particularly the full encoder–decoder variant, achieve state-of-the-art accuracy and robustness across participants and tasks, outperforming recurrent baselines while maintaining deployment-friendly throughput. The results quantify trade-offs between model class, IMU placement (two configurations), and mapping regime, and demonstrate that a fixed mapping retains accuracy while enabling practical re-use across recording sessions. Chapters 3–4: Fast, correspondence-free alignment on S2 and SO(3). 
We recast set alignment of orientations as alignment of Transformed Basis Vectors (TBVs): each rotation yields three unit vectors on S2, producing three spherical point clouds per dataset. Building on this representation, we develop (and rigorously analyze) three spherical matchers—SPMC (Spherical Pattern Matching by Correlation), FRS (Fast Rotation Search), and a hybrid SPMC+FRS—that estimate relative rotation in linear time O(n) and without pointwise correspondences. These methods are robust under severe contamination (empirically up to 90% outliers) and avoid the cubic-log scaling of FFT-based spherical/SO(3) cross-correlation. We lift these spherical matchers to SO(3) by aligning TBV triplets per axis and projecting the fused estimate back to the group, yielding SO3_SPMC, SO3_FRS, and SO3_SPMC_FRS for axis-consistent settings (e.g., homogeneous IMUs from the same vendor). To handle axis-ambiguous scenarios (cross-vendor frames, world/hand–eye), we make the pipeline Permutation-and-Sign Invariant by enumerating signed permutations L = P S of axes and selecting the maximizer over 24 proper hypotheses (det L = +1). This preserves the O(n) profile: the heavy spherical matches (18 axis/sign pairs) are computed once and re-used during hypothesis scoring. We demonstrate PASI-SO(3) alignment as a drop-in rotational initializer (or stand-alone estimator) for RWHE calibration on real data (ETH robot_arm_real), using raw, unpaired trajectories without time alignment. Chapter 5: Motion-driven, axis-consistent automatic orientation calibration for wearable IMUs. We target session- and subject-specific orientation drift that arises in wearable IMUs due to don/doff variability and strap placement, even when devices share the same vendor axis convention. The key idea is to calibrate from motion: we maintain a library of ground-truth SO(3) motion signatures (walking, sit-to-stand, reach, etc.) 
collected previously in a canonical frame; at the start of a new session, the user performs a brief activity sequence, producing an unpaired SO(3) set. Assuming axis consistency, we apply SO3_SPMC to align the session's TBV distributions to the canonical signatures and recover the session's global rotational offset in a single, correspondence-free step. We show: (i) calibration is accurate and repeatable across participants and days; (ii) multi-activity signatures increase identifiability and reduce bias; (iii) runtime is linear in sequence length (tens of milliseconds for typical windows); and (iv) the calibrated orientations improve downstream pose estimation and cross-session comparability without magnetometer reliance or time alignment. Empirical validation and impact. Across synthetic and real evaluations, the proposed methods deliver: (i) accurate upper-body pose reconstruction from minimal IMU instrumentation with Transformer models and a reusable cross-sensor mapping; (ii) fast, robust S2/SO(3) alignment that scales linearly and tolerates extreme outliers; and (iii) a practical, motion-driven, axis-consistent calibration procedure for wearable IMUs that removes session/subject orientation bias from brief activity snippets. Together, these contributions provide a cohesive toolkit—learning for motion inference and geometry for fast, correspondence-free alignment—that enables calibration-light monitoring and cross-device fusion in wearable sensing, robotics, and vision. | en |
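The abstract's fixed cross-sensor mapping is "obtained by quaternion averaging across sessions." As a minimal sketch of one standard way to average unit quaternions (the principal-eigenvector method; the function name, the w-x-y-z convention, and the choice of this particular averaging scheme are illustrative assumptions, not the dissertation's code):

```python
import numpy as np

def average_quaternions(quats):
    """Average unit quaternions (rows of `quats`, w-x-y-z order) as the
    principal eigenvector of the accumulator M = sum_i q_i q_i^T.
    Insensitive to the q / -q sign ambiguity, since q q^T = (-q)(-q)^T."""
    q = np.asarray(quats, dtype=float)
    q /= np.linalg.norm(q, axis=1, keepdims=True)  # enforce unit norm
    M = q.T @ q                                    # 4x4 symmetric accumulator
    eigvals, eigvecs = np.linalg.eigh(M)           # symmetric -> use eigh
    avg = eigvecs[:, np.argmax(eigvals)]           # principal eigenvector
    return avg / np.linalg.norm(avg)
```

Because the accumulator is sign-invariant, session-specific mappings estimated with opposite quaternion signs still average consistently, which matters when fusing mappings gathered on different days.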
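To make the TBV construction and the 24-hypothesis signed-permutation enumeration concrete, here is a minimal sketch (function names are illustrative assumptions; the scoring of hypotheses against the precomputed spherical matches is omitted):

```python
import itertools
import numpy as np

def tbv(R):
    """Transformed Basis Vectors of a rotation matrix R: its three columns,
    i.e. the images of the x/y/z axes as unit vectors on S2."""
    return R[:, 0], R[:, 1], R[:, 2]

def proper_signed_permutations():
    """Enumerate the 24 proper signed permutation matrices L = P S
    (axis relabeling P combined with sign flips S, det L = +1) used as
    the hypothesis set for permutation-and-sign-invariant alignment."""
    mats = []
    for perm in itertools.permutations(range(3)):
        P = np.eye(3)[list(perm)]                  # axis relabeling
        for signs in itertools.product([1.0, -1.0], repeat=3):
            L = np.diag(signs) @ P                 # add sign flips
            if np.isclose(np.linalg.det(L), 1.0):  # keep proper rotations only
                mats.append(L)
    return mats
```

Of the 6 x 8 = 48 signed permutations, exactly half are proper (det = +1), giving the 24 hypotheses the abstract scores; since the underlying per-axis spherical matches are computed once and reused across hypotheses, the O(n) cost of the matchers is preserved.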
| dc.description.abstractgeneral | Our bodies are constantly in motion, but measuring that motion accurately outside a laboratory is hard. Cameras raise privacy concerns and often fail in everyday settings; markers and special suits are inconvenient; and the tiny sensors built into wearables (like smartwatches or shoe pods) are often worn at slightly different angles each day. This thesis explores how to get precise, useful information about human movement using only a few inexpensive wearable sensors—no cameras, no markers—by combining modern machine learning with careful geometry. The first part of the work tackles a practical challenge: estimating the posture of the upper body (for example, trunk and shoulders) using only a handful of small motion sensors called inertial measurement units (IMUs). An IMU contains accelerometers and gyroscopes that measure how a device moves and rotates. We design transformer-based neural networks—popular models for language and time-series—that read raw IMU signals and produce a person's pose frame by frame. The models are trained to be robust to the messy realities of daily life: sensors can drift, straps can loosen, and people move in unpredictable ways. Despite using far fewer sensors than traditional systems, the networks recover meaningful pose information suitable for clinical and everyday use. The second part addresses a hidden but critical problem: calibration. Even if two people wear identical sensors, each sensor's axes (its internal "forward, left, up" directions) rarely align perfectly between sessions. Most existing systems fix this with special calibration motions or manual setups. We avoid that overhead by aligning sensors automatically from whatever natural motion is available—walking to the bus, doing desk work, or exercising—without requiring the same movement twice or matching samples one-to-one in time. 
The key idea is to summarize each rotation sample from a sensor by three unit arrows that represent its internal axes. Across time, these arrows form a distinctive "pattern" on the surface of a sphere. We then align two sessions by matching their spherical patterns directly. The resulting algorithms run in linear time (fast enough for long recordings), handle extreme outliers, and don't need pairings between samples. In technical terms, we contribute a family of correspondence-free and near-real-time methods for aligning sets of orientations. Some variants assume the same axis naming across sessions (typical for identical wearable devices), while others explicitly handle axis relabeling and sign flips (common when mixing devices or connecting robot and camera frames). We demonstrate the methods on both simulated and real datasets, including a standard robotics benchmark where our approach provides a clean, fast estimate of the relative rotation between a robot arm and a camera using raw, unaligned trajectories. The final part puts these ideas together for motion-driven automatic orientation calibration of wearable IMUs. We treat a short recording from a trusted setup as a reusable "reference pattern," and when a person dons the sensors on a new day, we align their new motion to this reference—no special pose or countdown required. Because the alignment depends only on overall motion statistics, it works with different activities and recording lengths and naturally respects privacy (no images are needed). Why does this matter? Reliable, low-friction motion sensing unlocks new possibilities: home-based rehabilitation that adapts to how people actually wear devices, workplace ergonomics without cameras, fitness tracking that stays consistent across sessions, and human-robot collaboration that does not rely on careful instrumented environments. 
Methodologically, the thesis shows how to pair deep learning with geometric structure: transformers to interpret raw signals, and fast, principled alignment on the space of rotations to make those signals comparable across time, people, and devices. In short, this work reduces the gap between research-grade motion capture and everyday wearables. With minimal hardware, no cameras, and little user effort, we obtain accurate pose estimates and automatic calibration from natural movement—bringing robust motion understanding closer to the rhythms of daily life. | en |
| dc.description.degree | Doctor of Philosophy | en |
| dc.format.medium | ETD | en |
| dc.identifier.other | vt_gsexam:44762 | en |
| dc.identifier.uri | https://hdl.handle.net/10919/138780 | en |
| dc.language.iso | en | en |
| dc.publisher | Virginia Tech | en |
| dc.rights | In Copyright | en |
| dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
| dc.subject | Human Pose Estimation | en |
| dc.subject | Spherical Pattern Matching | en |
| dc.subject | SO3 Optimization | en |
| dc.subject | Point Cloud Registration | en |
| dc.subject | Orientation Alignment | en |
| dc.title | Human Pose Estimation and Algorithms for Alignment and Registration Problems: Applications in Robotics, Computer Vision, and Stroke Rehabilitation | en |
| dc.type | Dissertation | en |
| thesis.degree.discipline | Mechanical Engineering | en |
| thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
| thesis.degree.level | doctoral | en |
| thesis.degree.name | Doctor of Philosophy | en |
Files
Original bundle
1 - 3 of 3
- Name:
- Sarker_A_D_2025_support_1.pdf
- Size:
- 37.13 KB
- Format:
- Adobe Portable Document Format
- Description:
- Supporting documents
- Name:
- Sarker_A_D_2025_support_3.pdf
- Size:
- 37.13 KB
- Format:
- Adobe Portable Document Format
- Description:
- Supporting documents