High Performance Video Inference at Scale: Addressing System Overhead with C++ Coroutines and io_uring
| dc.contributor.author | Sahin, Aykut | en |
| dc.contributor.committeechair | Noh, Sam Hyuk | en |
| dc.contributor.committeechair | Butt, Ali | en |
| dc.contributor.committeemember | Li, Huaicheng | en |
| dc.contributor.department | Computer Science & Applications | en |
| dc.date.accessioned | 2026-06-02T08:01:46Z | en |
| dc.date.available | 2026-06-02T08:01:46Z | en |
| dc.date.issued | 2026-06-01 | en |
| dc.description.abstract | Multi-stream video inference systems typically employ a thread-heavy architecture as their underlying infrastructure. At scale, this model suffers from CPU migrations, context-switch storms, synchronization overhead, cache thrashing, TLB pollution, and unpredictable latency. This thesis presents a thread-per-core alternative that replaces OS-managed thread scheduling with user-space coroutine scheduling. Our system combines C++20 stackless coroutines for cooperative multitasking, Linux io_uring for asynchronous I/O, and non-blocking GPU completions for asynchronous inference requests. We benchmark across four scale factors and four execution modes on an AMD EPYC / NVIDIA A2 platform, profiling with perf stat and NVIDIA Nsight Systems. Compared to a properly optimized threaded baseline, our architecture achieves up to 13.6% higher throughput, 365x fewer CPU migrations, 2.48x fewer context switches, 2.7x fewer page faults, and a 16.9% reduction in total CPU work, while the two architectures reach GPU utilization within 1% of each other. | en |
| dc.description.abstractgeneral | Modern applications such as autonomous vehicles, smart city surveillance, and industrial inspection require computers to analyze many video streams simultaneously using artificial intelligence. The standard approach relies on the operating system to share computing resources among the video streams. When the number of streams greatly exceeds the number of available processors, the operating system spends significant effort on management alone, wasting computational resources that could be used for actual video analysis. This thesis develops an alternative approach that shifts much of this management from the operating system to user space using modern C++ language and Linux features, largely bypassing the components that cause system overhead at scale. The results suggest that our approach can serve as a more efficient foundation for large-scale video analysis systems than traditional designs. | en |
| dc.description.degree | Master of Science | en |
| dc.format.medium | ETD | en |
| dc.identifier.other | vt_gsexam:46476 | en |
| dc.identifier.uri | https://hdl.handle.net/10919/143230 | en |
| dc.language.iso | en | en |
| dc.publisher | Virginia Tech | en |
| dc.rights | In Copyright | en |
| dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en |
| dc.subject | Video Inference | en |
| dc.subject | C++ Coroutines | en |
| dc.subject | User-Space Scheduling | en |
| dc.subject | io_uring | en |
| dc.subject | AI Infrastructure | en |
| dc.title | High Performance Video Inference at Scale: Addressing System Overhead with C++ Coroutines and io_uring | en |
| dc.type | Thesis | en |
| thesis.degree.discipline | Computer Science & Applications | en |
| thesis.degree.grantor | Virginia Polytechnic Institute and State University | en |
| thesis.degree.level | masters | en |
| thesis.degree.name | Master of Science | en |