High Performance Video Inference at Scale: Addressing System Overhead with C++ Coroutines and io_uring

Sahin, Aykut

dc.contributor.author	Sahin, Aykut	en
dc.contributor.committeechair	Noh, Sam Hyuk	en
dc.contributor.committeechair	Butt, Ali	en
dc.contributor.committeemember	Li, Huaicheng	en
dc.contributor.department	Computer Science & Applications	en
dc.date.accessioned	2026-06-02T08:01:46Z	en
dc.date.available	2026-06-02T08:01:46Z	en
dc.date.issued	2026-06-01	en
dc.description.abstract	Multi-stream video inference systems typically employ a thread-heavy architecture as the underlying infrastructure. At scale, this model suffers from CPU migrations, context switch storms, synchronization overhead, cache thrashing, TLB pollution, and unpredictable latency. This thesis presents a thread-per-core alternative that replaces OS-managed thread scheduling with user-space coroutine scheduling. Our system combines C++20 stackless coroutines for cooperative multitasking, Linux io_uring for asynchronous I/O, and non-blocking GPU completions for asynchronous inference requests. Benchmarking across four scale factors and four execution modes on an AMD EPYC / NVIDIA A2 platform with perf stat and NVIDIA Nsight Systems profiling, our architecture achieves up to 13.6% higher throughput, 365x fewer CPU migrations, 2.48x fewer context switches, 2.7x fewer page faults, and a 16.9% reduction in total CPU work compared to a properly optimized threaded baseline. Both architectures converge within 1% GPU utilization.	en
dc.description.abstractgeneral	Modern applications such as autonomous vehicles, smart city surveillance, and industrial inspection require computers to analyze many video streams simultaneously using artificial intelligence. The standard approach relies on the operating system to share resources among the video streams. When the number of streams greatly exceeds the number of available processors, the operating system spends significant effort solely on management, wasting computational resources that could be used for actual video analysis. This thesis develops an alternative approach that shifts much of this management from the operating system to user space using modern C++ and Linux features, essentially bypassing the components that cause system overhead at scale. The results suggest that the proposed approach can be a more efficient foundation for large-scale video analysis systems than traditional approaches.	en
dc.description.degree	Master of Science	en
dc.format.medium	ETD	en
dc.identifier.other	vt_gsexam:46476	en
dc.identifier.uri	https://hdl.handle.net/10919/143230	en
dc.language.iso	en	en
dc.publisher	Virginia Tech	en
dc.rights	In Copyright	en
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	en
dc.subject	Video Inference	en
dc.subject	C++ Coroutines	en
dc.subject	User-Space Scheduling	en
dc.subject	io_uring	en
dc.subject	AI Infrastructure	en
dc.title	High Performance Video Inference at Scale: Addressing System Overhead with C++ Coroutines and io_uring	en
dc.type	Thesis	en
thesis.degree.discipline	Computer Science & Applications	en
thesis.degree.grantor	Virginia Polytechnic Institute and State University	en
thesis.degree.level	masters	en
thesis.degree.name	Master of Science	en

Files

Original bundle

Name: Sahin_A_T_2026.pdf
Size: 594.13 KB
Format: Adobe Portable Document Format

Collections

Masters Theses