High Performance Video Inference at Scale: Addressing System Overhead with C++ Coroutines and io_uring

dc.contributor.author: Sahin, Aykut (en)
dc.contributor.committeechair: Noh, Sam Hyuk (en)
dc.contributor.committeechair: Butt, Ali (en)
dc.contributor.committeemember: Li, Huaicheng (en)
dc.contributor.department: Computer Science & Applications (en)
dc.date.accessioned: 2026-06-02T08:01:46Z (en)
dc.date.available: 2026-06-02T08:01:46Z (en)
dc.date.issued: 2026-06-01 (en)
dc.description.abstract: Multi-stream video inference systems typically rely on a thread-heavy architecture as their underlying infrastructure. At scale, this model suffers from CPU migrations, context switch storms, synchronization overhead, cache thrashing, TLB pollution, and unpredictable latency. This thesis presents a thread-per-core alternative that replaces OS-managed thread scheduling with user-space coroutine scheduling. Our system combines C++20 stackless coroutines for cooperative multitasking, Linux io_uring for asynchronous I/O, and non-blocking GPU completions for asynchronous inference requests. In benchmarks across four scale factors and four execution modes on an AMD EPYC / NVIDIA A2 platform, profiled with perf stat and NVIDIA Nsight Systems, our architecture achieves up to 13.6% higher throughput, 365x fewer CPU migrations, 2.48x fewer context switches, 2.7x fewer page faults, and a 16.9% reduction in total CPU work compared to a properly optimized threaded baseline. The two architectures converge to within 1% of each other in GPU utilization. (en)
dc.description.abstractgeneral: Modern applications such as autonomous vehicles, smart city surveillance, and industrial inspection require computers to analyze many video streams simultaneously using artificial intelligence. The standard approach relies on the operating system to share resources among the video streams. When the number of streams greatly exceeds the number of available processors, the operating system spends significant effort on management alone, wasting computational resources that could be used for actual video analysis. This thesis develops an alternative approach that shifts much of that management from the operating system to user space using modern C++ and Linux features, essentially bypassing the components that cause system overhead at scale. The results suggest that the proposed approach can be a more efficient foundation for large-scale video analysis systems than traditional approaches. (en)
dc.description.degree: Master of Science (en)
dc.format.medium: ETD (en)
dc.identifier.other: vt_gsexam:46476 (en)
dc.identifier.uri: https://hdl.handle.net/10919/143230 (en)
dc.language.iso: en (en)
dc.publisher: Virginia Tech (en)
dc.rights: In Copyright (en)
dc.rights.uri: http://rightsstatements.org/vocab/InC/1.0/ (en)
dc.subject: Video Inference (en)
dc.subject: C++ Coroutines (en)
dc.subject: User-Space Scheduling (en)
dc.subject: io_uring (en)
dc.subject: AI Infrastructure (en)
dc.title: High Performance Video Inference at Scale: Addressing System Overhead with C++ Coroutines and io_uring (en)
dc.type: Thesis (en)
thesis.degree.discipline: Computer Science & Applications (en)
thesis.degree.grantor: Virginia Polytechnic Institute and State University (en)
thesis.degree.level: masters (en)
thesis.degree.name: Master of Science (en)

Files

Original bundle
Name: Sahin_A_T_2026.pdf
Size: 594.13 KB
Format: Adobe Portable Document Format
