High Performance Video Inference at Scale: Addressing System Overhead with C++ Coroutines and io_uring

Sahin, Aykut

Files

Sahin_A_T_2026.pdf (594.13 KB)

Date

2026-06-01

Authors

Sahin, Aykut

Publisher

Virginia Tech

Abstract

Multi-stream video inference systems typically rely on a thread-heavy architecture. At scale, this model suffers from CPU migrations, context-switch storms, synchronization overhead, cache thrashing, TLB pollution, and unpredictable latency. This thesis presents a thread-per-core alternative that replaces OS-managed thread scheduling with user-space coroutine scheduling. Our system combines C++20 stackless coroutines for cooperative multitasking, Linux io_uring for asynchronous I/O, and non-blocking GPU completions for asynchronous inference requests. We benchmark four scale factors and four execution modes on an AMD EPYC / NVIDIA A2 platform, profiling with perf stat and NVIDIA Nsight Systems. Compared to a properly optimized threaded baseline, our architecture achieves up to 13.6% higher throughput, 365x fewer CPU migrations, 2.48x fewer context switches, 2.7x fewer page faults, and a 16.9% reduction in total CPU work. GPU utilization of the two architectures converges to within 1%.

Keywords

Video Inference, C++ Coroutines, User-Space Scheduling, io_uring, AI Infrastructure

Persistent link

https://hdl.handle.net/10919/143230

Collections

Masters Theses
