Conroy, Thomas Joseph2022-08-282022-08-282021-03-05vt_gsexam:29297http://hdl.handle.net/10919/111649The expectations on embedded systems have grown incredibly in recent years. Not only are there more applications for them than ever, the applications are increasingly complex, and their security is essential. To meet such demanding goals, designers and programmers are always looking for more efficient methods of computation. One technique that has gained attention over the past couple of decades is bitsliced software. In addition to high efficiency in certain situations, including block ciphers computation, it has been used in designs to resist hardware attacks. However, this technique requires both program and data to be in a specific format. This requirement makes writing bitsliced software by hand laborious and adds computational overhead to transpose the data before and after computation. This work describes a code generation tool that produces it from a higher-level description in Verilog. By supporting the synthesis of sequential circuits, this tool extends bitsliced software to parallel synchronous software. This tool is then used to implement a method for accelerating software neural network processing with reduced-precision computation on highly constrained devices. To address the data transposition overhead and to support a hardware attack-resistant architecture, a custom DMA controller is introduced that efficiently transposes the data as it transfers along with dedicated hardware for masking and redundancy generation. In combination, these tools make bitsliced software and its benefits more accessible to system designers and programmers.ETDIn CopyrightBitsliced SoftwareCode GenerationDirect Memory AccessNeural Network AccelerationSupport for Accessible Bitsliced SoftwareThesis