Browsing by Author "Athanas, Peter M."
- An 8 GHz Ultra Wideband Transceiver Testbed. Agarwal, Deepak (Virginia Tech, 2005-10-07). Software defined radios have the potential of changing the fundamental usage model of wireless communications devices, but the capabilities of these transceivers are often limited by the speed of the underlying processors and FPGAs. This thesis presents the digital design for an impulse-based ultra wideband communication system capable of supporting raw data rates of up to 100 MB/s. The transceiver is being developed using software/reconfigurable radio concepts and will be implemented using commercially available off-the-shelf components. The receiver uses eight 1 GHz ADCs to perform time-interleaved sampling at an aggregate rate of 8 Gsamples/s. The high sampling rates place extraordinary demands on the down-conversion resources. Samples are captured by the high-speed ADCs and processed using a Xilinx Virtex-II Pro (XC2VP70) FPGA. The testbed has two components: a non-real-time part for data capture and signal acquisition, and a real-time part for data demodulation and signal processing. The overall objective is to demonstrate a testbed that will allow researchers to evaluate different UWB modulation, multiple access, and coding schemes. As proof of concept, a scaled-down prototype receiver using two ADCs and a Xilinx Virtex-II Pro (XC2VP30) FPGA was fabricated and tested.
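The time-interleaved sampling scheme in this testbed is easy to illustrate in software. The sketch below is a minimal model, not the testbed's firmware: it assumes eight ideal, perfectly aligned 1 GHz converters (the configuration named in the abstract) and simply shows how their staggered streams merge into one 8 Gsamples/s stream.

```python
import numpy as np

NUM_ADCS = 8              # eight converters, per the abstract
F_ADC = 1e9               # per-ADC sample rate: 1 GHz
F_AGG = NUM_ADCS * F_ADC  # aggregate rate: 8 Gsamples/s

def interleave(channels):
    """Merge per-ADC streams into one aggregate-rate stream.

    channels: (NUM_ADCS, n) array, where channel k holds the samples
    taken at times (k + m * NUM_ADCS) / F_AGG for m = 0..n-1.
    """
    channels = np.asarray(channels)
    # Transpose then flatten so output order follows capture time.
    return channels.T.reshape(-1)

# Toy check: a 100 MHz tone captured by 8 staggered ADCs reassembles
# exactly into the sequence a single 8 GHz ADC would have produced.
n = 64
t = np.arange(NUM_ADCS * n) / F_AGG
tone = np.sin(2 * np.pi * 100e6 * t)
per_adc = np.stack([tone[k::NUM_ADCS] for k in range(NUM_ADCS)])
assert np.allclose(interleave(per_adc), tone)
```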
- Accelerating Incremental Floorplanning of Partially Reconfigurable Designs to Improve FPGA Productivity. Chandrasekharan, Athira (Virginia Tech, 2010-08-05). FPGA implementation tool turnaround time has unfortunately not kept pace with FPGA density advances. It is difficult to parallelize place-and-route algorithms without sacrificing determinism or quality of results. We approach the problem in a different way for development environments in which some circuit speed and area optimization may be sacrificed for improved implementation and debug turnaround. The PATIS floorplanner enables dynamic modular design, which accelerates non-local changes to the physical layout arising from design exploration and the addition of debug circuitry. We focus in this work on incremental and speculative floorplanning in PATIS, to accommodate minor design changes and to proactively generate possible floorplan variants. Current floorplan topology is preserved to minimize ripple effects and maintain reasonable module aspect ratios. The design modules are run-time reconfigurable to enable concurrent module implementation by independent invocations of the standard FPGA tools running on separate cores or hosts.
- Advanced System-Scale and Chip-Scale Interconnection Networks for Ultrascale Systems. Shalf, John Marshall (Virginia Tech, 2010-12-14). The path towards realizing next-generation petascale and exascale computing is increasingly dependent on building supercomputers with unprecedented numbers of processors. Given the rise of multicore processors, the number of network endpoints both on-chip and off-chip is growing exponentially, with systems in 2018 anticipated to contain thousands of processing elements on-chip and billions of processing elements system-wide. To prevent the interconnect from dominating the overall cost of future systems, there is a critical need for scalable interconnects that capture the communication requirements of target ultrascale applications. It is therefore essential to understand high-end application communication characteristics across a broad spectrum of computational methods, and to utilize that insight to tailor interconnect designs to the specific requirements of the underlying codes. This work makes several unique contributions towards attaining that goal. First, it characterizes the communication requirements of a number of high-end applications whose computational methods include finite-difference, lattice-Boltzmann, particle-in-cell, sparse linear algebra, particle mesh Ewald, and FFT-based solvers. Second, it presents an introduction to the fit-tree approach for designing network infrastructure that is tailored to application requirements; a fit-tree minimizes the component count of an interconnect without impacting application performance compared to a fully connected network. The last section introduces Hybrid Flexibly Assignable Switch Topology (HFAST), a methodology for reconfigurable networks that implements fit-tree solutions. HFAST uses both passive (circuit) and active (packet) commodity switch components in a unique way to dynamically reconfigure interconnect wiring to suit the topological requirements of scientific applications. Overall, the exploration points to several promising directions for practically addressing both the on-chip and off-chip interconnect requirements of future ultrascale systems.
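The fit-tree idea lends itself to a compact sketch: given a communication trace, tally how much traffic must actually ascend past each level of a tree, and provision (or prune) that level accordingly instead of building full bisection bandwidth everywhere. The fragment below is an illustrative reconstruction of that tally, not HFAST or the dissertation's actual tooling; the trace format and tree parameters are assumptions.

```python
def level_demand(trace, fanout, levels):
    """Bytes that must cross above each level of a complete tree.

    trace: iterable of (src, dst, nbytes) triples, with src/dst as
    leaf (endpoint) indices. A message loads level L's uplinks only
    when its endpoints sit in different level-L subtrees; a fit-tree
    provisions each level for this measured demand rather than for
    full bisection bandwidth.
    """
    demand = [0] * levels
    for src, dst, nbytes in trace:
        for level in range(levels):
            span = fanout ** (level + 1)
            if src // span != dst // span:
                demand[level] += nbytes
    return demand

# Nearest-neighbor traffic on 16 endpoints barely loads the upper
# levels, which is exactly the headroom a fit-tree trims away.
trace = [(i, (i + 1) % 16, 1) for i in range(16)]
print(level_demand(trace, fanout=2, levels=4))  # -> [8, 4, 2, 0]
```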
- Analysis of a self-contained motion capture garment for e-textiles. Lewis, Robert Alan (Virginia Tech, 2011-05-04). Wearable computers and e-textiles are becoming increasingly widespread in today's society. Motion capture is one of the many potential applications for on-body electronic systems. Previous work has been performed at Virginia Tech's E-textiles Laboratory to design a framework for a self-contained, loose-fit motion capture system. This system gathers information from sensors distributed throughout the body on a "smart" garment. This thesis presents the hardware and software components of the framework, along with improvements made to it. This thesis also presents an analysis of both the on-body and off-body network communication to determine how many sensors can be supported on the garment at a given time. Finally, this thesis presents a method for determining the accuracy of the smart garment and shows how it compares against a commercially available motion capture system.
- Application Benchmarks for SCMP: Single Chip Message-Passing Computer. Shah, Jignesh (Virginia Tech, 2004-05-12). As transistor feature sizes continue to shrink, it will become feasible, and for a number of reasons more efficient, to include multiple processors on a single chip. The SCMP system being developed at Virginia Tech includes up to 64 processors on a chip, connected in a 2-D mesh. On-chip memory is included with each processor, and the architecture includes support for communication and the execution of parallel threads. As with any new computer architecture, benchmark kernels and applications are needed to guide the design and development, as well as to quantify the system performance. This thesis presents several benchmarks that have been developed for or ported to SCMP. Discussion of the benchmark algorithms and their implementations is included, as well as an analysis of the system performance. The thesis also includes discussion of the programming environment available for developing parallel applications for SCMP.
- Applications of TORC: An Open Toolkit for Reconfigurable Computing. Couch, Jacob Donald (Virginia Tech, 2011-08-05). Two research projects are proposed that rely on Tools for Open Reconfigurable Computing (TORC) and the openness of the Xilinx tool chain. The first project, the Embedded FPGA Transmitter, relies on the ability to add arbitrary routes to a physical FPGA which serve no obvious purpose. These routes can then mimic an antenna and transmit directly from the FPGA. This mechanism is not supported by standard hardware description languages; the Embedded FPGA Transmitter therefore requires measurements on a real FPGA to determine success. The second project is a back-end tools accelerator designed to reduce the compilation time for FPGA designs. As FPGAs have grown beyond a million logic cells, the compilation problem size has greatly expanded. The open-source TORC project provides an excellent framework for new FPGA research, yielding physical, real-world results that ensure the applicability of the research.
- An Architecture Study on a Xilinx Zynq Cluster with Software Defined Radio Applications. Dobson, Christopher Vaness (Virginia Tech, 2014-07-16). The rapid rise in computational performance offered by computer systems has greatly increased the number of practical software defined radio applications. The addition of FPGAs to these flexible systems has resulted in platforms that can address a multitude of applications with performance levels that were once only known to ASICs. This work presents an embedded heterogeneous scalable cluster platform with software defined radio applications. The Xilinx Zynq chip provides a hybrid platform consisting of an embedded ARM general-purpose processing core and a low-power FPGA. The ARM core provides all of the benefits and ease of use common to modern high-level software languages, while the FPGA segment offers high performance for computationally intensive components of the application. Four of these chips were combined in a scalable cluster, and a task assigner was written to automatically place data flows across the FPGAs and ARM cores. The rapid reconfiguration software tFlow was used to dynamically build arbitrary FPGA images out of a library of pre-built modules.
- Architecture-Independent Design for Run-Time Reconfigurable Custom Computing Machines. Hudson, Rhett Daniel (Virginia Tech, 2000-07-20). The configurable computing research community has provided a wealth of evidence that computational platforms based on FPGA technology are capable of cost-effectively accelerating certain kinds of computations. One actively growing area in the research community examines the benefits to computation that can be gained by reconfiguring the FPGAs in a system during the execution of an application. This technique is commonly referred to as run-time reconfiguration. Widespread acceptance of run-time reconfigurable custom computing depends upon the existence of high-level automated design tools. Given the wide variety of available platforms and the rate at which the technology is evolving, a set of architecturally independent tools that provide the ability to port applications between different architectures will allow application-based intellectual property to be easily migrated between platforms. A Java implementation of such a toolset, called Janus, is presented and analyzed here. In this environment, developers create a Java class that describes the structural behavior of an application. The design framework allows hardware and software modules to be freely intermixed. During the compilation phase of the development process, the Janus tools analyze the structure of the application and adapt it to the target architecture. Janus is capable of structuring the run-time behavior of an application to take advantage of the resources available on the platform. Examples of applications developed using the toolset are presented and their performance is reported. The retargeting of applications to multiple hardware architectures is demonstrated.
- Architectures for e-Textiles. Nakad, Zahi Samir (Virginia Tech, 2003-12-10). The huge advancement in the textiles industry and the accurate control of the mechanization process, coupled with cost-effective manufacturing, offer an innovative environment for new electronic systems, namely electronic textiles. The abundance of fabrics in our everyday life offers immense possibilities for electronic integration in both wearable and large-scale applications. Augmenting this technology with a set of precepts and a simulation environment creates a new software/hardware architecture with widely useful implementations in wearable and large-area computational systems. The software environment acts as a functional modeling and testing platform, providing estimates of design metrics such as power consumption. The construction of an electronic textile (e-textile) hardware prototype, a large-scale acoustic beamformer, provides a basis for the simulator and offers experience in building these systems. The contributions of this research focus on defining the electronic textile architecture, creating a simulation environment, defining a networking scheme, and implementing hardware prototypes.
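A large-scale acoustic beamformer of the kind prototyped here conventionally uses delay-and-sum processing across its distributed microphones. The sketch below shows only that baseline algorithm; it is a generic illustration with an assumed array geometry and sampling setup, not the e-textile prototype's networked implementation.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s in air

def delay_and_sum(mic_signals, mic_positions, look_direction, fs):
    """Steer a microphone array by aligning and averaging channels.

    mic_signals:    (num_mics, n) synchronized sample streams
    mic_positions:  (num_mics, 3) sensor coordinates in meters
    look_direction: unit vector pointing toward the assumed source
    fs:             sample rate in Hz
    """
    d = np.asarray(look_direction, dtype=float)
    d /= np.linalg.norm(d)
    out = np.zeros(mic_signals.shape[1])
    for sig, pos in zip(mic_signals, np.asarray(mic_positions)):
        # A mic closer to the source hears the wavefront earlier by
        # pos . d / c; delay it so all channels line up, then sum.
        shift = int(round(np.dot(pos, d) / SPEED_OF_SOUND * fs))
        out += np.roll(sig, shift)  # integer-sample alignment only
    return out / len(mic_signals)
```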
- Automatic Generation of Efficient Parallel Streaming Structures for Hardware Implementation. Koehn, Thaddeus E. (Virginia Tech, 2016-11-30). Digital signal processing systems demand higher computational performance and more operations per second than ever before, and this trend is not expected to end any time soon. Processing architectures must adapt in order to meet these demands. The two techniques most prevalent for achieving throughput constraints are parallel processing and stream processing. By combining these techniques, significant throughput improvements have been achieved. These preliminary results apply to specific applications, and general tools for automation are in their infancy. In this dissertation, techniques are developed to automatically generate efficient parallel streaming hardware architectures.
- Automatically Locating Sensor Position on an E-textile Garment Via Pattern Recognition. Love, Andrew R. (Virginia Tech, 2009-09-30). Electronic textiles are a sound platform for wearable computing. Many applications have been devised that use sensors placed on these textiles for fields such as medical monitoring and military use, or for display purposes. Most of these applications require that the sensors have known locations for accurate results. Activity recognition is one application that is highly dependent on knowledge of sensor position. Therefore, this thesis presents the design and implementation of a method whereby the location of the sensors on an electronic textile garment can be automatically identified while the user performs an appropriate activity. The software design incorporates principal component analysis using singular value decomposition to identify the location of the sensors. This thesis presents a method to overcome the problem of bilateral symmetry through sensor connector design and sensor orientation detection. The scalability of the solution is maintained through the use of culling techniques. This thesis presents a flexible solution that allows for fine-tuning of the accuracy of the results versus the number of valid queries, depending on the constraints of the application. The resulting algorithm is successfully tested on both motion capture data and sensor data from an electronic textile garment.
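The principal-component step above can be made concrete with a few lines of linear algebra. This sketch assumes a simple pipeline (windowed accelerometer data, top singular vectors as a signature, template matching against known locations); the thesis's actual features, culling, and symmetry handling are richer.

```python
import numpy as np

def pca_signature(samples, k=2):
    """Top-k principal axes of one sensor's motion window.

    samples: (n, 3) accelerometer readings from a single sensor.
    The strongest right singular vectors of the centered data give a
    compact signature of how that body segment moved.
    """
    centered = samples - samples.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def locate(samples, templates, k=2):
    """Match a sensor's signature against per-location templates.

    templates: {location_name: (k, 3) signature from training data}.
    Uses |cosine| per axis so a sign-flipped axis still matches.
    """
    sig = pca_signature(samples, k)
    score = lambda tmpl: np.abs((sig * tmpl).sum(axis=1)).sum()
    return max(templates, key=lambda loc: score(templates[loc]))
```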
- Autonomous Computing Systems. Steiner, Neil Joseph (Virginia Tech, 2008-03-27). This work discusses autonomous computing systems, as implemented in hardware, and the properties required for such systems to function. Particular attention is placed on shifting the associated complexity into the systems themselves, and making them responsible for their own resources and operation. The resulting systems present simpler interfaces to their environments, and are able to respond to changes within themselves or their environments with little or no outside intervention. This work proposes a roadmap for the development of autonomous computing systems, and shows that their individual components can be implemented with present-day technology. This work further implements a proof-of-concept demonstration system that advances the state of the art. The system detects activity on connected inputs, and responds to the conditions without external assistance. It works from mapped netlists, which it dynamically parses, places, routes, configures, connects, and implements within itself, at the finest granularity available, while continuing to run. The system also models itself and its resource usage, and keeps that model synchronized with the changes that it undergoes, a critical requirement for autonomous systems. Furthermore, because the system assumes responsibility for its resources, it is able to dynamically avoid resources that have been masked out, in a manner suitable for defect tolerance.
- Biologically Inspired Modular Neural Networks. Azam, Farooq (Virginia Tech, 2000-05-19). This dissertation explores modular learning in artificial neural networks, driven mainly by inspiration from the neurobiological basis of human learning. The presented modularization approaches to neural network design and learning draw on engineering, complexity, psychological, and neurobiological considerations. The main theme of this dissertation is to explore the organization and functioning of the brain to discover new structural and learning inspirations that can subsequently be utilized to design artificial neural networks. Artificial neural networks are touted as a neurobiologically inspired paradigm that emulates the functioning of the vertebrate brain. The brain is a highly structured entity with localized regions of neurons specialized in performing specific tasks. By contrast, mainstream monolithic feed-forward neural networks are generally unstructured black boxes, which is their major performance-limiting characteristic. The non-explicit structure and monolithic nature of current mainstream artificial neural networks prevent the systematic incorporation of functional or task-specific a priori knowledge into the design process. The problems caused by these limitations are discussed in detail in this dissertation, and remedial solutions are presented that are driven by the functioning of the brain and its structural organization. This dissertation also presents an in-depth study of currently available modular neural network architectures, highlights their shortcomings, and investigates new modular artificial neural network models that overcome those shortcomings. The resulting modular neural network models offer greater accuracy, better generalization, a comprehensible simplified neural structure, ease of training, and more user confidence. These benefits are readily apparent for certain problems, depending on the availability and use of a priori knowledge about the problems. The modular neural network models presented in this dissertation exploit the principle of divide and conquer in the design and learning of modular artificial neural networks. The strategy of divide and conquer solves a complex computational problem by dividing it into simpler sub-problems and then combining the individual solutions to the sub-problems into a solution to the original problem. The divisions of a task considered in this dissertation are the automatic decomposition of the mappings to be learned, decomposition of the artificial neural networks to minimize harmful interaction during the learning process, and explicit decomposition of the application task into sub-tasks that are learned separately. The versatility and capabilities of the proposed modular neural networks are demonstrated by experimental results. A comparison of current modular neural network design techniques with the ones introduced in this dissertation is also presented for reference. The results presented in this dissertation lay a solid foundation for the design and learning of artificial neural networks with a sound neurobiological basis, leading to superior design techniques. Areas of future research are also presented.
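One classical embodiment of the divide-and-conquer theme is a gated collection of small expert modules, where a gate softly assigns each input to the sub-networks best suited to it. The sketch below shows that mixture-of-experts style forward pass as a generic illustration; it is not claimed to be any of the specific architectures proposed in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_expert(n_in, n_hidden, n_out):
    """One small module: a single-hidden-layer tanh network."""
    return {"W1": rng.normal(0, 0.5, (n_in, n_hidden)),
            "W2": rng.normal(0, 0.5, (n_hidden, n_out))}

def expert_forward(net, x):
    return np.tanh(x @ net["W1"]) @ net["W2"]

def modular_forward(experts, gate_weights, x):
    """Blend expert outputs with a softmax gate.

    Each expert can specialize on a sub-problem, so harmful
    interaction between unrelated mappings is confined to modules
    instead of spreading through one monolithic network.
    """
    logits = x @ gate_weights            # (n_experts,)
    g = np.exp(logits - logits.max())
    g /= g.sum()
    return sum(w * expert_forward(e, x) for w, e in zip(g, experts))

# Example: 3 experts on 4-dimensional inputs, 2 outputs.
experts = [make_expert(4, 8, 2) for _ in range(3)]
gate = rng.normal(0, 0.5, (4, 3))
print(modular_forward(experts, gate, rng.normal(size=4)))
```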
- Cellular Automata for Structural Optimization on Reconfigurable Computers. Hartka, Thomas Ryan (Virginia Tech, 2004-05-12). Structural analysis and design optimization are important to a wide variety of disciplines. The current methods for these tasks require significant time and computing resources. Reconfigurable computers have shown the ability to speed up many applications, but are unable to efficiently handle the precision requirements of traditional analysis and optimization techniques. Cellular automata theory provides a method to model these problems in a format conducive to representation on a reconfigurable computer: the calculations do not need to be executed with high precision and can be performed in parallel. By implementing cellular automata simulations on a reconfigurable computer, structural analysis and design optimization can be performed significantly faster than with conventional methods.
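The cellular-automata framing can be sketched directly: each cell repeatedly applies a purely local, low-precision update based on its neighbors, which is what makes the method parallel and hardware-friendly. The fragment below uses a simple neighbor-averaging rule on a 2-D grid as an assumed stand-in for the thesis's actual structural update rules.

```python
import numpy as np

def ca_relax(field, fixed_mask, iterations=1000):
    """Relax a 2-D field with a local cellular-automaton rule.

    field:      (rows, cols) initial values; fixed cells hold the
                boundary conditions (e.g., prescribed displacements)
    fixed_mask: boolean array marking cells that never update
    Each sweep replaces every free cell with the mean of its four
    neighbors, a local, low-precision rule that maps naturally onto
    an array of small parallel processing elements in an FPGA.
    """
    d = field.astype(np.float32).copy()
    for _ in range(iterations):
        avg = 0.25 * (np.roll(d, 1, 0) + np.roll(d, -1, 0)
                      + np.roll(d, 1, 1) + np.roll(d, -1, 1))
        d = np.where(fixed_mask, field, avg)
    return d
```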
- Characterization of FPGA-based High Performance Computers. Pimenta Pereira, Karl Savio (Virginia Tech, 2011-08-09). As CPU clock frequencies plateau and the doubling of CPU cores per processor exacerbates the memory wall, hybrid-core computing, utilizing CPUs augmented with FPGAs and/or GPUs, holds the promise of addressing high-performance computing demands, particularly with respect to performance, power, and productivity. Traditional approaches to benchmarking high-performance computers, such as SPEC, take an architecture-based approach and do not completely express the parallelism that exists in FPGA and GPU accelerators. This thesis follows an application-centric approach, comparing the sustained performance of two key computational idioms with respect to performance, power, and productivity. Specifically, a complex, single-precision, floating-point, 1D Fast Fourier Transform (FFT) and a molecular dynamics modeling application are implemented on state-of-the-art FPGA and GPU accelerators. As the results show, FPGA floating-point FFT performance is highly sensitive to a mix of dedicated FPGA resources: DSP48E slices, block RAMs, and FPGA I/O banks in particular. Estimated results show that for the floating-point FFT benchmark on FPGAs, these resources are the performance-limiting factor. Fixed-point FFTs are important in many high-performance embedded applications. For a fixed-point FFT, FPGAs exploit a flexible data path width to trade off circuit cost and speed of computation, improving performance and resource utilization; GPUs, with their fixed data-width architecture, cannot fully take advantage of this. For the molecular dynamics application, FPGAs benefit from the flexibility of creating a custom, tightly pipelined datapath and a highly optimized memory subsystem, which can provide a 250-fold improvement over an optimized CPU implementation and a 2-fold improvement over an optimized GPU implementation, along with massive power savings. Finally, extracting the maximum performance out of the FPGA requires a balance between the formulation of the algorithm on the platform, the optimum use of available external memory bandwidth, and the availability of computational resources, at the expense of greater programming effort.
- Chebyshev Approximation of Discrete Polynomials and Splines. Park, Jae H. (Virginia Tech, 1999-11-19). The recent development of the impulse/summation approach for efficient B-spline computation in the discrete domain should increase the use of B-splines in many applications. Because we show here how the impulse/summation approach can also be used for constructing polynomials, the approach, combined with a search-table method for the inverse square root operation, allows an efficient shading algorithm for rendering an image in a computer graphics system. The approach reduces the number of multiplies and makes it possible for the entire rendering process to be implemented on an integer processor. In many applications, Chebyshev approximation with polynomials and splines is useful for representing a stream of data or a function. Because the impulse/summation approach is developed for discrete systems, some aspects of traditional continuous approximation are not applicable. For example, the lack of a continuity concept in the discrete domain affects the definition of the local extrema of a function, so the method of finding the extrema must change: both forward differences and backward differences must be checked, instead of using the first derivative as in continuous-domain approximation. Polynomial Chebyshev approximation in the discrete domain, just as in the continuous domain, forms a Chebyshev system; therefore, the Chebyshev approximation process always produces a unique best approximation. Because of the non-linearity of free-knot polynomial spline systems, there may be more than one best solution, and the convexity of the solution space cannot be guaranteed, so a Remez exchange algorithm may not produce an optimal approximation. However, we show that discrete polynomial splines approximate a function using a smaller number of parameters (for a similar minimax error) than discrete polynomials do. The discrete polynomial spline also requires much less computation and hardware than the discrete polynomial for curve generation when the impulse/summation approach is used. This is demonstrated using two approximated FIR filter implementations.
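The point about extrema can be made concrete. With no derivative available in the discrete domain, a sample is an extremum of the error sequence when its backward and forward differences disagree in sign; the sketch below is an assumed minimal formulation of that test, as it would feed an exchange step.

```python
import numpy as np

def discrete_extrema(e):
    """Interior indices where the sequence e has a local extremum.

    A point qualifies when the backward difference and the forward
    difference differ in sign (a strict peak or valley) or one of
    them is zero (the edge of a plateau). Endpoints, which also
    matter in Chebyshev approximation, are handled separately.
    """
    e = np.asarray(e, dtype=float)
    found = []
    for n in range(1, len(e) - 1):
        back = e[n] - e[n - 1]
        fwd = e[n + 1] - e[n]
        if back * fwd <= 0 and not (back == 0 and fwd == 0):
            found.append(n)
    return found

# Example: alternating error ripples typical of a minimax fit.
err = np.array([0.9, -1.0, 0.7, 1.0, -0.8, 0.2, 1.0, -0.9])
print(discrete_extrema(err))  # -> [1, 3, 4, 6]
```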
- Cognitive RF Front-end Control. Imana, Eyosias Yoseph (Virginia Tech, 2014-12-09). This research addresses the performance degradation in receivers due to poor selectivity. Poor selectivity is expected to be a primary limitation on the performance of Dynamic-Spectrum-Access (DSA) and millimeter-wave (mmWave) technologies. Both DSA and mmWave are highly desired technologies because they can address the spectrum-deficit problem currently challenging the wireless industry. Accordingly, addressing poor receiver selectivity is necessary to expedite the adoption of these technologies into the main street of wireless. This research develops two receiver design concepts to enhance the performance of poorly-selective receivers. The first concept is called cognitive RF front-end control (CogRF). CogRF operates by cognitively controlling the local-oscillator and sampling frequencies in receivers. This research shows that CogRF can fulfill the objective of pre-selectors by minimizing the effects of weak and moderately-powered neighboring-channel signals on the desired signal, and can therefore serve as an alternative to high-performance pre-selectors, making it a viable architecture for reliable DSA and mmWave receivers. The theoretical design and hardware implementation of a cognitive engine and a spectrum sensor for CogRF are reported in this dissertation. Measurement results show that CogRF significantly reduces the rate of communication outage due to interference from neighboring-channel signals in poorly-selective receivers, and indicate that CogRF can enable a poorly-selective receiver to behave like a highly-selective one. The second receiver design concept addresses very strong neighboring-channel signals, which can easily degrade the performance of a poorly-selective receiver and are likely for a DSA radio operating in military radar bands. Traditionally, strong neighboring signals are addressed using an Automatic Gain Control (AGC) that attempts to accommodate the strong received signal within the dynamic range of the receiver. However, this technique potentially desensitizes the receiver because it sacrifices the Signal-to-Noise Ratio (SNR) of the desired signal. This research proposes the use of an auxiliary receive path to address strong neighboring-channel signals with minimal penalty on the SNR of the desired signal. Through simulation-based analysis and hardware-based measurement, this research shows that the proposed technique can significantly improve the neighboring-channel-interference handling capability of the receiver.
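CogRF's central move, steering the local-oscillator and sampling frequencies so that sensed neighboring-channel signals never land in the digitized band, can be sketched as a small search. The fragment below is an illustrative reconstruction under simplifying assumptions (low-side mixing, ideal sampling, a guard of one channel bandwidth); it is not the dissertation's cognitive engine.

```python
def choose_lo(desired_center, bandwidth, interferers, lo_candidates, fs):
    """Pick an LO so the desired band is clean after mixing + sampling.

    desired_center: RF center frequency of the wanted signal (Hz)
    interferers:    RF centers reported by the spectrum sensor (Hz)
    lo_candidates:  LO frequencies the synthesizer can generate (Hz)
    fs:             ADC sample rate (Hz)
    """
    half = bandwidth / 2
    for lo in lo_candidates:
        if_center = desired_center - lo
        # Desired signal must fit inside the first Nyquist zone.
        if not (half <= if_center and if_center + half <= fs / 2):
            continue
        clear = True
        for f in interferers:
            alias = abs(f - lo) % fs
            alias = min(alias, fs - alias)  # fold into [0, fs/2]
            if abs(alias - if_center) < bandwidth:  # lands in/near band
                clear = False
                break
        if clear:
            return lo
    return None  # no clean tuning exists; fall back to AGC/filtering

# Example: dodge a strong neighbor at 2.414 GHz while receiving a
# 20 MHz channel at 2.45 GHz with a 200 MHz ADC.
print(choose_lo(2.45e9, 20e6, [2.414e9], [2.40e9, 2.38e9, 2.36e9], 200e6))
```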
- Context Switching Strategies in a Run-Time Reconfigurable System. Puttegowda, Kiran (Virginia Tech, 2002-04-16). A distinctive feature of run-time reconfigurable systems is the ability to change the configuration of programmable resources during execution. This opens a number of possibilities, such as virtualisation of computational resources, simplified routing and, in certain applications, lower power. Seamless run-time reconfiguration requires rapid configuration. Commodity programmable devices have relatively long configuration times, which makes them poor candidates for run-time reconfigurable systems. Reducing this reconfiguration time to the order of nanoseconds will enable rapid run-time reconfiguration. Having multiple configuration planes and switching between them while processing data is one approach to achieving rapid reconfiguration. An experimental context-switching programmable device, called the Context Switching Reconfigurable Computer (CSRC), has been created by BAE Systems, which provided opportunities to explore context-switching strategies for run-time reconfigurable systems. The work presented here studies this approach to run-time reconfiguration by applying the concepts to develop applications on a context-switching reconfigurable system. The work also discusses the advantages and disadvantages of such an approach and ways of leveraging the concept for efficient computing.
- A cost quality model for CMOS IC design. Deshpande, Sandeep (Virginia Tech, 1994-09-15). With decreasing minimum feature sizes in very large scale integration (VLSI) complementary metal oxide semiconductor (CMOS) technology, the number of transistors that can be integrated on a single chip is increasing rapidly. Ensuring that these extremely dense chips are almost free of defects and, at the same time, cost-effective requires planning from the initial stage of design. This research proposes a concurrent-engineering-based design methodology for layout optimization. The proposed method is iterative, and layout changes in each design iteration are made based on the principles of physical design for testability (P-DFT). P-DFT modifies a design so that the circuit has fewer faults, difficult-to-detect faults are made easier to detect, and difficult-to-detect faults are made less likely to occur. Implementing this design methodology requires a mathematical model to evaluate alternative designs. This research proposes such an evaluation measure: the cost quality model. The cost quality model extends known test-quality and testability-estimation measures from gate-level circuits to switch-level circuits. To provide high fidelity in testability estimation with reasonable CPU-time overhead, the cost quality model uses inductive fault analysis techniques to extract a realistic circuit fault list, IDDQ test generation techniques to generate tests for these faults, statistical models to reduce the computational overhead of test generation and fault simulation, yield simulation tools, and mathematical models to estimate test quality and costs. To demonstrate the effectiveness of this model, results are presented for CMOS layouts of benchmark circuits and modifications of these layouts.
- A Cost-Efficient Digital ESN Architecture on FPGA. Gan, Victor Ming (Virginia Tech, 2020-09-01). The Echo State Network (ESN) is a recently developed machine-learning paradigm whose processing capabilities rely on the dynamical behavior of recurrent neural networks (RNNs). It outperforms traditional RNNs in nonlinear system identification and temporal information processing. In this thesis, we design and implement ESNs on field-programmable gate arrays (FPGAs), exploiting the full capacity of their digital signal processors (DSPs) to target low-cost and low-power applications. We propose a cost-optimized and scalable ESN architecture on FPGA that exploits Xilinx DSP48E1 units to cut down the need for configurable logic blocks (CLBs). The proposed work includes a linear combination processor with negligible deployment of CLBs, as well as a high-accuracy non-linear function approximator, both requiring only 9 DSP units per neuron. The architecture is verified with the classical NARMA dataset, and with a symbol detection task for an orthogonal frequency division multiplexing (OFDM) system on a wireless communication testbed. In the worst-case scenario, our proposed architecture delivers a matching bit error rate (BER) compared to its corresponding software ESN implementation; the performance difference between the hardware and software approaches is less than 6.5%. The testbed system is built on a software-defined radio (SDR) platform, showing that our work is capable of processing real-world data.
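The two blocks named above map directly onto the standard ESN equations: a fixed random reservoir updated through a tanh non-linearity (the non-linear function approximator) and a trained linear readout (the linear combination processor). The sketch below is a plain software reference model with assumed sizes and ridge-regression training, not the thesis's FPGA architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

class ESN:
    """Reference echo state network: fixed reservoir, trained readout."""

    def __init__(self, n_in, n_res, n_out, spectral_radius=0.9):
        self.w_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
        w = rng.normal(0, 1.0, (n_res, n_res))
        # Scale the recurrent weights so the echo state property holds.
        w *= spectral_radius / max(abs(np.linalg.eigvals(w)))
        self.w = w
        self.w_out = np.zeros((n_out, n_res))
        self.x = np.zeros(n_res)

    def step(self, u):
        # Non-linear state update (the FPGA's tanh approximator) ...
        self.x = np.tanh(self.w_in @ u + self.w @ self.x)
        # ... followed by the linear combination (readout) stage.
        return self.w_out @ self.x

    def train(self, inputs, targets, ridge=1e-6):
        """Fit only the readout by ridge regression; the reservoir
        weights stay fixed, which is what keeps training cheap."""
        states = []
        for u in inputs:
            self.step(u)
            states.append(self.x.copy())
        X, Y = np.stack(states), np.asarray(targets)
        self.w_out = (Y.T @ X) @ np.linalg.inv(
            X.T @ X + ridge * np.eye(X.shape[1]))
```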