Browsing by Author "Batra, Dhruv"
Now showing 1 - 20 of 27
- Advances in Iterative Probabilistic Processing for Communication Receivers. Jakubisin, Daniel Joseph (Virginia Tech, 2016-06-27). As wireless communication systems continue to push the limits of energy and spectral efficiency, increased demands are placed on the capabilities of the receiver. At the same time, the computational resources available for processing received signals will continue to grow. This opens the door for iterative algorithms to play an increasing role in the next generation of communication receivers. In the context of receivers, the goal of iterative probabilistic processing is to approximate maximum a posteriori (MAP) symbol-by-symbol detection of the information bits and estimation of the unknown channel or signal parameters. The sum-product algorithm is capable of efficiently approximating the marginal posterior probabilities desired for MAP detection and provides a unifying framework for the development of iterative receiver algorithms. However, in some applications the sum-product algorithm is computationally infeasible. Specifically, this is the case when both continuous and discrete parameters are present within the model. Also, the complexity of the sum-product algorithm is exponential in the number of variables connected to a particular factor node and can be prohibitive in multi-user and multi-antenna applications. In this dissertation we identify three key problems which can benefit from iterative probabilistic processing, but for which the sum-product algorithm is too complex. They are (1) joint synchronization and detection in multipath channels with emphasis on frame timing, (2) detection in co-channel interference and non-Gaussian noise, and (3) joint channel estimation and multi-signal detection. This dissertation presents the advances we have made in iterative probabilistic processing in order to tackle these problems. The motivation behind the work is to (a) compromise as little as possible on the performance that is achieved while limiting the computational complexity and (b) maintain good theoretical justification for the algorithms that are developed.
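The sum-product machinery referenced in this abstract can be illustrated with a toy example. The sketch below is not from the dissertation; the two-symbol chain and all factor values are made up. It computes exact marginal posteriors for two binary symbols coupled by one pairwise factor, which is the quantity that approximate MAP symbol-by-symbol detection thresholds on.

```python
# Minimal sketch (illustrative only): exact sum-product marginals on a toy
# chain factor graph with two binary symbols.
import numpy as np

phi1 = np.array([0.9, 0.1])          # unary likelihood factor at symbol x1
phi2 = np.array([0.4, 0.6])          # unary likelihood factor at symbol x2
psi = np.array([[0.7, 0.3],
                [0.3, 0.7]])         # pairwise factor psi(x1, x2)

# Message from the pairwise factor to x2 after absorbing phi1 at x1.
m_to_x2 = psi.T @ phi1
# Message from the pairwise factor to x1 after absorbing phi2 at x2.
m_to_x1 = psi @ phi2

# Marginal posteriors; symbol-by-symbol MAP detection picks the argmax.
p_x1 = phi1 * m_to_x1
p_x1 /= p_x1.sum()
p_x2 = phi2 * m_to_x2
p_x2 /= p_x2.sum()
print(p_x1, p_x2)
```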
- Automated Cross-Platform Code Synthesis from Web-Based Programming Resources. Byalik, Antuan (Virginia Tech, 2015-08-04). For maximal market penetration, popular mobile applications are typically supported on all major platforms, including Android and iOS. Despite the vast differences in the look-and-feel of major mobile platforms, applications running on these platforms in essence provide the same core functionality. As an application is maintained and evolved, programmers need to replicate the resulting changes on all the supported platforms, a tedious and error-prone programming process. Commercial automated source-to-source translation tools prove inadequate due to the structural and idiomatic differences in how functionalities are expressed across major platforms. In this thesis, we present a new approach---Native-2-Native---that automatically synthesizes code for a mobile application to make use of native resources on one platform, based on the equivalent program transformations performed on another platform. First, the programmer modifies a mobile application's Android version to make use of some native resource, with a plugin capturing code changes. Based on the changes, the system then parameterizes a web search query over popular programming resources (e.g., Google Code, StackOverflow, etc.), to discover equivalent iOS code blocks with the closest similarity to the programmer-written Android code. The discovered iOS code block is then presented to the programmer as an automatically synthesized Swift source file to further fine-tune and subsequently integrate into the mobile application's iOS version. Our evaluation, enhancing mobile applications to make use of common native resources, shows that the presented approach can correctly synthesize more than 86% of Swift code for the subject applications' iOS versions.
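As a rough illustration of the snippet-ranking step, the sketch below ranks candidate iOS snippets by token-level Jaccard similarity to a programmer-written Android change; the tokenizer, candidate snippets, and scoring rule are illustrative assumptions, not Native-2-Native's actual matching pipeline.

```python
# Illustrative sketch only: rank candidate code snippets by token-level
# Jaccard similarity to a reference snippet. The snippets are placeholders.
import re

def tokens(code: str) -> set:
    """Split code into a set of identifier/keyword tokens."""
    return set(re.findall(r"[A-Za-z_][A-Za-z_0-9]*", code))

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

android_change = "LocationManager manager = getSystemService(LOCATION_SERVICE);"
candidates = [
    "let manager = CLLocationManager()",           # hypothetical search result
    "let player = AVAudioPlayer(contentsOf: url)"  # unrelated snippet
]

ranked = sorted(candidates,
                key=lambda c: jaccard(tokens(android_change), tokens(c)),
                reverse=True)
print(ranked[0])  # best-matching candidate presented to the programmer
```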
- Biodiversity and dynamics of direction finding accuracy in bat biosonar. Uzair Gilani, Syed (Virginia Tech, 2016-04-04). In the biosonar systems of bats, emitted acoustic energy and receiver sensitivity are distributed over direction and frequency through beampattern functions that have diverse and often complicated geometries. This complexity could be used by the animals to determine the direction of incoming sounds based on spectral signatures. The present study in its first part has investigated how well bat biosonar beampatterns are suited for direction finding using a measure of the smallest estimator variance that is possible for a given direction (Cramér-Rao lower bound, CRLB). CRLB values were estimated for numerical beampattern estimates derived from 330 individual shape samples, 157 noseleaves (used for emission) and 173 outer ears (pinnae). At an assumed 60 dB signal-to-noise ratio, the average value of the CRLB was 3.9°, which is similar to previous behavioral findings. Distributions of the CRLBs in individual beampatterns were found to have a positive skew, indicating the existence of regions where a given beampattern does not support a high accuracy. The highest supported accuracies were for direction finding in elevation (with the exception of phyllostomid emission patterns). Beampatterns in the dataset were also characterized based upon the differences in the type of acoustic signal they are associated with, the functionality of the baffle shape producing them, and their phylogeny. In the second part of the study, the functionality of various local shape features was investigated under static and dynamic conditions. Each local shape feature was found to have an impact on the estimation performance of the baffle shape. Interactions of the local shape features among themselves, as well as their dynamic motion, produced a plethora of results not achievable through single features or through their static states alone.
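A hedged sketch of how a Cramér-Rao lower bound on direction estimates can be computed from a sampled beampattern is shown below. The synthetic Gaussian-shaped beampattern, the additive white Gaussian noise assumption, and the grid sizes are illustrative stand-ins for the numerically estimated noseleaf and pinna beampatterns used in the study.

```python
# Hedged sketch: CRLB for direction finding from a sampled beampattern,
# assuming additive white Gaussian noise so the Fisher information is
# built from directional derivatives of the gain. Beampattern is synthetic.
import numpy as np

n_az, n_el, n_freq = 73, 37, 20
az = np.linspace(-np.pi, np.pi, n_az)
el = np.linspace(-np.pi / 2, np.pi / 2, n_el)
A, E = np.meshgrid(az, el, indexing="ij")
# Synthetic gain over (azimuth, elevation, frequency).
beam = np.stack([np.exp(-((A - 0.1 * f) ** 2 + E ** 2) / 0.3)
                 for f in range(n_freq)], axis=-1)

snr_db = 60.0
sigma2 = 10 ** (-snr_db / 10)          # noise variance for unit signal power

# Partial derivatives of the gain with respect to azimuth and elevation.
d_az = np.gradient(beam, az, axis=0)
d_el = np.gradient(beam, el, axis=1)

# 2x2 Fisher information matrix per direction, summed over frequency.
J = np.empty((n_az, n_el, 2, 2))
J[..., 0, 0] = (d_az ** 2).sum(axis=-1) / sigma2
J[..., 0, 1] = J[..., 1, 0] = (d_az * d_el).sum(axis=-1) / sigma2
J[..., 1, 1] = (d_el ** 2).sum(axis=-1) / sigma2

# CRLB on direction error: trace of the inverse Fisher matrix per direction.
crlb = np.trace(np.linalg.pinv(J), axis1=-2, axis2=-1)
print(np.degrees(np.sqrt(crlb)).min())   # best supported accuracy in degrees
```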
- CloudCV: Deep Learning and Computer Vision on the Cloud. Agrawal, Harsh (Virginia Tech, 2016-06-20). We are witnessing a proliferation of massive visual data. Visual content is arguably the fastest growing data on the web. Photo-sharing websites like Flickr and Facebook now host more than 6 and 90 billion photos, respectively. Unfortunately, scaling existing computer vision algorithms to large datasets leaves researchers repeatedly solving the same algorithmic and infrastructural problems. Designing and implementing efficient and provably correct computer vision algorithms is extremely challenging. Researchers must repeatedly solve the same low-level problems: building and maintaining a cluster of machines, formulating each component of the computer vision pipeline, designing new deep learning layers, writing custom hardware wrappers, etc. This thesis introduces CloudCV, an ambitious system that contains algorithms for end-to-end processing of visual content. The goal of the project is to democratize computer vision; one should not have to be a computer vision, big data and deep learning expert to have access to state-of-the-art distributed computer vision algorithms. We provide researchers, students and developers access to state-of-the-art distributed computer vision and deep learning algorithms as a cloud service through a web interface and APIs.
- Collaborative Unmanned Air and Ground Vehicle Perception for Scene Understanding, Planning and GPS-denied Localization. Christie, Gordon A. (Virginia Tech, 2017-01-05). Autonomous robot missions in unknown environments are challenging. In many cases, the systems involved are unable to use a priori information about the scene (e.g. road maps). This is especially true in disaster response scenarios, where existing maps are now out of date. Areas without GPS are another concern, especially when the involved systems are tasked with navigating a path planned by a remote base station. Scene understanding via robots' perception data (e.g. images) can greatly assist in overcoming these challenges. This dissertation makes three contributions that help overcome these challenges, with a focus on the application of autonomously searching for radiation sources with unmanned aerial vehicles (UAV) and unmanned ground vehicles (UGV) in unknown and unstructured environments. The three main contributions of this dissertation are: (1) An approach to overcome the challenges associated with simultaneously trying to understand 2D and 3D information about the environment. (2) Algorithms and experiments involving scene understanding for real-world autonomous search tasks. The experiments involve a UAV and a UGV searching for potentially hazardous sources of radiation in an unknown environment. (3) An approach to the registration of a UGV in areas without GPS using 2D image data and 3D data, where localization is performed in an overhead map generated from imagery captured in the air.
- Comparison and Development of Algorithms for Motor Imagery Classification in EEG-based Brain-Computer Interfaces. Ailsworth, James William Jr. (Virginia Tech, 2016-06-20). Brain-computer interfaces are an emerging technology that could provide channels for communication and control to severely disabled people suffering from locked-in syndrome. It has been found that motor imagery can be detected and classified from EEG signals. The motivation of the present work was to compare several algorithms for motor imagery classification in EEG signals as well as to test several novel algorithms. The algorithms tested included the popular method of common spatial patterns (CSP) spatial filtering followed by linear discriminant analysis (LDA) classification of log-variance features (CSP+LDA). A second set of algorithms used classification based on concepts from Riemannian geometry. The basic idea of these methods is that sample spatial covariance matrices (SCMs) of EEG epochs belong to the Riemannian manifold of symmetric positive-definite (SPD) matrices and that the tangent space at any SPD matrix on the manifold is a finite-dimensional Euclidean space. Riemannian classification methods tested included minimum distance to Riemannian mean (MDRM), tangent space LDA (TSLDA), and Fisher geodesic filtering followed by MDRM classification (FGDA). The novel algorithms aimed to combine the CSP method with the Riemannian geometry methods. CSP spatial filtering was performed prior to sample SCM calculation and subsequent classification using Riemannian methods. The novel algorithms were found to improve classification accuracy as well as reduce the computational costs of Riemannian classification methods for binary, synchronous classification on BCI competition IV dataset 2a.
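For readers unfamiliar with the CSP+LDA baseline described in this abstract, the sketch below (toy random data, not the thesis code) derives CSP filters from a generalized eigendecomposition of the two class-average covariance matrices, forms log-variance features, and trains an LDA classifier.

```python
# Minimal CSP+LDA sketch on placeholder EEG epochs shaped
# (trials, channels, samples). Values are random; purely illustrative.
import numpy as np
from scipy.linalg import eigh
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.standard_normal((40, 22, 500))      # 40 epochs, 22 channels, 500 samples
y = np.array([0, 1] * 20)                   # two motor-imagery classes

def class_cov(epochs):
    # Trace-normalized average spatial covariance for one class.
    covs = [e @ e.T / np.trace(e @ e.T) for e in epochs]
    return np.mean(covs, axis=0)

C0, C1 = class_cov(X[y == 0]), class_cov(X[y == 1])
# Generalized eigenproblem C0 w = lambda (C0 + C1) w; the extreme eigenvectors
# maximize variance for one class while minimizing it for the other.
vals, vecs = eigh(C0, C0 + C1)
filters = np.concatenate([vecs[:, :3], vecs[:, -3:]], axis=1)   # 6 CSP filters

def features(epochs):
    Z = np.einsum("cf,ncs->nfs", filters, epochs)               # filtered epochs
    var = Z.var(axis=2)
    return np.log(var / var.sum(axis=1, keepdims=True))         # log-variance

clf = LinearDiscriminantAnalysis().fit(features(X), y)
print(clf.score(features(X), y))            # training accuracy on toy data
```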
- Data Augmentation with Seq2Seq Models. Granstedt, Jason Louis (Virginia Tech, 2017-07-06). Paraphrase sparsity is an issue that complicates the training process of question answering systems: syntactically diverse but semantically equivalent sentences can have significant disparities in predicted output probabilities. We propose a method for generating an augmented paraphrase corpus for the visual question answering system to make it more robust to paraphrases. This corpus is generated by concatenating two sequence to sequence models. In order to generate diverse paraphrases, we sample the neural network using diverse beam search. We evaluate the results on the standard VQA validation set. Our approach results in a significantly expanded training dataset and vocabulary size, but has slightly worse performance when tested on the validation split. Although not as fruitful as we had hoped, our work highlights additional avenues for investigation into selecting more optimal model parameters and the development of a more sophisticated paraphrase filtering algorithm. The primary contribution of this work is the demonstration that decent paraphrases can be generated from sequence to sequence models and the development of a pipeline for developing an augmented dataset.
- Digital State Models for Infrastructure Condition Assessment and Structural Testing. Lama Salomon, Abraham (Virginia Tech, 2017-02-10). This research introduces and applies the concept of digital state models for civil infrastructure condition assessment and structural testing. Digital state models are defined herein as any transient or permanent 3D model of an object (e.g., textured meshes and point clouds) combined with any electromagnetic radiation (e.g., visible light, infrared, X-ray) or other two-dimensional image-like representation. In this study, digital state models are built using visible light and used to document the transient state of a wide variety of structures (ranging from concrete elements to cold-formed steel columns and hot-rolled steel shear-walls) and civil infrastructures (bridges). The accuracy of digital state models was validated in comparison to traditional sensors (e.g., digital caliper, crack microscope, wire potentiometer). Overall, features measured from the 3D point cloud data had a maximum error of ±0.10 in. (±2.5 mm), and surface features (i.e., crack widths) measured from the texture information in textured polygon meshes had a maximum error of ±0.010 in. (±0.25 mm). Results showed that digital state models perform similarly across all specimen surface types and between laboratory and field experiments. Also, it is shown that digital state models have great potential for structural assessment by significantly improving data collection, automation, change detection, visualization, and augmented reality, with significant opportunities for commercial development. Algorithms to analyze and extract information from digital state models, such as cracks, displacements, and buckling deformations, are developed and tested. Finally, the extensive data sets collected in this effort are shared for research development in computer vision-based infrastructure condition assessment, eliminating a major obstacle to advancing this field: the absence of publicly available data sets.
- Global Energy Conservation in Large Data Networks. Durbeck, Lisa J. (Virginia Tech, 2016-01-07). Seven to ten percent of the energy used globally goes towards powering information and communications technology (ICT): the global data- and telecommunications network, the private and commercial datacenters it supports, and the 19 billion electronic devices around the globe it interconnects, through which we communicate, and access and produce information. As bandwidth and data rates increase, so does the volume of traffic, as well as the absolute amount of new information digitized and uploaded onto the Net and into the cloud each second. Words like gigabit and terabyte were rarely needed in the public arena fifteen years ago; now, they are common phrases. As people use their networked devices to do more, to access more, to send more, and to connect more, they use more energy, not only in their own devices but also throughout the ICT. While there are many endeavors focused on individual low-power devices, few are examining broad strategies that cross the many boundaries of separate concerns within the ICT, and few assess the impact of specific strategies on the global energy supply. This work examines the energy savings of several such strategies; it also assesses their efficacy in reducing energy consumption, both within specific networks and within the larger ICT. All of these strategies save energy by reducing the work done by the system as a whole on behalf of a single user, often by exploiting commonalities among what many users around the globe are also doing to amortize the costs.
- Greedy Inference Algorithms for Structured and Neural Models. Sun, Qing (Virginia Tech, 2018-01-18). A number of problems in Computer Vision, Natural Language Processing, and Machine Learning produce structured outputs in high-dimensional space, which makes searching for the global optimal solution extremely expensive. Thus, greedy algorithms, making trade-offs between precision and efficiency, are widely used. Unfortunately, they in general lack theoretical guarantees. In this thesis, we prove that greedy algorithms are effective and efficient for searching for multiple top-scoring hypotheses from structured (neural) models: 1) Entropy estimation. We aim to find deterministic samples that are representative of a Gibbs distribution via a greedy strategy. 2) Searching for a set of diverse and high-quality bounding boxes. We formulate this problem as the constrained maximization of a monotonic submodular function such that there exists a greedy algorithm with a near-optimality guarantee. 3) Fill-in-the-blank. The goal is to generate missing words conditioned on context given an image. We extend Beam Search, a greedy algorithm applicable to unidirectional expansion, to bidirectional neural models when both past and future information have to be considered. We test our proposed approaches on a series of Computer Vision and Natural Language Processing benchmarks and show that they are effective and efficient.
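The near-optimality guarantee mentioned for the bounding-box problem comes from the classic greedy rule for maximizing a monotone submodular function under a cardinality constraint (the familiar 1 - 1/e bound). The sketch below illustrates that rule on a toy coverage objective; the candidate "boxes" and the regions they cover are made up.

```python
# Illustrative greedy maximization of a monotone submodular objective
# under a cardinality constraint. The coverage objective is a toy example.
def greedy_max(candidates, score, k):
    """Pick k candidates, each time adding the one with largest marginal gain."""
    chosen = []
    for _ in range(k):
        best = max((c for c in candidates if c not in chosen),
                   key=lambda c: score(chosen + [c]) - score(chosen))
        chosen.append(best)
    return chosen

# Toy submodular objective: how many image regions a set of boxes covers.
regions_covered = {
    "box_a": {1, 2, 3},
    "box_b": {3, 4},
    "box_c": {5},
    "box_d": {1, 2},
}

def coverage(boxes):
    return len(set().union(*(regions_covered[b] for b in boxes))) if boxes else 0

print(greedy_max(list(regions_covered), coverage, k=2))   # ['box_a', 'box_b']
```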
- Intelligent Approaches for Communication Denial. Amuru, SaiDhiraj (Virginia Tech, 2015-10-05). Spectrum supremacy is a vital part of security in the modern era. In the past 50 years, a great deal of work has been devoted to designing defenses against attacks from malicious nodes (e.g., anti-jamming), while significantly less work has been devoted to the equally important task of designing effective strategies for denying communication between enemy nodes/radios within an area (e.g., jamming). Such denial techniques are especially useful in military applications and intrusion detection systems where untrusted communication must be stopped. In this dissertation, we study these offensive attack procedures, collectively termed communication denial. The communication denial strategies studied in this dissertation are not only useful in undermining the communication between enemy nodes, but also help in analyzing the vulnerabilities of existing systems. A majority of the works which address communication denial assume that knowledge about the enemy nodes is available a priori. However, recent advances in communication systems create the potential for dynamic environmental conditions where it is difficult and most likely not even possible to obtain a priori information regarding the environment and the nodes that are present in it. Therefore, it is necessary to have cognitive capabilities that enable the attacker to learn the environment and prevent enemy nodes from accessing valuable spectrum, thereby denying communication. In this regard, we ask the following question in this dissertation: "Can an intelligent attacker learn and adapt to unknown environments in an electronic warfare-type scenario?" Fundamentally speaking, we explore whether existing machine learning techniques can be used to address such cognitive scenarios and, if not, what missing pieces would enable an attacker to achieve spectrum supremacy by denying an enemy the ability to communicate. The first task in achieving spectrum supremacy is to identify the signal of interest before it can be attacked. Thus, we first address signal identification, specifically modulation classification, in practical wireless environments where the interference is often non-Gaussian. Upon identifying the signal of interest, the next step is to effectively attack the victim signals in order to deny communication. We present a rigorous fundamental analysis regarding the attacker's performance, in terms of achieving communication denial, in practical communication settings. Furthermore, we develop intelligent approaches for communication denial that employ novel machine learning techniques to attack the victim either at the physical layer, the MAC layer, or the network layer. We rigorously investigate whether or not these learning techniques enable the attacker to approach the fundamental performance limits achievable when an attacker has complete knowledge of the environment. As a result of our work, we debunk several myths about communication denial strategies that were believed to be true mainly because incorrect system models were previously considered and thus the wrong questions were answered.
- Interactively Guiding Semi-Supervised Clustering via Attribute-based Explanations. Lad, Shrenik (Virginia Tech, 2015-07-01). Unsupervised image clustering is a challenging and often ill-posed problem. Existing image descriptors fail to capture the clustering criterion well, and more importantly, the criterion itself may depend on (unknown) user preferences. Semi-supervised approaches such as distance metric learning and constrained clustering thus leverage user-provided annotations indicating which pairs of images belong to the same cluster (must-link) and which ones do not (cannot-link). These approaches require many such constraints before achieving good clustering performance because each constraint only provides weak cues about the desired clustering. In this work, we propose to use image attributes as a modality for the user to provide more informative cues. In particular, the clustering algorithm iteratively and actively queries a user with an image pair. Instead of the user simply providing a must-link/cannot-link constraint for the pair, the user also provides an attribute-based reasoning, e.g. "these two images are similar because both are natural and have still water" or "these two people are dissimilar because one is way older than the other". Under the guidance of this explanation, and equipped with attribute predictors, many additional constraints are automatically generated. We demonstrate the effectiveness of our approach by incorporating the proposed attribute-based explanations in three standard semi-supervised clustering algorithms: Constrained K-Means, MPCK-Means, and Spectral Clustering, on three domains: scenes, shoes, and faces, using both binary and relative attributes.
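One way to picture the constraint-generation step is sketched below: given an attribute-based explanation for one queried pair, attribute predictor scores are used to propagate must-link constraints to other image pairs. The attribute scores, threshold, and propagation rule here are illustrative assumptions, not the exact procedure of the thesis.

```python
# Hedged sketch of constraint propagation from an attribute-based explanation:
# if the user says a queried pair is similar "because both are natural and
# have still water", other pairs that the attribute predictors also score
# highly on those attributes inherit must-link constraints.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
n_images = 8
# Rows: images; columns: predicted attribute scores in [0, 1]
# (e.g. "natural", "still water", "open area"); values are placeholders.
attr_scores = rng.random((n_images, 3))
explained_attrs = [0, 1]        # attributes cited in the user's explanation
threshold = 0.7                 # assumed predictor-confidence cutoff

must_link = []
for i, j in combinations(range(n_images), 2):
    if all(attr_scores[i, a] > threshold and attr_scores[j, a] > threshold
           for a in explained_attrs):
        must_link.append((i, j))

print(must_link)   # extra constraints fed to Constrained K-Means / MPCK-Means
```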
- Leveraging Multimodal Perspectives to Learn Common Sense for Vision and Language Tasks. Lin, Xiao (Virginia Tech, 2017-10-05). Learning and reasoning with common sense is a challenging problem in Artificial Intelligence (AI). Humans have the remarkable ability to interpret images and text from different perspectives in multiple modalities, and to use large amounts of commonsense knowledge while performing visual or textual tasks. Inspired by that ability, we approach commonsense learning as leveraging perspectives from multiple modalities for images and text in the context of vision and language tasks. Given a target task (e.g., textual reasoning, matching images with captions), our system first represents input images and text in multiple modalities (e.g., vision, text, abstract scenes and facts). Those modalities provide different perspectives to interpret the input images and text. Then, based on those perspectives, the system performs reasoning to make a joint prediction for the target task. Surprisingly, we show that interpreting textual assertions and scene descriptions in the modality of abstract scenes improves performance on various textual reasoning tasks, and interpreting images in the modality of Visual Question Answering improves performance on caption retrieval, which is a visual reasoning task. With grounding, imagination and question-answering approaches to interpret images and text in different modalities, we show that learning commonsense knowledge from multiple modalities effectively improves the performance of downstream vision and language tasks, improves interpretability of the model and is able to make more efficient use of training data. Complementary to the model aspect, we also study the data aspect of commonsense learning in vision and language. We study active learning for Visual Question Answering (VQA) where a model iteratively grows its knowledge through querying informative questions about images for answers. Drawing analogies from human learning, we explore cramming (entropy), curiosity-driven (expected model change), and goal-driven (expected error reduction) active learning approaches, and propose a new goal-driven scoring function for deep VQA models under the Bayesian Neural Network framework. Once trained with a large initial training set, a deep VQA model is able to efficiently query informative question-image pairs for answers to improve itself through active learning, saving human effort on commonsense annotations.
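A minimal sketch of the entropy-based ("cramming") query selection described above is shown below; the model outputs are random placeholders rather than a trained VQA model.

```python
# Toy sketch of entropy-based active learning: query the question-image
# pairs whose predicted answer distribution has the highest entropy.
import numpy as np

rng = np.random.default_rng(0)
n_pairs, n_answers = 5, 10
logits = rng.standard_normal((n_pairs, n_answers))       # placeholder model outputs
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)   # predictive entropy
query_order = np.argsort(-entropy)                       # most uncertain first
print(query_order[:3])                                   # pairs to send for human answers
```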
- Load Modeling using Synchrophasor Data for Improved Contingency Analysis. Retty, Hema (Virginia Tech, 2016-01-18). For decades, researchers have sought to make the North American power system as reliable as possible, with many security measures in place, including redundancy. Yet the increasing number of blackouts and failures has highlighted the areas that require improvement. Meeting the increasing demand for energy and the growing complexity of the loads are two of the main challenges faced by the power grid. In order to prepare for contingencies and maintain a secure state, power engineers must perform simulations using steady state and dynamic models of the system. The results from the contingency studies are only as accurate as the models of the grid components. The load components are generally the most difficult to model since they are controlled by the consumer. This study focuses on developing static and dynamic load models using advanced mathematical approximation algorithms and wide area measurement devices, which will improve the accuracy of the system analysis and hopefully decrease the frequency of blackouts. The increasing integration of phasor measurement units (PMUs) into the power system allows us to take advantage of synchronized measurements at a high data rate. These devices are capable of changing the way we manage online security within the Energy Management System (EMS) and can enhance our offline tools. This type of data helps us redevelop the measurement-based approach to load modeling. The static ZIP load model composition is estimated using a variation of the method of least squares, called bounded-variable least squares. The bound on the ZIP load parameters allows the measurement matrix to be slightly correlated. The ZIP model can be determined within a small range of error that will not affect the contingency studies. Machine learning is used to design the dynamic load model. Neural network training is applied to fault data obtained near the load bus, and the derived network model can estimate the load parameters. The neural network is trained using simulated data and then applied to real PMU measurements. A PMU algorithm was developed to transform the simulated measurements into a realistic representation of phasor data. These new algorithms will allow us to estimate the load models that are used in contingency studies.
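The bounded-variable least-squares fit of the ZIP composition can be sketched with SciPy as below, assuming the usual normalized ZIP form P/P0 = aZ(V/V0)^2 + aI(V/V0) + aP with each coefficient bounded to [0, 1]; the voltage and power samples are synthetic stand-ins for PMU measurements.

```python
# Hedged sketch of a static ZIP load model fit via bounded-variable least
# squares. Synthetic per-unit measurements; not the thesis implementation.
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(0)
v = 1.0 + 0.05 * rng.standard_normal(200)          # per-unit voltage samples
true = np.array([0.4, 0.3, 0.3])                   # assumed [aZ, aI, aP]
p = true[0] * v**2 + true[1] * v + true[2] + 0.01 * rng.standard_normal(200)

A = np.column_stack([v**2, v, np.ones_like(v)])    # measurement matrix
res = lsq_linear(A, p, bounds=(0.0, 1.0))          # bounded-variable LS fit
aZ, aI, aP = res.x
print(aZ, aI, aP, aZ + aI + aP)                    # recovered composition
```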
- Low-shot Visual Recognition. Pemula, Latha (Virginia Tech, 2016-10-24). Many real world datasets are characterized by having a long tailed distribution, with several samples for some classes and only a few samples for other classes. While many deep learning-based solutions exist for object recognition when hundreds of samples are available, there are not many solutions for the case when there are only a few samples available per class. Recognition in the regime where the number of training samples available for each class is low, ranging from one to a couple of tens of examples, is called low-shot recognition. In this work, we attempt to solve this problem. Our framework is similar to [1]. We use a related dataset with a sufficient number (a couple of hundred) of samples per class to learn representations using a Convolutional Neural Network (CNN). This CNN is used to extract features of the low-shot samples and learn a classifier. During representation learning, we enforce the learnt representations to obey a certain property by using a custom loss function. We believe that when the low-shot samples obey this property, the classification step becomes easier. We show that the proposed solution performs better than the softmax classifier by a good margin.
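A minimal sketch of the low-shot pipeline (features from a CNN pretrained on a larger related dataset, plus a simple classifier on the few labeled samples) is shown below; the custom representation-regularizing loss from the thesis is not reproduced, and the images are random placeholders.

```python
# Hedged sketch of a low-shot recognition pipeline: pretrained CNN features
# followed by a simple classifier fit on very few samples per class.
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

# ImageNet-pretrained backbone stands in for the "related dataset" CNN.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()          # use penultimate-layer features
backbone.eval()

# Placeholder low-shot data: 5 classes x 2 images each (random tensors).
images = torch.randn(10, 3, 224, 224)
labels = [i // 2 for i in range(10)]

with torch.no_grad():
    feats = backbone(images).numpy()       # (10, 512) feature vectors

clf = LogisticRegression(max_iter=1000).fit(feats, labels)
print(clf.score(feats, labels))            # training accuracy on toy data
```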
- Making diffusion work for you: Classification sans text, finding culprits and filling missing values. Sundareisan, Shashidhar (Virginia Tech, 2014-07-24). Can we find people infected with the flu virus even though they did not visit a doctor? Can the temporal features of a trending hashtag or a keyword indicate which topic it belongs to without any textual information? Given a history of interactions between blogs and news websites, can we predict blog posts/news websites that are not in the sample but talk about "the state of the economy" in 2008? These questions have two things in common: a network (social networks or human contact networks) and a virus (meme, keyword or the flu virus) diffusing over the network. We can think of interactions like memes, hashtags, influenza infections, computer viruses, etc., as viruses spreading in a network. This treatment allows for the usage of epidemiologically inspired models to study or model these interactions. Understanding the complex propagation dynamics involved in information diffusion with the help of these models uncovers various non-trivial and interesting results. In this thesis we propose (a) NetFill, a fast and efficient algorithm that finds quantitatively and qualitatively correct infected nodes not in the sample and identifies the culprits, and (b) SansText, a method that determines which topic a keyword/hashtag belongs to just by looking at the popularity graph of the keyword, without textual analysis. The results derived in this thesis can be used in various areas like epidemiology, news and protest detection, and viral marketing, and can also be used to reduce sampling errors in graphs.
- Multisensor Multitemporal Fusion for Remote Sensing using Landsat and MODIS Data. Ghannam, Sherin Ghannam (Virginia Tech, 2017-12-07). The growing Landsat data archive represents more than four decades of continuous Earth observation. Landsat's role in scientific analysis has increased dramatically in recent years as a result of the open-access policy of the U.S. Geological Survey (USGS). However, this rich data record suffers from relatively low temporal resolution due to the 16-day revisit period of each Landsat satellite. To estimate Landsat images at other points in time, researchers have proposed data-fusion approaches that combine existing Landsat data with images from other sensors, such as MODIS (Moderate Resolution Imaging Spectroradiometer) from the Terra and Aqua satellites. MODIS provides daily revisits, however, with a spatial resolution that is significantly lower than that of Landsat. Fusion of Landsat and MODIS is challenging because of differences in their spatial resolution, band designations, swath width, viewing angle, and noise level. Fusion is even more challenging for heterogeneous landscapes. In the first part of our work, the multiresolution analysis offered by the wavelet transform was explored as a suitable environment for Landsat and MODIS fusion. Our proposed Wavelet-based Spatiotemporal Adaptive Reflectance Fusion Model (WSTARFM) is the first model to merge Landsat and MODIS successfully. It handles the heterogeneity of the landscapes more effectively than the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM) does. The system has been tested on simulated data and on actual data of two study areas in North Carolina. For a challenging heterogeneous study area near Greensboro, North Carolina, WSTARFM produced results with median R-squared values of 0.98 and 0.95 for the near-infrared band over deciduous forests and developed areas, respectively. Those results were obtained by withholding an actual Landsat image, and comparing it with a predicted version of the same image. These values represent an improvement over results obtained using the well-known STARFM technique. Similar improvements were obtained for the red band. For the second (homogeneous) study area, WSTARFM produced comparable prediction results to STARFM. In the second part of our work, Landsat-MODIS fusion has been explored from the temporal perspective. The fusion is performed on the Landsat and MODIS per-pixel time series. A new Multisensor Adaptive Time Series Fitting Model (MATSFM) is proposed. MATSFM is the first model to use mapped MODIS values to guide the fitting applied to the sparse Landsat time series. MATSFM produced results with median R-squared of 0.98 over the NDVI images of the first heterogeneous study area compared to 0.97 produced by STARFM. For the second study area, MATSFM also produced better prediction accuracy than STARFM.
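The general idea of wavelet-domain fusion, of which WSTARFM is a much more elaborate instance, can be sketched as below: keep coarse-scale content from the MODIS-like image on the prediction date and fine-scale detail from the Landsat image on the base date, then reconstruct. The images, wavelet choice, and coefficient-swapping rule are illustrative assumptions, not the WSTARFM algorithm.

```python
# Very simplified illustration of wavelet-domain image fusion (not WSTARFM).
import numpy as np
import pywt

rng = np.random.default_rng(0)
landsat_base = rng.random((256, 256))            # fine detail, base date
modis_pred = rng.random((256, 256))              # coarse content, prediction date
                                                 # (assumed resampled to the Landsat grid)

cl = pywt.wavedec2(landsat_base, "db2", level=3)
cm = pywt.wavedec2(modis_pred, "db2", level=3)

# Keep the MODIS approximation (low-frequency) band and the Landsat detail bands.
fused_coeffs = [cm[0]] + cl[1:]
prediction = pywt.waverec2(fused_coeffs, "db2")
print(prediction.shape)
```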
- Natural Language Driven Image Edits using a Semantic Image Manipulation Language. Mohapatra, Akrit (Virginia Tech, 2018-06-04). Language provides us with a powerful tool to articulate and express ourselves! Understanding and harnessing the expressions of natural language can open the doors to a vast array of creative applications. In this work we explore one such application: natural-language-based image editing. We propose a novel framework to go from free-form natural language commands to performing fine-grained image edits. Recent progress in the field of deep learning has motivated solving most tasks using end-to-end deep convolutional frameworks. Such methods have been shown to be very successful, even achieving super-human performance in some cases. Although such progress shows significant promise for the future, we believe there is still progress to be made before these methods can be effectively applied to a task like fine-grained image editing. We approach the problem by dissecting the inputs (image and language query) and focusing on understanding the language input utilizing traditional natural language processing (NLP) techniques. We start by parsing the input query to identify the entities, attributes and relationships, and generate a command entity representation. We define our own high-level image manipulation language that serves as an intermediate programming language connecting natural language requests that represent a creative intent over an image to the lower-level operations needed to execute them. The semantic command entity representations are mapped into this high-level language to carry out the intended execution.
- Object Proposals in Computer Vision. Chavali, Neelima (Virginia Tech, 2015-09-09). Object recognition is a central problem in computer vision which deals with both localizing and identifying objects in images. Object proposals have recently become an important part of the object recognition process. Object proposals are algorithms used for localizing objects in images. This thesis is a study in object proposals and is composed of three parts. First, we present a new data-driven approach for generating object proposals. Second, we release a MATLAB library which can be used to generate object proposals using all the existing algorithms. The library can also be used for evaluating object proposals using the three most commonly used metrics. Finally, we identify previously unnoticed bias in the existing protocol for evaluating object proposals and propose ways to alleviate this bias.
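A commonly used proposal-evaluation metric, recall of ground-truth boxes at an intersection-over-union (IoU) threshold, can be computed as in the sketch below; the boxes are made up and the code is purely illustrative, not the released MATLAB library.

```python
# Simple sketch of recall at an IoU threshold. Boxes are [x1, y1, x2, y2].
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def recall(gt_boxes, proposals, thresh=0.5):
    hits = sum(any(iou(g, p) >= thresh for p in proposals) for g in gt_boxes)
    return hits / len(gt_boxes)

gt = [[10, 10, 50, 50], [60, 60, 100, 100]]
props = [[12, 8, 48, 52], [200, 200, 240, 240]]
print(recall(gt, props))   # 0.5: one of two ground-truth boxes is covered
```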
- Probabilistic Modeling of Multi-relational and Multivariate Discrete Data. Wu, Hao (Virginia Tech, 2017-02-07). Modeling and discovering knowledge from multi-relational and multivariate discrete data is a crucial task that arises in many research and application domains, e.g. text mining, intelligence analysis, epidemiology, social science, etc. In this dissertation, we study and address three problems involving the modeling of multi-relational discrete data and multivariate multi-response count data, viz. (1) discovering surprising patterns from multi-relational data, (2) constructing a generative model for multivariate categorical data, and (3) simultaneously modeling multivariate multi-response count data and estimating covariance structures between multiple responses. To discover surprising multi-relational patterns, we first study the "where do I start?" problem originating from intelligence analysis. By studying nine methods with origins in association analysis, graph metrics, and probabilistic modeling, we identify several classes of algorithmic strategies that can supply starting points to analysts, and thus help to discover interesting multi-relational patterns from datasets. To actually mine for interesting multi-relational patterns, we represent the multi-relational patterns as dense and well-connected chains of biclusters over multiple relations, and model the discrete data by the maximum entropy principle, such that in a statistically well-founded way we can gauge the surprisingness of a discovered bicluster chain with respect to what we already know. We design an algorithm for approximating the most informative multi-relational patterns, and provide strategies to incrementally organize discovered patterns into the background model. We illustrate how our method is adept at discovering the hidden plot in multiple synthetic and real-world intelligence analysis datasets. Our approach naturally generalizes traditional attribute-based maximum entropy models for single relations, and further supports iterative, human-in-the-loop, knowledge discovery. To build a generative model for multivariate categorical data, we apply the maximum entropy principle to propose a categorical maximum entropy model such that in a statistically well-founded way we can optimally use given prior information about the data, and are unbiased otherwise. Generally, inferring the maximum entropy model could be infeasible in practice. Here, we leverage the structure of the categorical data space to design an efficient model inference algorithm to estimate the categorical maximum entropy model, and we demonstrate how the proposed model is adept at estimating underlying data distributions. We evaluate this approach against both simulated data and US census datasets, and demonstrate its feasibility using an epidemic simulation application. Modeling data with multivariate count responses is a challenging problem due to the discrete nature of the responses. Existing methods for univariate count responses cannot be easily extended to the multivariate case since the dependency among multiple responses needs to be properly accounted for. To model multivariate data with multiple count responses, we propose a novel multivariate Poisson log-normal model (MVPLN). By simultaneously estimating the regression coefficients and inverse covariance matrix over the latent variables with an efficient Monte Carlo EM algorithm, the proposed model takes advantage of associations among multiple count responses to improve the model prediction accuracy. Simulation studies and applications to real world data are conducted to systematically evaluate the performance of the proposed method in comparison with conventional methods.
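The MVPLN generative model can be sketched as below: latent Gaussian responses with a shared covariance drive Poisson counts through a log link. The design matrix, coefficients, and covariance are arbitrary placeholders, and the Monte Carlo EM estimation from the dissertation is not reproduced here.

```python
# Hedged sketch of the multivariate Poisson log-normal (MVPLN) generative model.
import numpy as np

rng = np.random.default_rng(0)
n, p, m = 200, 3, 2                  # samples, predictors, count responses
X = rng.standard_normal((n, p))      # design matrix
B = rng.standard_normal((p, m))      # regression coefficients (placeholders)
Sigma = np.array([[0.5, 0.3],
                  [0.3, 0.5]])       # covariance coupling the responses

# Latent Gaussian layer, then Poisson counts through a log link.
Z = X @ B + rng.multivariate_normal(np.zeros(m), Sigma, size=n)
Y = rng.poisson(np.exp(Z))           # observed multivariate counts
print(Y.shape, Y.mean(axis=0))
```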