Browsing by Author "Vergara, John Paul C."
Now showing 1 - 13 of 13
Results Per Page
Sort Options
- Algorithms for Feature Selection in Rank-Order SpacesSlotta, Douglas J.; Vergara, John Paul C.; Ramakrishnan, Naren; Heath, Lenwood S. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2005)The problem of feature selection in supervised learning situations is considered, where all features are drawn from a common domain and are best interpreted via ordinal comparisons with other features, rather than as numerical values. In particular, each instance is a member of a space of ranked features. This problem is pertinent in electoral, financial, and bioinformatics contexts, where features denote assessments in terms of counts, ratings, or rankings. Four algorithms for feature selection in such rank-order spaces are presented; two are information-theoretic, and two are order-theoretic. These algorithms are empirically evaluated against both synthetic and real world datasets. The main results of this paper are (i) characterization of relationships and equivalences between different feature selection strategies with respect to the spaces in which they operate, and the distributions they seek to approximate; (ii) identification of computationally simple and efficient strategies that perform surprisingly well; and (iii) a feasibility study of order-theoretic feature selection for large scale datasets.
- Edge-packing by isomorphic subgraphsVergara, John Paul C. (Virginia Tech, 1990)Maximum G Edge-Packing (E PackG) is the problem of finding the maximum number of edge-disjoint isomorphic copies of a fixed guest graph G in a host graph H. The problem is primarily considered for several guest graphs (stars, paths and cycles) and host graphs (arbitrary graphs, planar graphs and trees). We give polynomial-time algorithms when G is a 2-path or when H is a tree; we show the problem is NP-complete otherwise. Also, we propose straightforward greedy polynomial-time approximation algorithms which are at least 1/|EG| optimal.
- Edge-Packing by Isomorphic SubgraphsVergara, John Paul C.; Heath, Lenwood S. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1991)Maximum G Edge-Packing (EPackG) is the problem of finding the maximum number of edge-disjoint isomorphic copies of a fixed guest graph G in a host graph H. This paper investigates the computational complexity of edge-packing for planar guests and planar hosts. Edge-packing is solvable in linear time when G is a 2-path and H is arbitrary, or when H is outerplanar and G is either a 3-cycle or a k-star. Edge-packing is solvable in polynomial time when both G and H are trees. Edge-packing is NP-complete when H is planar and G is either a cycle or a tree with ≥3 edges. The approximability of EPackG is considered. A strategy for developing polynomial-time approximation algorithms for planar hosts is exemplified by a linear-time approximation algorithm for EPack k-star that finds an edge-packing of size at least one-half optimal. Finally, EPackG is shown not to be in the complexity class Max SNP, though it is Max SNP-hard for G a k-star, ≥3.
- Edge-Packing in Planar GraphsHeath, Lenwood S.; Vergara, John Paul C. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1995-10-01)Maximum G Edge-Packing (EPack-sub G) is the problem of finding the maximum number of edge-disjoint isomorphic copies of a fixed guest graph G in a host graph H. This paper investigates the computational complexity of edge-packing for planar guests and planar hosts. Edge-packing is solvable in polynomial time when both G and H are either a 3-cycle or a k-star (graphs isomorphic to K(sub 1,k). Edge-packing is NP-complete when H is planar and G is either a cycle or a tree with greater than or equal to 3 edges. A strategy for developing polynomial-time approximation algorithms for planar hosts is exemplified by a linear-time approximation algorithm that finds a k-star edge-packing of size at least 1/2 optimal.
- Edge-Packing Planar Graphs by Cyclic GraphsHeath, Lenwood S.; Vergara, John Paul C. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1996-11-01)Maximum G Edge-Packing is the problem of finding the maximum number of edge-disjoint isomorphic copies of a fixed guest graph G in a host graph H. This paper considers the cases where G and H are planar and G is cyclic. Recent work on the general problem is surveyed, inadequacies and limitations in these results are identified, and NP-completeness proofs for key cases are presented.
- Evalutating Biological Data Using Rank Correlation MethodsSlotta, Douglas J. (Virginia Tech, 2005-05-05)Analyses based upon rank correlation methods, such as Spearman's Rho and Kendall's Tau, can provide quick insights into large biological data sets. Comparing expression levels between different technologies and models is problematic due to the different units of measure. Here again, rank correlation provides an effective means of comparison between the two techniques. Massively Parallel Signature Sequencing (MPSS) transcript abundance levels to microarray signal intensities for Arabidopsis thaliana are compared. Rank correlations can be applied to subsets as well as the entire set. Results of subset comparisons can be used to improve the capabilities of predictive models, such as Predicted Highly Expressed (PHX). This is done for Escherichia coli. Methods are given to combine predictive models based upon feedback from experimental data. The problem of feature selection in supervised learning situations is also considered, where all features are drawn from a common domain and are best interpreted via ordinal comparisons with other features, rather than as numerical values. This is done for synthetic data as well as for microarray experiments examining the life cycle of Drosophila melanogaster and human leukemia cells. Two novel methods are presented based upon Rho and Tau, and their efficacy is tested with synthetic and real world data. The method based upon Spearman's Rho is shown to be more effective.
- GridWeaver: A Fully-Automatic System for Microarray Image Analysis Using Fast Fourier TransformsVergara, John Paul C.; Heath, Lenwood S.; Grene, Ruth; Ramakrishnan, Naren; Watson, Layne T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2008)Experiments using microarray technology generate large amounts of image data that are used in the analysis of genetic function. An important stage in the analysis is the determination of relative intensities of spots on the images generated. This paper presents GridWeaver, a program that reads in images from a microarray experiment, automatically locates subgrids and spots in the images, and then determines the spot intensities needed in the analysis of gene function. Automatic gridding is performed by running Fast Fourier Transforms on pixel intensity sums. Tests on several data sets show that the program responds well even on images that have significant noise, both random and systemic.
- Mining Posets from Linear OrdersFernandez, Proceso L.; Heath, Lenwood S.; Ramakrishnan, Naren; Vergara, John Paul C. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2009)There has been much research on the combinatorial problem of generating the linear extensions of a given poset. This paper focuses on the reverse of that problem, where the input is a set of linear orders, and the goal is to construct a poset or set of posets that generates the input. Such a problem finds applications in computational neuroscience, systems biology, paleontology, and physical plant engineering. In this paper, several algorithms are presented for efficiently finding a single poset that generates the input set of linear orders. The variation of the problem where a minimum set of posets that cover the input is also explored. It is found that the problem is polynomially solvable for one class of simple posets (kite(2) posets) but NP-complete for a related class (hammock(2,2,2) posets).
- Some Experiments on the Sorting by Reversals MethodHeath, Lenwood S.; Vergara, John Paul C. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1995-09-01)Sorting by reversals is the problem of finding the minimum number of reversals required to sort a permutation pi. The problem is significant with respect to the study of genome rearrangements and phylogeny reconstruction. This paper presents a programming framework for performing experiments on the problem. Several conjectures concerning optimal sorting sequences are tested using this framework.
- Sorting by Bounded Block-MovesHeath, Lenwood S.; Vergara, John Paul C. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1997-05-01)Given a permutation pi, a block-move is an operation that switches two adjacent blocks of elements in pi. The problem of finding the minimum number of block-moves required to sort pi has applications in computational biology, particularly in the study of genome rearrangements. This paper investigates variants of the problem where bounds are imposed on the lengths of the blocks moved. Algorithms and reduction results are presented for these variants.
- Sorting by Bounded PermutationsVergara, John Paul C. (Virginia Tech, 1997-04-08)Let P be a predicate applicable to permutations. A permutation that satisfies P is called a generator. Given a permutation $pi$, MinSort_P is the problem of finding a shortest sequence of generators that, when composed with $pi$, yields the identity permutation. The length of this sequence is called the P distance of $pi$. Diam_P is the problem of finding the longest such distance for permutations of a given length. MinSort_P and Diam_P, for some choices of P, have applications in the study of genome rearrangements and in the design of interconnection networks. This dissertation considers generators that are swaps, reversals, or block-moves. Distance bounds on these generators are introduced and the corresponding problems are investigated. Reduction results, graph-theoretic models, exact and approximation algorithms, and heuristics for these problems are presented. Experimental results on the heuristics are also provided. When the bound is a function of the length of the permutation, there are several sorting problems such as sorting by block-moves and sorting by reversals whose bounded variants are at least as difficult as the corresponding unbounded problems. For some bounded problems, a strong relationship exists between finding optimal sorting sequences and correcting the relative order of individual pairs of elements. This fact is used in investigating MinSort_P and Diam_P for two particular predicates. A short block-move is a generator that moves an element at most two positions away from its original position. Sorting by short block-moves is solvable in polynomial time for two large classes of permutations: woven bitonic permutations and woven double-strip permutations. For general permutations, a polynomial-time (4/3)-approximation algorithm that computes short block-move distance is devised. The short block-move diameter for length-n permutations is determined. A short swap is a generator that swaps two elements that have at most one element between them. A polynomial-time 2-approximation algorithm for computing short swap distance is devised and a class of permutations where the algorithm computes the exact short swap distance is determined. Bounds for the short swap diameter for length-n permutations are determined.
- Sorting by Short Block-MovesHeath, Lenwood S.; Vergara, John Paul C. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1998-02-01)Sorting permutations by operations such as reversals and block-moves has received much interest because of its applications in computational biology, particularly in the study of genome rearrangements. A short block-move is an operation on a permutation that moves an element at most two positions away from its original position. This paper investigates the problem of finding a minimum-length sorting sequence of short block-moves for a given permutation. A 4/3-approximation algorithm for this problem is presented. Exact polynomial-time algorithms are presented for woven bitonic permutations and woven double-strip permutations. A linear-time maximum matching algorithm for a special class of grid graphs is also discovered that improves the time complexity of one of these exact algorithms.
- Subsequence and Run Heuristics for Sorting by TranspositionsGuyer, Scott A.; Heath, Lenwood S.; Vergara, John Paul C. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1997-11-01)Sorting by tranpositions is the problem of finding the minimum number of transpositions required to sort a permutation pi. A transposition involves repositioning a contiguous sequence (block) of elements by inserting it elsewhere in the permutation. The problem has applications in the study of genome rearrangements and phylogeny reconstruction. In this paper, several heuristics based on analyses of subsequences and runs in a permutation are employed. Experimental results are provided. The algorithm based on the longest increasing subsequence in a permutation appears most promising.