Browsing by Author "Frakes, William B."
Now showing 1 - 20 of 34
- Automatic Lexicon Generation for Unsupervised Part-of-Speech Tagging Using Only Unannotated Text
  Pereira, Dennis V. (Virginia Tech, 1999-05-07)
  With the growing number of textual resources available, the ability to understand them becomes critical. An essential first step in understanding these sources is the ability to identify the parts of speech in each sentence. The goal of this research is to propose, improve, and implement an algorithm capable of finding terms (words in a corpus) that are used in similar ways: a term categorizer. Such a term categorizer can be used to find a particular part of speech, e.g., nouns, in a corpus and generate a lexicon. The proposed work does not depend on any external sources of information, such as dictionaries, and it shows a significant improvement (~30%) over an existing method of categorization. More importantly, the proposed algorithm can be applied as a component of an unsupervised part-of-speech tagger, making it truly unsupervised and requiring only unannotated text. The algorithm is discussed in detail, along with its background and performance. Experimentation shows that the proposed algorithm performs within 3% of the baseline, the Penn Treebank lexicon.
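The abstract above does not spell out the categorization algorithm. As a rough illustration of what "finding terms used in similar ways" can mean, the sketch below groups terms by the similarity of their neighboring-word distributions; the corpus, threshold, and function names are illustrative assumptions, not the thesis's code.

```python
# Hypothetical sketch of context-based term categorization (not the thesis's
# actual algorithm): group terms whose surrounding words are distributed
# similarly, using cosine similarity over context word counts.
from collections import Counter, defaultdict

def context_vectors(sentences, window=1):
    """Map each term to a bag of the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        for i, term in enumerate(sent):
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    vectors[term][sent[j]] += 1
    return vectors

def cosine(a, b):
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    den = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return num / den if den else 0.0

def similar_terms(seed, vectors, threshold=0.3):
    """Return terms whose contexts resemble the seed term's contexts."""
    return [t for t in vectors if t != seed and cosine(vectors[seed], vectors[t]) >= threshold]

sentences = [["the", "dog", "barked"], ["the", "cat", "slept"], ["a", "dog", "slept"]]
vecs = context_vectors(sentences)
print(similar_terms("dog", vecs))  # likely includes "cat", which appears in similar contexts
```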
- A case study in object-oriented development: code reuse for two computer games
  Scott, Roger E. (Virginia Tech, 1992)
  A case study of the object-oriented development of two computer games using commercially available products was conducted. The games were constructed for use on Apple Macintosh computers using a C++-like programming language and an accompanying object-oriented class library. Object-oriented techniques are compared with procedure-oriented techniques, and the benefits of object-oriented techniques for code reuse are introduced. The reuse of object-oriented code within a target domain of applications is discussed, with examples drawn from the reuse of specific functions between the two games. Other reuse topics encountered in the development effort are also discussed: reuse of operating system routines, reuse of code provided by an object-oriented class library, and reuse of code to provide functions needed for a graphical user interface.
- A Case Study of Using Domain Analysis for the Conflation Algorithms Domain
  Yilmaz, Okan; Frakes, William B. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2007)
  This paper documents the domain engineering process for much of the conflation algorithms domain. Empirical data on the process and products of domain engineering were collected. Six conflation algorithms of four different types were analyzed: three affix removal, one successor variety, one table lookup, and one n-gram. Products of the analysis include a generic architecture, reusable components, a little language, and an application generator that extends the scope of the domain analysis beyond previous generators. The application generator produces source code not only for affix removal stemmers but also for successor variety, table lookup, and n-gram stemmers. The performance of the automatically generated stemmers was compared with that of manually developed stemmers in terms of stem similarity, source and executable sizes, and development and execution times. All five stemmers produced by the application generator generated stems that were more than 99.9% identical to those of the manually developed stemmers. Some of the generated stemmers were as efficient as their manual equivalents and some were not.
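For readers unfamiliar with conflation, the sketch below shows the simplest of the stemmer types named above, an affix-removal stemmer. The suffix rules here are made up for illustration and are far cruder than the stemmers the paper's generator produces (e.g., Porter-style stemmers).

```python
# Minimal affix-removal stemmer sketch (illustrative only; real rule sets are
# much larger and handle many special cases).
SUFFIX_RULES = [("sses", "ss"), ("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]  # hypothetical rules

def strip_suffix(word, min_stem=2):
    """Apply the first matching suffix rule, keeping at least `min_stem` characters of stem."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)] + replacement
    return word

print([strip_suffix(w) for w in ["caresses", "ponies", "running", "cats"]])
# ['caress', 'poni', 'runn', 'cat']
```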
- A Class of Call Admission Control Algorithms for Resource Management and Reward Optimization for Servicing Multiple QoS Classes in Wireless Networks and Its Applications
  Yilmaz, Okan (Virginia Tech, 2008-11-17)
  We develop and analyze a class of CAC algorithms for resource management in wireless networks with the goal not only of satisfying QoS constraints but also of maximizing a value or reward objective function specified by the system. We demonstrate through analytical modeling and simulation validation that the CAC algorithms developed in this research can greatly improve the system reward obtainable with QoS guarantees, compared with existing CAC algorithms designed for QoS satisfaction only. We design hybrid partitioning-threshold, spillover, and elastic CAC algorithms based on the design techniques of partitioning, setting thresholds, and probabilistic call acceptance to use channel resources for servicing distinct QoS classes. For each CAC algorithm developed, we identify optimal resource management policies in terms of partitioning or threshold settings. By comparing these CAC algorithms head-to-head under identical conditions, we determine the best algorithm to use at runtime to maximize system reward with QoS guarantees for servicing multiple service classes in wireless networks. We study the solution correctness, solution optimality, and solution efficiency of the class of CAC algorithms developed. We ensure solution optimality by comparing the optimal solutions achieved with those obtained by ideal CAC algorithms via exhaustive search. We study solution efficiency by performing complexity analyses and ensure solution correctness by simulation validation based on real human mobility data. Further, we analyze the tradeoff between solution optimality and solution efficiency and suggest the best CAC algorithm with which to trade off solution optimality for solution efficiency, or vice versa, to satisfy the system's solution requirements. Moreover, we develop design principles that remain applicable despite rapidly evolving wireless network technologies, since they can be generalized to deal with the management of "resources" (e.g., wireless channel bandwidth), "cells" (e.g., cellular networks), "connections" (e.g., service calls with QoS constraints), and "reward optimization" (e.g., revenue optimization in optimal pricing determination) for future wireless service networks. To apply the CAC algorithms developed, we propose an application framework consisting of three stages: workload characterization, call admission control, and application deployment. We demonstrate the applicability with an optimal pricing determination application and an intelligent switch routing application.
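For readers new to call admission control, a minimal sketch of the threshold idea underlying this family of algorithms is shown below. It is a simplification, not the dissertation's hybrid partitioning-threshold, spillover, or elastic algorithms; the capacity, class names, and threshold values are hypothetical.

```python
# Illustrative threshold-based call admission control check (a simplification;
# all parameters are hypothetical).
CAPACITY = 100          # total channels in a cell
THRESHOLDS = {          # max channels each QoS class may occupy
    "realtime": 100,    # high-reward class may use the whole pool
    "besteffort": 60,   # low-reward class is capped to protect realtime calls
}

def admit(call_class, demand, usage):
    """Admit a call iff the cell has free channels and the class stays under its threshold."""
    total_used = sum(usage.values())
    if total_used + demand > CAPACITY:
        return False
    if usage[call_class] + demand > THRESHOLDS[call_class]:
        return False
    usage[call_class] += demand
    return True

usage = {"realtime": 0, "besteffort": 0}
print(admit("besteffort", 50, usage), admit("besteffort", 20, usage))  # True False
```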
- A Comparison of Statistical Filtering Methods for Automatic Term Extraction for Domain Analysis
  Tilley, Jason W. (Virginia Tech, 2008-12-22)
  Fourteen word frequency metrics were tested to evaluate their effectiveness in identifying the vocabulary of a domain. Fifteen domain engineering projects were examined to measure how closely the vocabularies selected by the fourteen word frequency metrics matched the vocabularies produced by domain engineers. Six filtering mechanisms were also evaluated to measure their impact on selecting proper vocabulary terms. The results of the experiment show that stemming and stop word removal improve overlap scores and that term frequency is a valuable contributor to overlap. Variations on term frequency do not always significantly improve overlap.
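A minimal sketch of the kind of filtering pipeline evaluated here, term-frequency ranking after stop-word removal, appears below. The stop list and example text are hypothetical, and the study's fourteen metrics are more elaborate than raw frequency.

```python
# Illustrative term-frequency ranking with stop-word removal (one of the
# simpler pipelines of the kind the study evaluates; word lists are hypothetical).
from collections import Counter
import re

STOP_WORDS = {"the", "a", "of", "and", "to", "is"}   # tiny hypothetical stop list

def candidate_terms(text, top_n=5):
    """Rank non-stop-word tokens by raw frequency as domain-vocabulary candidates."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)

doc = "The stemmer removes the suffix of a word and the stemmer outputs the stem."
print(candidate_terms(doc))  # [('stemmer', 2), ('removes', 1), ...]
```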
- Configuration Management for Reusable Software
  Frakes, William B. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2002-12-01)
  This paper discusses the configuration management of reusable software and proposes a software library architecture that incorporates configuration management.
- Cyrano: a meta model for federated database systems
  Dzikiewicz, Joseph (Virginia Tech, 1996-05-01)
  The emergence of new data models requires further research into federated database systems. A federated database system (FDBS) provides uniform access to multiple heterogeneous databases. Most FDBSs provide access only to the older data models, such as the relational, hierarchical, and network models. A federated system requires a meta data model. The meta model is a uniform data model through which users access data regardless of the data model of the data's native database. This dissertation examines the question of meta models for use in an FDBS that provides access to relational, object-oriented, and rule-based databases. This dissertation proposes Cyrano, a hybrid of object-oriented and rule-based data models. The dissertation demonstrates that Cyrano is suitable as a meta model by showing that it satisfies the following three criteria: 1) Cyrano fully supports relational, object-oriented, and rule-based member data models. 2) Cyrano provides sufficient capabilities to support integration of heterogeneous databases. 3) Cyrano can be implemented as the meta model of an operational FDBS. This dissertation describes four primary products of this research: 1) The dissertation presents Cyrano, a meta model designed as part of this research that supports both the older and the newer data models. Cyrano is an example of analytic object orientation, a conceptual approach that combines elements of object-oriented and rule-based data models. 2) The dissertation describes Roxanne, a proof-of-concept FDBS that uses Cyrano as its meta model. 3) The dissertation proposes a set of criteria for the evaluation of meta models and uses these criteria to demonstrate Cyrano's suitability as a meta model. 4) The dissertation presents an object-oriented FDBS reference architecture suitable for use in describing and designing an FDBS.
- The Design and Implementation of the Tako Language and Compiler
  Vasudeo, Jyotindra (Virginia Tech, 2006-05-05)
  Aliasing complicates both formal and informal reasoning and is a particular problem in object-oriented languages, where variables denote references to objects rather than object values. Researchers have proposed various approaches to the aliasing problem in object-oriented languages, but all use reference semantics to reason about programs. This thesis describes the design and implementation of Tako, a Java-like language that facilitates value semantics by incorporating alias avoidance. The thesis describes a non-trivial application developed in the Tako language and discusses some of the object-oriented programming paradigm shifts involved in translating that application from Java to Tako. It introduces a proof rule for procedure calls that uses value semantics and accounts for both repeated arguments and subtyping.
- Developing distributed applications with distributed heterogeneous databases
  Dixon, Eric Richard (Virginia Tech, 1993-05-05)
  This report identifies how Tuxedo fits into the scheme of distributed database processing. Tuxedo is an On-Line Transaction Processing (OLTP) system. Tuxedo was studied because it is the oldest and most widely used transaction processing system on UNIX. That means it is established, extensively tested, and has the most tools available to extend its capabilities. The disadvantage of Tuxedo is that newer UNIX OLTP systems are often based on more advanced technology. For this reason, other OLTP systems were examined to compare their additional capabilities with those offered by Tuxedo. As discussed in Sections I and II, Tuxedo is modeled according to X/Open's Distributed Transaction Processing (DTP) model. The DTP model includes three pieces: Application Programs (APs), Transaction Monitors (TMs), and Resource Managers (RMs). Tuxedo provides a TM in the model and uses the XA specification to communicate with RMs (e.g., Informix). Tuxedo's TX specification, which defines communication between APs and TMs, is also being considered by X/Open as the standard interface between APs and TMs; there is currently no standard interface between those two pieces. Tuxedo conforms to all of X/Open's current standards related to the model. Like the other major OLTP systems for UNIX, Tuxedo is based on the client/server model. Tuxedo expands that support to include both synchronous and asynchronous service calls, an extension Tuxedo calls the enhanced client/server model. Tuxedo also expands its OLTP support to allow distributed transactions to include databases on IBM-compatible Personal Computers (PCs) and proprietary mainframe (Host) systems. Tuxedo calls this extension Enterprise Transaction Processing (ETP). The name "enterprise" comes from the fact that, since Tuxedo supports database transactions spanning UNIX, PC, and Host computers, transactions can span the computer systems of entire businesses, or enterprises.

  Tuxedo is not as robust as the distributed database system model presented by Date. Tuxedo requires programmer participation in providing the capabilities that Date says the distributed database manager should provide. The coordinating process is the process that coordinates a global transaction. According to Date's model, agents exist on the remote sites participating in the transaction in order to handle the calls to the local resource manager. In Tuxedo, the programmer must provide that agent code in the form of services. Tuxedo does provide location transparency, but not in the form Date describes: Date describes location transparency as controlled by a global catalog, while in Tuxedo it is provided by the location of servers as specified in the Tuxedo configuration file. Tuxedo also does not provide replication transparency as specified by Date; the programmer must write services that maintain replicated records. Date also describes five problems faced by distributed database managers. The first problem is query processing. Tuxedo provides capabilities to fetch records from databases, but does not provide the capabilities to do joins across distributed databases. The second problem is update propagation. Tuxedo does not provide replication transparency, but it does provide enough capabilities for programmers to reliably maintain replicated records. The third problem is concurrency control, which is supported by Tuxedo. The fourth problem is the commit protocol; Tuxedo's commit protocol is the two-phase commit protocol. The fifth problem is the global catalog; Tuxedo does not have one.

  The other comparison presented in the paper was between Tuxedo and the other major UNIX OLTP systems: Transarc's Encina, Top End, and CICS. Tuxedo is the oldest and has the largest market share, which gives it the advantage of being the most thoroughly tested and the most stable. Tuxedo also has the most tools available to extend its capabilities. Tuxedo's disadvantage is that, since it is the oldest, it is based on the oldest technology. Transarc's Encina is the most advanced UNIX OLTP system. Encina is based on DCE and supports multithreading. However, Encina has been slow to market and has had stability problems because of its advanced features. Also, since Encina is based on DCE, its success is tied to the success of DCE. Top End is less advanced than Encina, but more advanced than Tuxedo. It is also much more stable than Encina. However, Top End is only now being ported from the NCR machines on which it was originally built. CICS is not yet commercially available on UNIX. CICS is good for companies with CICS code to port to UNIX and CICS programmers who are already experts. The disadvantage of CICS is that companies that already work with UNIX and do not use CICS will find its interface less natural than Tuxedo's, which originated under UNIX.
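Since the report names only the two-phase commit protocol, the schematic coordinator below shows the general idea; it is not Tuxedo's implementation, which runs inside the transaction monitor and talks to resource managers through the XA interface. The resource-manager class is a stand-in for illustration.

```python
# Schematic two-phase commit coordinator (conceptual only).
def two_phase_commit(participants):
    """Phase 1: ask every resource manager to prepare; Phase 2: commit or roll back."""
    prepared = []
    for rm in participants:
        if rm.prepare():             # RM votes yes and durably logs its intent
            prepared.append(rm)
        else:                        # any "no" vote aborts the global transaction
            for p in prepared:
                p.rollback()
            rm.rollback()
            return "aborted"
    for rm in participants:          # all voted yes: commit everywhere
        rm.commit()
    return "committed"

class FakeRM:
    """Hypothetical stand-in for a resource manager such as a database."""
    def __init__(self, ok): self.ok = ok
    def prepare(self): return self.ok
    def commit(self): pass
    def rollback(self): pass

print(two_phase_commit([FakeRM(True), FakeRM(True)]))   # committed
print(two_phase_commit([FakeRM(True), FakeRM(False)]))  # aborted
```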
- Domain Engineering: An Empirical Study
  Harris, Charles; Frakes, William B. (Department of Computer Science, Virginia Polytechnic Institute & State University, 2006)
  This paper presents a summary and analysis of data gathered from thirteen domain engineering projects, participant surveys, and demographic information. Taking a failure modes approach, project data is compared to an ideal model of the DARE methodology, revealing valuable insights into points of failure in the domain engineering process. This study suggests that success is a function of the domain analyst's command of a specific set of domain engineering concepts and skills, the time invested in the process, and persistence in difficult areas. We conclude by presenting strategies to avoid points of failure in future domain engineering projects.
- Effects of driver characteristics and traffic composition on traffic flow
  Golden, Gaylynn (Virginia Tech, 1994-05-15)
  This paper describes the development of simulation models for a variety of traffic flow scenarios. The major goal of the models was to evaluate the effects of driver characteristics and traffic composition on traffic flow. The five scenarios modeled and their respective objectives were as follows: 1. Vehicles switching lanes to increase speed; objectives were throughput and number of lane switches. 2. Vehicles merging into an adjacent lane; objectives were distance traveled before merging and number of collisions during lane switching. 3. Vehicles switching from the left or right lane into the center lane; objectives were number of collisions and number of near misses during lane switching. 4. Vehicles passing on a two-lane bidirectional road; the objective was number of collisions during passing. 5. Vehicles switching from the center lane to the left or right lane to avoid an impassable obstacle; objectives were number of collisions during lane switching and number of collisions with the obstacle. Various driver characteristics were implemented in the models. The concept of preoccupation/attentiveness was factored into the models through the use of varied reaction times. Other driver characteristics were incorporated in the models via the assignment of vehicle speed. The models provided for a wide variety of driver types, for example: 1. Drivers in a hurry. 2. Tourists or drivers unfamiliar with the area. 3. Law-abiding drivers. 4. Aggressive and passive drivers. 5. Young, inexperienced drivers. 6. Tired truck drivers. The driver characteristics were varied via percentage allocations entered at run time. The traffic composition for the models consisted of automobiles and multi-axle vehicles of fixed lengths. The percentages for each vehicle type were also entered at run time. The scope and level of detail for each model were delineated with assumptions. General assumptions included the following: 1. An automobile is 10 feet long; a multi-axle vehicle is 30 feet long. 2. The width of a lane is such that only one vehicle can be accommodated at a time. 3. A vehicle is considered to be entirely in one lane or another. 4. A vehicle switches lanes instantaneously. 5. The reaction time of an attentive driver is normally distributed with a mean of 0.5; the reaction time of a preoccupied driver is normally distributed with a mean of 0.7. Three standard deviations are included to ensure complete population coverage. 6. A collision between two vehicles results in the termination of the vehicle causing the collision; the other vehicle continues. Implementation of these models was performed using the student version of the simulation language GPSS/H. The models were validated, but not verified, against their real-world counterparts. Test results showed that selected driver characteristics can affect traffic flow; however, the effect of traffic composition was not clearly demonstrated.
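The reaction-time assumption (general assumption 5 above) can be illustrated with a short sketch. The original models were written in GPSS/H; this Python version is only illustrative, and since the abstract does not state the standard deviations, the value below is a guess.

```python
# Sketch of the stated reaction-time assumption: attentive drivers ~ Normal(0.5),
# preoccupied drivers ~ Normal(0.7), truncated at three standard deviations.
# The standard deviation of 0.1 is an assumed value, not taken from the paper.
import random

def reaction_time(preoccupied, sigma=0.1):
    mean = 0.7 if preoccupied else 0.5
    t = random.gauss(mean, sigma)
    return min(max(t, mean - 3 * sigma), mean + 3 * sigma)  # keep within +/- 3 sigma

# sample five drivers, each preoccupied with 30% probability (also an assumed mix)
print([round(reaction_time(random.random() < 0.3), 3) for _ in range(5)])
```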
- The Effects of Open Source License Choice on Software Reuse
  Brewer, John VIII (Virginia Tech, 2012-05-04)
  Previous research shows that software reuse can have a positive impact on software development economics, and that the adoption of a specific open source license can influence how a software product is received by users and programmers. This study attempts to bridge these two research areas by examining how the adoption of an open source license affects software reuse. Two reuse metrics were applied to 9,570 software packages contained in the Fedora Linux software repository. Each package was evaluated to determine how many external components it reuses, as well as how many times it is reused by other software packages. These data were divided into subsets according to license type and software category. The study found that, in general, (1) software released under a restrictive license reuses more external components than software released under a permissive license, and (2) software released under a permissive license is more likely to be reused than software released under a restrictive license. However, there are exceptions to these conclusions, as the effect of license choice on reuse varies by software category.
- An Empirical Study of a Repeatable Method for Reengineering Procedural Software Systems to Object-Oriented Systems
  Frakes, William B.; Kulczycki, Gregory (Department of Computer Science, Virginia Polytechnic Institute & State University, 2009)
  This paper describes a repeatable method for reengineering a procedural system to an object-oriented system. The method uses coupling metrics to assist a domain expert in identifying candidate objects. An application of the method to a simple program is given, and the effectiveness of the various coupling metrics is discussed. We perform a detailed comparison of our repeatable method with an ad hoc, manual reengineering effort based on the same procedural program. The repeatable method was found to be effective for identifying objects. It produced code that was much smaller, more efficient, and passed more regression tests than the ad hoc method. Analysis of object-oriented metrics indicated both simpler code and less variability among classes for the repeatable method.
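The abstract does not list the coupling metrics used. The sketch below shows one simple form of coupling measurement, counting the global variables shared by pairs of functions, which is the kind of signal that can suggest candidate objects; the function names, data, and metric are hypothetical, not necessarily the paper's.

```python
# Illustrative data-coupling measure (hypothetical): strongly coupled functions
# and the globals they share suggest a candidate object grouping.
from itertools import combinations

globals_used = {                      # function -> global variables it reads or writes
    "open_index":  {"index_file", "index_state"},
    "close_index": {"index_file", "index_state"},
    "hash_word":   {"hash_seed"},
}

def shared_data_coupling(usage):
    """Return (f, g, count) for every function pair, sorted by number of shared globals."""
    pairs = [(f, g, len(usage[f] & usage[g])) for f, g in combinations(usage, 2)]
    return sorted(pairs, key=lambda p: p[2], reverse=True)

print(shared_data_coupling(globals_used)[0])  # ('open_index', 'close_index', 2)
```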
- An Empirical Study of Representation Methods for Reusable Software Components
  Frakes, William B.; Pole, T. (Department of Computer Science, Virginia Polytechnic Institute & State University, 1994)
  An empirical study of methods for representing reusable software components is described. Thirty-five subjects searched for reusable components in a database of UNIX tools using four different representation methods: attribute-value, enumerated, faceted, and keyword. The study used Proteus, a reuse library system that supports multiple representation methods. Searching effectiveness was measured with recall, precision, and overlap. Search time for the four methods was also compared. Subjects rated the methods in terms of preference and helpfulness in understanding components. Some principles for constructing reuse libraries, based on the results of this study, are discussed.
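For reference, the usual definitions of the effectiveness measures named above are sketched below; the study's exact overlap formula may differ from this common form.

```python
# Standard retrieval measures (common definitions; the paper's overlap measure
# may be defined slightly differently).
def recall(retrieved, relevant):
    """Fraction of the relevant components that were actually retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def precision(retrieved, relevant):
    """Fraction of the retrieved components that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def overlap(found_a, found_b):
    """Fraction of components found by both methods relative to those found by either."""
    union = found_a | found_b
    return len(found_a & found_b) / len(union) if union else 0.0

relevant = {"grep", "awk", "sed"}
retrieved = {"grep", "awk", "ls"}
print(recall(retrieved, relevant), precision(retrieved, relevant))  # 0.666..., 0.666...
```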
- An Empirical Study of Reuse, Quality, and Productivity
  Frakes, William B.; Succi, Giancarlo (Department of Computer Science, Virginia Polytechnic Institute & State University, 1997-08-01)
  This paper presents an analysis of four sets of industrial data to determine if software reuse is correlated with higher levels of software quality and productivity.
- Evaluating Term Extraction Methods for Domain Analysis
  Nemallapudi, Chaitanya (Virginia Tech, 2010-08-02)
  This study compared the vocabularies created by various domain experts and the source documents they selected to create those vocabularies. The results indicate that there is similarity among the vocabularies created and the source documents selected. The relationship between the overlap scores of the vocabularies created and the overlap scores of the source documents selected was also tested, and no significant relation was observed between them. In addition, the variability of the overlap scores of automatically generated vocabularies was compared to the variability of the overlap scores of vocabularies produced manually by domain experts. The results suggested that these vocabularies are significantly different from each other.
- Exploratory Study of the Impact of Value and Reference Semantics on Programming
  Khedekar, Neha N. (Virginia Tech, 2007-08-10)
  In this thesis, we measure the impact of reference semantics on programming and reasoning. We designed a survey to compare how well programmers perform under three different programming paradigms. Two of the paradigms, object copying and swapping, use value semantics, while the third, reference copying, uses reference semantics. We gave the survey to over 25 people. We recorded the number of questions answered correctly in each paradigm and the amount of time it took to answer each question. We looked at the overall results as well as the results within various levels of Java experience. Based on anecdotal evidence from the literature, we expected questions based on value semantics to be easier than questions based on reference semantics. The results did not yield differences that were statistically significant, but they did conform to our general expectations. While further investigation is clearly needed, we believe that this work represents an important first step in the empirical analysis of a topic that has previously only been discussed informally.
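The three paradigms compared by the survey can be illustrated briefly. The survey questions were Java-oriented, but the same distinction shows up in the Python sketch below, where the example values are arbitrary.

```python
# Reference copying aliases one object, object copying duplicates it, and
# swapping exchanges values so no alias is ever created.
import copy

account = {"balance": 100}

alias = account                 # reference copy: both names denote the same object
alias["balance"] -= 40          # ...so the "other" variable changes too
print(account["balance"])       # 60

clone = copy.deepcopy(account)  # object copy: an independent value
clone["balance"] -= 40
print(account["balance"])       # still 60

a, b = {"balance": 10}, {"balance": 99}
a, b = b, a                     # swap: values move between names, no sharing
print(a["balance"], b["balance"])  # 99 10
```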
- Factors Affecting the Design and Use of Reusable Components
  Anguswamy, Reghu (Virginia Tech, 2013-07-31)
  Designing software components for future reuse has been an important area in software engineering. A software system developed with reusable components follows a "with" reuse process, while a component designed to be reused in other systems follows a "for" reuse process. This dissertation explores the factors affecting design for reuse and design with reusable components through empirical studies. The studies involve Java components implementing a particular algorithm, a stemming algorithm that is widely used in the conflation domain. The method and empirical approach are general and independent of the programming language; such studies may be extended to other types of components, for example, components implementing data structures such as stacks and queues. Design for reuse: the first study analyzed one-use and equivalent reusable components for the overhead in terms of component size, effort required, number of parameters, and productivity. Reusable components were significantly larger than their equivalent one-use components and had significantly more parameters. The effort required for the reusable components was higher than for one-use components. The productivity of the developers was significantly lower for the reusable components compared to the one-use components. Also, during the development of reusable components, the subjects spent more time writing code than designing the components, but not significantly so. A ranking of the design principles by frequency of use is also reported. A content analysis performed on the feedback is reported, and the reasons for using and not using the reuse design principles are identified. A correlation analysis showed that the reuse design principles were, in general, used independently of each other. Design with reuse: through another empirical study, the effects of the size of a component and of the reuse design principles used in building it on the ease of reuse were analyzed. It was observed that the higher the complexity, the lower the ease of reuse, but the correlation is not significant. When considered independently, four of the reuse design principles (well-defined interface, clarity and understandability, generality, and separation of concepts from content) significantly increased the ease of reuse, while commonality and variability analysis significantly decreased the ease of reuse, and documentation did not have a significant impact on the ease of reuse. Experience in the programming language had no significant relationship with the reusability of components. Experience in software engineering and software reuse showed a relationship with reusability, but the effect size was small. Testing components before integrating them into a system was found to have no relationship with the reusability of components. A content analysis of the feedback identifies the challenges of components that were not easy to reuse, and features that make a component easily reusable were also identified. The Mahalanobis-Taguchi Strategy (MTS) was employed to develop a model based on the Mahalanobis Distance to identify the factors that can detect whether a component is easy to reuse or not. The identified factors within the model are: the size of a component, a set of reuse design principles (well-defined interface, clarity and understandability, commonality and variability analysis, and generality), and component testing.
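The Mahalanobis Distance at the core of MTS has a standard definition, sketched below with made-up feature values; the actual features and reference data come from the dissertation's studies, not from this example.

```python
# Mahalanobis distance as used in MTS-style screening (standard definition;
# the feature values below are invented for illustration).
import numpy as np

def mahalanobis(x, reference):
    """Distance of observation x from the mean of the reference ("easy to reuse") group."""
    mu = reference.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# reference group: components judged easy to reuse (columns: size, #principles used, tested)
easy = np.array([[120, 5, 1], [150, 4, 1], [90, 6, 1], [110, 5, 0], [140, 4, 1]], dtype=float)
candidate = np.array([400, 1, 0], dtype=float)
print(mahalanobis(candidate, easy))  # a large distance flags a likely hard-to-reuse component
```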
- Formal Specification and Verification of Data-Centric Web Services
  Moustafa, Iman Saleh (Virginia Tech, 2012-02-10)
  In this thesis, we develop and evaluate a formal model and contracting framework for data-centric Web services. The central component of our framework is a formal specification of a common Create-Read-Update-Delete (CRUD) data store. We show how this model can be used in the formal specification and verification of both basic and transactional Web service compositions. We demonstrate through both formal proofs and empirical evaluations that our proposed framework significantly decreases ambiguity about a service, enhances its reuse, and facilitates detection of errors in service-based implementations. Web services are reusable software components that make use of standardized interfaces to enable loosely coupled business-to-business and customer-to-business interactions over the Web. In such environments, service consumers depend heavily on the service interface specification to discover, invoke, and synthesize services over the Web. Data-centric Web services are services whose behavior is determined by their interactions with a repository of stored data. A major challenge in this domain is interpreting the data that must be marshaled between consumer and producer systems. While the Web Services Description Language (WSDL) is currently the de facto standard for Web services, it only specifies a service operation in terms of its syntactic inputs and outputs; it does not provide a means for specifying the underlying data model, nor does it specify how a service invocation affects the data. The lack of data specification potentially leads to erroneous use of the service by a consumer. In this work, we propose a formal contract for data-centric Web services. The goal is to formally and unambiguously specify the service behavior in terms of its underlying data model and data interactions. We address the specification of a single service, a flow of services interacting with a single data store, and the specification of distributed transactions involving multiple Web services interacting with different autonomous data stores. We use the proposed formal contract to decrease ambiguity about a service's behavior, to fully verify a composition of services, and to guarantee correctness and data integrity properties within a transactional composition of services.
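The CRUD-contract idea can be approximated informally with pre- and postcondition checks, as in the sketch below. The thesis uses a formal specification, not this runtime-assertion style, and the class, operations, and keys here are hypothetical.

```python
# Informal contract-style sketch of a CRUD data store: each operation states a
# precondition on the store and a postcondition on its effect.
class CrudStore:
    def __init__(self):
        self.records = {}

    def create(self, key, value):
        assert key not in self.records, "precondition: key must not already exist"
        self.records[key] = value
        assert self.records[key] == value, "postcondition: record is stored"

    def read(self, key):
        assert key in self.records, "precondition: key must exist"
        return self.records[key]

    def update(self, key, value):
        assert key in self.records, "precondition: key must exist"
        self.records[key] = value
        assert self.records[key] == value, "postcondition: record reflects the new value"

    def delete(self, key):
        assert key in self.records, "precondition: key must exist"
        del self.records[key]
        assert key not in self.records, "postcondition: record is gone"

store = CrudStore()
store.create("order-1", {"total": 25})
store.update("order-1", {"total": 30})
print(store.read("order-1"))
store.delete("order-1")
```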
- A graphical alternative to direct SQL-based querying
  Beasley, Johnita (Virginia Tech, 1993-05-04)
  SQL provides a fairly straightforward means of querying database data. However, as with all command languages, SQL can become very complicated, even for experienced programmers. This complexity can be intimidating to the novice or intermediate user who needs to access data from a database with complex SQL statements, especially when users do not want to learn or even become familiar with a command-oriented query language like SQL.