Browsing by Author "Fan, Weiguo Patrick"
Now showing 1 - 20 of 35
- Acceleration of Hardware Testing and Validation Algorithms using Graphics Processing Units. Li, Min (Virginia Tech, 2012-09-17). With the advances of very large scale integration (VLSI) technology, the feature size has been shrinking steadily together with the increase in the design complexity of logic circuits. As a result, the efforts taken for designing, testing, and debugging digital systems have increased tremendously. Although electronic design automation (EDA) algorithms have been studied extensively to accelerate such processes, some computationally intensive applications still take long execution times. This is especially the case for testing and validation. In order to meet time-to-market constraints and to come up with a bug-free design or product, the work presented in this dissertation studies the acceleration of EDA algorithms on Graphics Processing Units (GPUs). This dissertation concentrates on a subset of EDA algorithms related to testing and validation. In particular, within the area of testing, fault simulation, diagnostic simulation, and reliability analysis are explored. We also investigated approaches to parallelize state justification on GPUs, which is one of the most difficult problems in the validation area. First, we present an efficient parallel fault simulator, FSimGP2, which exploits the high degree of parallelism supported by a state-of-the-art graphics processing unit (GPU) with the NVIDIA Compute Unified Device Architecture (CUDA). A novel three-dimensional parallel fault simulation technique is proposed to achieve extremely high computational efficiency on the GPU. The experimental results demonstrate a speedup of up to 4× compared to another GPU-based fault simulator. Then, another GPU-based simulator is used to tackle an even more computation-intensive task, diagnostic fault simulation. The simulator is based on a two-stage framework which achieves high computational efficiency on the GPU. We introduce a fault-pair-based approach to alleviate the limited memory capacity on GPUs. Also, multi-fault-signature and dynamic load balancing techniques are introduced to make the best use of on-board computing resources. With continued feature size scaling and the advent of innovative nano-scale devices, the reliability analysis of digital systems is becoming more important. However, the computational cost of accurately analyzing a large digital system is very high. We propose a high-performance reliability analysis tool for GPUs. To achieve high memory bandwidth on GPUs, two algorithms for simulation scheduling and memory arrangement are proposed. Experimental results demonstrate that the parallel analysis tool is efficient, reliable, and scalable. In the area of design validation, we investigate state justification. By employing swarm intelligence and the power of parallelism on GPUs, we are able to efficiently find a trace that can help reach corner cases during the validation of a digital system. In summary, the work presented in this dissertation demonstrates that several applications in the area of digital design testing and validation can be successfully re-architected to achieve maximal performance on GPUs and obtain significant speedups. The proposed algorithms based on GPU parallelism collectively aim to improve the performance of EDA tools for the computer-aided design (CAD) community on GPUs and other many-core platforms.
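The three-dimensional parallel fault simulation technique itself is not reproduced here, but the core idea it builds on, evaluating many test patterns at once with bitwise operations, can be sketched on a CPU. The tiny two-gate circuit, the net names, and the stuck-at fault list below are all hypothetical; a real GPU simulator such as FSimGP2 distributes this kind of work across thousands of CUDA threads.

```python
# A minimal CPU-side sketch of bit-parallel stuck-at fault simulation.
# 64 test patterns are packed into one Python int and evaluated with
# bitwise ops; the circuit y = (a AND b) OR c is invented for illustration.
import random

MASK = (1 << 64) - 1  # simulate 64 patterns per machine word

def simulate(a, b, c, fault=None):
    """Evaluate the circuit, optionally forcing one stuck-at fault.
    `fault` is (net_name, stuck_value); nets: 'n1' (AND output), 'y'."""
    n1 = a & b
    if fault == ('n1', 0): n1 = 0
    if fault == ('n1', 1): n1 = MASK
    y = (n1 | c) & MASK
    if fault == ('y', 0): y = 0
    if fault == ('y', 1): y = MASK
    return y

a, b, c = (random.getrandbits(64) for _ in range(3))  # 64 random patterns
good = simulate(a, b, c)
for f in [('n1', 0), ('n1', 1), ('y', 0), ('y', 1)]:
    detecting = good ^ simulate(a, b, c, fault=f)  # 1-bits mark detecting patterns
    print(f, "detected by", bin(detecting).count('1'), "of 64 patterns")
```

The XOR of the good and faulty output words marks, in one instruction, every pattern that detects the fault; GPU fault simulators extend this word-level parallelism across blocks of patterns and faults.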
- Applying the 5S Framework To Integrating Digital Libraries. Shen, Rao (Virginia Tech, 2006-04-17). We formalize the digital library (DL) integration problem and propose an overall approach based on the 5S (Streams, Structures, Spaces, Scenarios, and Societies) framework. We then apply that framework to integrate domain-specific (archaeological) DLs, illustrating our solutions for key problems in DL integration. An integrated archaeological DL, ETANA-DL, is used as a case study to justify and evaluate our DL integration approach. We develop a minimum metamodel for archaeological DLs within the 5S theory. We implement the 5SSuite toolkit to cover the process of union DL generation, including requirements gathering, conceptual modeling, rapid prototyping, and code generation. 5SSuite consists of 5SGraph, 5SGen, and SchemaMapper, which plays an important role during integration. SchemaMapper, a visual mapping tool, maps the schemas of diverse DLs into a global schema for a union DL and generates a wrapper for each individual DL. Each wrapper transforms the metadata catalog of its DL into one conforming to the global schema. The converted catalogs are stored in the union catalog, so that the union DL has a global metadata format and union catalog. We also propose a formal approach to DL exploring services for integrated DLs based on 5S, which provides a systematic and functional method to design and implement DL exploring services. Finally, we propose a DL success model to assess integrated DLs from the perspective of DL end users by integrating 5S theory with diverse research on information systems success and adoption models, and information-seeking behavior models.
- A Deep Learning Based Pipeline for Image Grading of Diabetic Retinopathy. Wang, Yu (Virginia Tech, 2018-06-21). Diabetic Retinopathy (DR) is one of the principal sources of blindness due to diabetes mellitus. It can be identified by lesions of the retina, namely microaneurysms, hemorrhages, and exudates. DR can be effectively prevented or delayed if discovered early enough and well-managed. Prior studies on diabetic retinopathy typically extract features manually, which is time-consuming and not accurate. In this research, we propose a research framework using advanced retina image processing, deep learning, and a boosting algorithm for high-performance DR grading. First, we preprocess the retina image datasets to highlight signs of DR, then apply a convolutional neural network to extract features of retina images, and finally apply a boosting tree algorithm to make a prediction based on the extracted features. Experimental results show that our pipeline has excellent performance when grading diabetic retinopathy images, as evidenced by scores on both the Kaggle dataset and the IDRiD dataset.
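As a rough illustration of the three-stage pipeline the abstract describes (preprocessing, CNN feature extraction, boosted-tree prediction), here is a minimal sketch. The `preprocess` and `cnn_features` functions, the random images, and the labels are all stand-ins invented for illustration; the thesis's actual models and datasets are not reproduced.

```python
# Illustrative three-stage pipeline: preprocess -> CNN features -> boosted trees.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def preprocess(img):
    # e.g. green-channel extraction and contrast normalization, steps that
    # commonly highlight microaneurysms and hemorrhages in fundus images
    g = img[..., 1].astype(np.float32)
    return (g - g.mean()) / (g.std() + 1e-8)

def cnn_features(img):
    # placeholder: a real pipeline would run a trained CNN and return its
    # penultimate-layer activations; crude statistics stand in here
    return np.array([img.mean(), img.std(), img.min(), img.max()])

rng = np.random.default_rng(0)
images = rng.random((100, 64, 64, 3))   # stand-in "retina images"
labels = rng.integers(0, 5, size=100)   # DR grades 0-4

X = np.stack([cnn_features(preprocess(im)) for im in images])
clf = GradientBoostingClassifier().fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```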
- Determinants and Consequences of Earnings Disclosure Readability. Meckfessel, Michele Dawn (Virginia Tech, 2012-02-03). This research examines whether changes in the regulatory environment (Plain English Guidelines, Reg. FD and SOX), management pessimism, and meeting/beating or missing analyst forecasts have had an impact on earnings disclosure readability over the 1997-2007 timeframe and whether firm managers are able to make negative firm financial information less transparent to the market by making negative earnings disclosures less readable. The idea that management may attempt to reduce the impact of bad news by making it more costly to analyze is not new. However, studying the qualitative aspects of the unaudited earnings disclosures is a unique setting and extends previous work on annual report readability. This study finds that the Plain English Guidelines, Reg. FD and SOX had differential impacts on earnings disclosure readability. Additionally, it finds that earnings disclosure readability decreases as firm earnings decrease. Moreover, this study demonstrates that institutional investors contribute to earnings disclosure readability and may serve as monitors of management in this regard. Finally, firms that beat analyst forecasts have more readable earnings disclosures. This study not only contributes to the body of academic literature, but also informs regulators regarding their ability to induce firm management to write more informative earnings disclosures.
- A Digital Library Success Model for Computer Science Student Use of a Meta-Search System. Vidya Sagar, Vikram Raj (Virginia Tech, 2006-08-04). The success of any product of Information Technology lies in its acceptance by the target audience. Several behavioral models have been formulated to analyze the factors that affect human decisions to accept new technology when some technology is already in place. These models enable us to identify areas of concern within the system and its environment and to address them. However, these models are based in industrial settings and are more suited to situations in which a person is introduced to the field of Information Technology. A separate stream of research tries to model the factors that cause an Information System, especially at the workplace, to be termed a success. No such models exist for the academic community, and the Computer Science student community in particular. In this thesis, the success of a new academic meta-search system for the Computer Science student community is measured, and the extent to which various factors affect this success is identified. For this purpose, an Information System success model is composed with the help of models for technology acceptance and Digital Library quality metrics. The resultant model is then used to formulate a survey instrument, and the results of a user study with this instrument are used to begin to validate the model.
- Effective Search in Online Knowledge Communities: A Genetic Algorithm Approach. Zhang, Xiaoyu (Virginia Tech, 2009-09-11). Online Knowledge Communities, also known as online forums, are popular web-based tools that allow members to seek and share knowledge. Documents that answer a variety of questions are generated in the process of knowledge exchange. The social network of members in an Online Knowledge Community is an important factor for improving search precision. However, prior ranking functions do not handle this kind of document using this information. In this study, we address the problem of finding authoritative documents for a user query within an Online Knowledge Community. Unlike prior ranking functions, which consider content-based, hyperlink-based, or document-structure-based features, we explored the Online Knowledge Community's social network structure and members' social interaction activities to design features that can gauge the two major factors affecting users' knowledge adoption decisions: argument quality and source credibility. We then designed a customized Genetic Algorithm to adjust the weights of the new features we proposed. We compared the performance of our ranking strategy with several baselines on real-world data from www.vbcity.com/forums/. The evaluation results demonstrated that our method could improve user search satisfaction by a noticeable margin. We concluded that our approach, based on the knowledge adoption model and a Genetic Algorithm, is a better ranking strategy for Online Knowledge Communities.
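A minimal sketch of the weight-tuning idea follows: a genetic algorithm evolving a linear feature-weight vector against a ranking fitness. The features, relevance labels, and fitness measure (precision@10) are synthetic stand-ins; the dissertation's actual feature set and GA operators are not reproduced.

```python
# Genetic algorithm evolving weights that combine per-document ranking
# features (e.g. argument-quality and source-credibility scores).
import numpy as np
rng = np.random.default_rng(1)

N_FEATURES, POP, GENS, K = 5, 30, 40, 10
docs = rng.random((200, N_FEATURES))   # per-document feature scores
relevant = rng.random(200) < 0.2       # synthetic relevance labels

def fitness(w):
    order = np.argsort(docs @ w)[::-1] # rank documents by weighted score
    return relevant[order[:K]].mean()  # precision at K

pop = rng.random((POP, N_FEATURES))
for _ in range(GENS):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-POP // 2:]]     # keep the fitter half
    cut = rng.integers(1, N_FEATURES, size=POP // 2)  # one-point crossover
    kids = np.where(np.arange(N_FEATURES) < cut[:, None],
                    parents, parents[::-1])
    kids += rng.normal(0, 0.05, kids.shape)           # Gaussian mutation
    pop = np.vstack([parents, kids])

best = max(pop, key=fitness)
print("best precision@10:", fitness(best))
```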
- Efficient Algorithms for Mining Data Streams. Boedihardjo, Arnold Priguna (Virginia Tech, 2010-08-10). Data streams are ordered sets of values that are fast, continuous, mutable, and potentially unbounded. Examples of data streams include the pervasive time series which span domains such as finance, medicine, and transportation. Mining data streams requires approaches that are efficient, adaptive, and scalable. For several stream mining tasks, knowledge of the data's probability density function (PDF) is essential to deriving usable results. Providing an accurate model for the PDF benefits a variety of stream mining applications, and its successful development can have far-reaching impact on the general discipline of stream analysis. Therefore, this research focuses on the construction of efficient and effective approaches for estimating the PDF of data streams. In this work, kernel density estimators (KDEs) are developed that satisfy the stringent computational stipulations of data streams, model unknown and dynamic distributions, and enhance the estimation quality of complex structures. Contributions of this work include: (1) theoretical development of the local region based KDE; (2) construction of a local region based estimation algorithm; (3) design of a generalized local region approach that can be applied to any global bandwidth KDE to enhance estimation accuracy; and (4) application extension of the local region based KDE to multi-scale outlier detection. Theoretical development includes the formulation of the local region concept to effectively approximate the computationally intensive adaptive KDE. This work also analyzes key theoretical properties of the local region based approach, which include (amongst others) its expected performance, an alternative local region construction criterion, and its robustness under evolving distributions. Algorithmic design includes the development of a specific estimation technique that reduces the time/space complexities of the adaptive KDE. In order to accelerate mining tasks such as outlier detection, an integrated set of optimizations is proposed for estimating multiple density queries. Additionally, the local region concept is extended to an efficient algorithmic framework which can be applied to any global bandwidth KDE. The combined solution can significantly improve estimation accuracy while retaining overall linear time/space costs. As an application extension, an outlier detection framework is designed which can effectively detect outliers within multiple data scale representations.
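The local-region concept can be sketched compactly: rather than one global kernel bandwidth, the sample is partitioned into regions that each receive their own bandwidth, approximating a fully adaptive KDE at much lower cost. The sketch below uses equal-count regions and Silverman's rule per region, both simplifying assumptions of mine; the dissertation's region construction and streaming maintenance are more involved.

```python
# Local-region Gaussian KDE: one bandwidth per region instead of per point.
import numpy as np

def local_region_kde(samples, queries, n_regions=4):
    samples = np.sort(samples)
    regions = np.array_split(samples, n_regions)    # equal-count regions
    density = np.zeros_like(queries, dtype=float)
    for r in regions:
        h = 1.06 * r.std() * len(r) ** -0.2 + 1e-9  # Silverman's rule per region
        u = (queries[:, None] - r[None, :]) / h
        density += np.exp(-0.5 * u**2).sum(axis=1) / (h * np.sqrt(2 * np.pi))
    return density / len(samples)

rng = np.random.default_rng(2)
# bimodal sample: the two modes have very different spreads, which is
# exactly where a single global bandwidth performs poorly
data = np.concatenate([rng.normal(0, 0.3, 500), rng.normal(4, 1.5, 500)])
q = np.linspace(-2, 9, 5)
print(local_region_kde(data, q))
```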
- Efficient Concurrent Operations in Spatial Databases. Dai, Jing (Virginia Tech, 2009-09-04). As demanded by applications such as GIS, CAD, ecology analysis, and space research, efficient spatial data access methods have attracted much research. In particular, moving object management and continuous spatial queries are becoming highlighted in the spatial database area. However, most of the existing spatial query processing approaches were designed for single-user environments, which may not ensure correctness and data consistency in multiple-user environments. This research focuses on designing efficient concurrent operations on spatial datasets. Current multidimensional data access methods can be categorized into two types: 1) pure multidimensional indexing structures such as the R-tree family and the grid file; 2) linear spatial access methods, represented by the Space-Filling Curve (SFC) combined with B-trees. Concurrency control protocols have been designed for some pure multidimensional indexing structures, but none of them is suitable for variants of R-trees with object clipping, which are efficient in searching. On the other hand, there is no concurrency control protocol designed for linear spatial indexing structures, where the one-dimensional concurrency control protocols cannot be directly applied. Furthermore, the recently designed query processing approaches for moving objects have not been protected by any efficient concurrency control protocols. In this research, solutions for efficient concurrent access frameworks on both types of spatial indexing structures are provided, as well as for continuous query processing on moving objects, in multiple-user environments. These concurrent access frameworks can satisfy the concurrency control requirements while providing outstanding performance for concurrent queries. Major contributions of this research include: (1) a new efficient spatial indexing approach with an object clipping technique, the ZR+-tree, that outperforms the R-tree and R+-tree in searching; (2) a concurrency control protocol, GLIP, to provide high throughput and phantom update protection on spatial indexing with object clipping; (3) efficient concurrent operations for indices based on linear spatial access methods, which form the CLAM protocol; (4) efficient concurrent continuous query processing on moving objects for both R-tree-based and linear spatial indexing frameworks; (5) a generic access framework, Disposable Index, for optimal location update and parallel search.
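For readers unfamiliar with linear spatial access methods, the sketch below shows the standard Z-order (Morton) key: interleaving the bits of x and y yields a one-dimensional key that preserves spatial locality well enough for an ordinary B-tree to store and range-scan 2-D points. This illustrates the general SFC-plus-B-tree idea the abstract mentions, not the CLAM protocol itself.

```python
# Z-order (Morton) encoding: map a 2-D point to a 1-D sortable key.
def morton_key(x, y, bits=16):
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # even bit positions carry x
        key |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions carry y
    return key

# nearby points end up with nearby keys, so a B-tree range scan over the
# key space visits spatially clustered points together
points = [(3, 5), (3, 6), (10, 2), (11, 2)]
for p in sorted(points, key=lambda p: morton_key(*p)):
    print(p, format(morton_key(*p), 'b'))
```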
- ETANA-CMV: A coordinated multiple view visual browsing interface for ETANA-DL. Sam Rajkumar, Johnny L. (Virginia Tech, 2006-12-13). Archeological research embracing complex Information Technology techniques can result in vast quantities of heterogeneous information from different sites in different formats. ETANA-DL is an archeological Digital Library (DL) providing services suited to the archeological domain. With a growing collection of records in the DL, it is a challenge to present them in an organized and meaningful way. We have designed a new visual browsing interface called ETANA-CMV that aims to provide users with a richer and more insightful browsing experience. ETANA-CMV allows users to navigate through the records in ETANA-DL, which are multidimensional, hierarchical, and categorical in nature. ETANA-CMV was designed to be scalable, flexible, and easy to learn. The interface employs a data-cube-based browsing index to counter the performance issues that usually limit the interactivity of visual browsing interfaces to DLs. The interface has been integrated with the existing browse interface and the search service in ETANA-DL. Formative evaluation of the new visual interface led to several improvements. It appears that users were able to detect trends in the DL collections more accurately using visualization-based strategies than with the existing textual browse interface.
- Examining the Continued Usage of Electronic Knowledge Repositories: An Integrated Model. Lin, Hui (Virginia Tech, 2008-03-19). Knowledge has long been recognized as one of the most valuable assets in an organization. Managing and organizing knowledge has become an important corporate strategy for organizations to gain and maintain competitive advantages in the information age. Electronic knowledge repositories (EKRs) have become increasingly popular knowledge sharing tools implemented by organizations to promote knowledge reuse. The goal of this study is to develop and test a research model that explains users' continued usage behavior of EKRs in public accounting firms. Theoretically grounded in the expectation-confirmation model (ECM) and the commitment-based model, the research model presented in this study integrates both theoretical perspectives to study users' EKR continuance intentions. This study surveyed 230 EKR users from four large public accounting firms. Partial least squares regression was used to test the hypotheses and the explanatory power of the model. Results indicate that perceived usefulness and commitment exhibit a sustained positive influence on continuance intention. Additionally, subjective norms are positively related to calculative commitment and moral commitment. Organizational identification is positively related to affective commitment and moral commitment. Perceived usefulness is positively related to affective commitment and calculative commitment. Model comparisons with the technology acceptance model (TAM) and ECM demonstrated that the integrated model presented in this research explained 1.6% and 0.8% more variance in continuance intention than ECM and TAM, respectively. Additional multi-group analyses were also conducted to examine the differences between knowledge seekers and contributors and between knowledge novices and experts. This study raises theoretical implications in the area of knowledge management in general and EKRs in particular. It represents one of the first attempts to empirically examine users' continuance intention toward knowledge management applications. This study presents a different perspective on technology acceptance/continued usage by introducing commitment to explain continued IS usage. By integrating commitment and ECM, this study offers a useful framework for future studies on technology use. It demonstrates that both user commitment and perceived usefulness are strong predictors of EKR continuance intention. The results also raise interesting implications for practitioners interested in knowledge management, and particularly for public accounting firms seeking to leverage EKRs to gain a competitive advantage.
- Exploring Hybrid Dynamic and Static Techniques for Software Verification. Cheng, Xueqi (Virginia Tech, 2010-02-04). With the growing importance of software on which human lives increasingly depend, the correctness of the underlying software becomes especially critical. However, the increasing complexity and size of modern software systems pose special challenges for the effectiveness as well as the efficiency of software verification. Two major obstacles include the quality of test generation in terms of error detection in software testing and the state space explosion problem in software formal verification (model checking). In this dissertation, we investigate several hybrid techniques that explore dynamic (with program execution) and static (without program execution) approaches, as well as the synergies of multiple approaches, in software verification from the perspectives of testing and model checking. For software testing, a new simulation-based internal variable range coverage metric is proposed with the goal of enhancing the error detection capability of the generated test data when applied as the target metric. For software model checking, we utilize various dynamic analysis methods, such as data mining and swarm intelligence (ant colony optimization), to extract useful high-level information from program execution data. Despite being incomplete, dynamic program execution can still help to uncover important program structure features and variable correlations. The extracted knowledge, such as invariants in different forms, promising control flows, etc., is then used to facilitate code-level program abstraction (under-approximation/over-approximation) and/or state space partition, which in turn improves the performance of property verification. In order to validate the effectiveness of the proposed hybrid approaches, a wide range of experiments on academic and real-world programs were designed and conducted, with results compared against the original as well as the relevant verification methods. Experimental results demonstrated the effectiveness of our methods in improving the quality as well as the performance of software verification. For software testing, the newly proposed coverage metric, constructed from dynamic program execution data, is able to improve the quality of the generated test cases in terms of mutation killing, a widely applied measurement for error detection. For software model checking, the proposed hybrid techniques take full advantage of the complementary benefits of dynamic and static approaches: the lightweight dynamic techniques provide flexibility in extracting valuable high-level information that can be used to guide the scope and direction of the static reasoning process, resulting in significant performance improvement in software model checking. On the other hand, the static techniques guarantee the completeness of the verification results, compensating for the weakness of dynamic methods.
- A framework for finding and summarizing product defects, and ranking helpful threads from online customer forums through machine learning. Jiao, Jian (Virginia Tech, 2013-06-05). The Internet has revolutionized the way users share and acquire knowledge. As important and popular Web-based applications, online discussion forums provide interactive platforms for users to exchange information and report problems. With the rapid growth of social networks and an ever-increasing number of Internet users, online forums have accumulated a huge amount of valuable user-generated data and have accordingly become a major information source for business intelligence. This study focuses specifically on product defects, which are one of the central concerns of manufacturing companies and service providers, and proposes a machine learning method to automatically detect product defects in the context of online forums. To complement the detection of product defects, we also present a product feature extraction method to summarize defect threads and a thread ranking method to search for troubleshooting solutions. To this end, we collected different data sets to test these methods experimentally, and the results show that our methods are very promising: in most cases, they outperformed the current state-of-the-art methods.
- Identifying Product Defects from User Complaints: A Probabilistic Defect Model. Zhang, Xuan; Qiao, Zhilei; Tang, Lijie; Fan, Weiguo Patrick; Fox, Edward A.; Wang, Gang Alan (Department of Computer Science, Virginia Polytechnic Institute & State University, 2016-03-02). The recent surge in using social media has created a massive amount of unstructured textual complaints about products and services. However, discovering and quantifying potential product defects from large amounts of unstructured text is a nontrivial task. In this paper, we develop a probabilistic defect model (PDM) that simultaneously identifies the most critical product issues and the corresponding product attributes. We leverage domain-oriented key attributes of a product (e.g., product model, year of production, defective components, symptoms) to identify and acquire integral information about defects. We conduct comprehensive quantitative and qualitative evaluations to ensure the quality of the discovered information. Experimental results demonstrate that our proposed model outperforms an existing unsupervised method (K-Means clustering) and can find more valuable information. Our research has significant managerial implications for managers, manufacturers, and policy makers.
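The PDM itself is not reproduced here; as a loose stand-in for probabilistic defect discovery, the sketch below applies scikit-learn's LDA topic model to a handful of invented complaint snippets and reports the top terms of each candidate "defect" topic. The complaint texts and parameter choices are illustrative only, and LDA is a generic substitute, not the paper's model.

```python
# Group complaint texts into candidate defect topics with a topic model.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

complaints = ["engine stalls at low speed", "engine stalls when idling",
              "brake pedal feels soft", "brakes squeal and pedal soft",
              "paint peels on hood", "hood paint fading and peeling"]

vec = CountVectorizer(stop_words="english").fit(complaints)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
doc_topic = lda.fit_transform(vec.transform(complaints))

terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-3:][::-1]]  # 3 strongest terms
    print(f"defect topic {k}: {top}")
print("complaint -> topic:", doc_topic.argmax(axis=1))
```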
- Intelligent Fusion of Evidence from Multiple Sources for Text Classification. Zhang, Baoping (Virginia Tech, 2006-06-20). Automatic text classification using current approaches is known to perform poorly when documents are noisy or when limited amounts of textual content are available. Yet, many users need access to such documents, which are found in large numbers in digital libraries and on the WWW. If documents are not classified, they are difficult to find when browsing. Further, search precision suffers when categories cannot be checked, since many documents may be retrieved that fail to meet category constraints. In this work, we study how different types of evidence from multiple sources can be intelligently fused to improve the classification of text documents into predefined categories. We present a classification framework based on an inductive learning method, Genetic Programming (GP), to fuse evidence from multiple sources. We show that good classification is possible with documents which are noisy or which have small amounts of text (e.g., short metadata records), if multiple sources of evidence are fused in an intelligent way. The framework is validated through experiments performed on documents in two testbeds. One is the ACM Digital Library (using a subset available in connection with CITIDEL, part of NSF's National Science Digital Library). The other is Web data, in particular the portion associated with the Cadê Web directory. Our studies have shown that improvement can be achieved relative to other machine learning approaches if genetic programming methods are combined with classifiers such as kNN. Extensive analysis was performed to study the results generated through the GP-based fusion approach and to understand the key factors that promote good classification.
- Iterative Computing over a Unified Relationship Matrix for Information Integration. Xi, Wensi (Virginia Tech, 2006-06-20). In this dissertation I use a Unified Relationship Matrix (URM) to represent a set of heterogeneous data objects and their inter-relationships. I argue that integrated and iterative computations over the Unified Relationship Matrix can help overcome the data sparseness problem (a common situation in various information application scenarios) and detect latent relationships (such as the latent term associations discovered by LSI) among heterogeneous data objects. Thus, this kind of computation can be used to improve the quality of various information applications that require combining information from heterogeneous data sources. To support the argument, I further develop a unified link analysis algorithm, the Link Fusion algorithm, and a unified similarity-calculating algorithm, the SimFusion algorithm. Both algorithms attempt to better integrate information from heterogeneous sources by iteratively computing over the Unified Relationship Matrix in order to calculate some specific property of data objects, such as the importance of a data object (as in the Link Fusion algorithm) or the similarity between a pair of data objects (as in the SimFusion algorithm). I then develop two sets of experiments on real-world datasets to investigate whether the algorithms proposed in this dissertation can better integrate information from multiple sources. The performance of the algorithms is compared to that of traditional link analysis and similarity-calculating algorithms. Experimental results show that the proposed algorithms can significantly outperform the traditional link analysis and similarity-calculating algorithms. I further investigate various pruning techniques aimed at improving efficiency and study the scalability of the algorithms. Experimental results show that pruning can effectively improve the efficiency of the algorithms.
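The iterative similarity computation can be illustrated compactly, in the spirit of SimFusion: similarities are repeatedly propagated through a row-normalized relationship matrix with the diagonal pinned at 1. The 5-object adjacency matrix below is invented for illustration; a real URM would combine intra- and inter-type relationships among heterogeneous objects.

```python
# Iterative similarity reinforcement over a (row-normalized) relationship
# matrix: S <- L S L^T, with self-similarity fixed at 1.
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)
L = A / A.sum(axis=1, keepdims=True)  # row-normalized relationship matrix

S = np.eye(5)                         # initially objects resemble only themselves
for _ in range(20):
    S = L @ S @ L.T                   # propagate similarity through relationships
    np.fill_diagonal(S, 1.0)          # pin self-similarity to 1
print(S.round(3))
```

Even objects with no direct link (such as objects 0 and 4 here) acquire nonzero similarity through shared neighbors, which is the latent-relationship effect the abstract describes.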
- A Machine Learning Approach for Data Unification and Its Application in Asset Performance Management. He, Bin (Virginia Tech, 2016-03-28). The amount of data is growing fast with the advance of data capturing and management technologies. However, data from different sources are often isolated and not ready to be analyzed together as one data set. The effort of connecting pieces of isolated data into a unified data set is time consuming and often costly in terms of cognitive load and programming time. To address this problem, we propose an approach that uses machine learning to augment human intelligence in the data unification process, especially the unification of complex categorical data values. Many aspects of useful information are extracted from supervised machine learning models and then used to amplify the intelligence of human experts in various aspects of the data unification process. An empirical study is performed applying the proposed methodology to the field of Asset Performance Management, focusing specifically on the performance of equipment assets. The experiments show that machine learning helps experts with unification standard generation, unified value suggestion, and batch data unification. We conclude that machine learning models contain valuable information that can facilitate the data unification process.
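As a hedged illustration of the "unified value suggestion" step, the sketch below clusters near-duplicate categorical values by string similarity and proposes a canonical form for each group. The equipment names and the similarity cutoff are invented, and simple string matching stands in for the dissertation's supervised models, which are considerably richer.

```python
# Suggest unified categorical values by grouping near-duplicate strings.
import difflib

values = ["centrifugal pump", "Centrifugal Pump ", "centrifugal pmp",
          "gas turbine", "Gas Turbine"]

def suggest_unified(values, cutoff=0.8):
    canonical = {}  # canonical form -> list of raw variants
    for v in values:
        key = v.strip().lower()
        # reuse an existing canonical form if one is similar enough
        match = difflib.get_close_matches(key, canonical, n=1, cutoff=cutoff)
        canonical.setdefault(match[0] if match else key, []).append(v)
    return canonical

for unified, variants in suggest_unified(values).items():
    print(unified, "<-", variants)
```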
- Mitigating the Effects of Interruption on Audit Efficiency and Effectiveness. Long, James Harvey (Virginia Tech, 2009-03-27). This dissertation examined the effects of interruption on auditor efficiency and effectiveness for one simple and two complex tasks within the audit domain. I evaluated these effects for novice and experienced auditors. In addition, I considered two ways in which the negative effects of interruption might be mitigated: varying an individual's interruption response strategy (immediate vs. negotiated) and the presence or absence of a memory aid (notes). I investigated these phenomena using an internet-based experimental instrument. Subjects included both students and practicing auditors. My findings indicate that interruption hindered performance on certain complex audit tasks, and that it differentially affected auditor performance at two levels of experience. When interrupted, inexperienced auditors completed complex audit tasks less efficiently; experienced auditors completed them less effectively. In addition, experienced auditors who negotiated interruption completed a complex audit task more efficiently and effectively than those who responded to the interruption immediately. Furthermore, note-taking increased experienced auditors' task efficiency on a complex audit task requiring judgment. These results suggest that auditors should limit task interruption when they are engaged in complex audit tasks. When task interruption cannot be avoided, auditors should consider negotiating a delay in the onset of an interruption. Finally, auditors who are interrupted while completing a complex task requiring judgment should consider using notes to mitigate the deleterious effect of interruption on task efficiency. Participants also completed a post-experimental questionnaire which provided evidence about interruptions in the audit environment. The responses confirmed that auditors are frequently interrupted in practice. In addition, auditors preferred differing interruption response strategies depending on both the level of primary task complexity (easy vs. difficult) and the medium through which the interruption occurred (electronic vs. interpersonal). They chose interruption response strategies according to their place in the social hierarchy relative to the interrupter (client/boss vs. subordinate/friends/family). Finally, I found that interruption influences affect. Auditors reported significantly more positive affective reactions to interruption on easy tasks (e.g., alert, cheerful, friendly, happy, and relaxed) and substantially negative affective reactions to interruption on difficult tasks (e.g., angry, hostile, irritated, nervous, and tense).
- A Novel Hybrid Focused Crawling Algorithm to Build Domain-Specific Collections. Chen, Yuxin (Virginia Tech, 2007-02-05). The Web, containing a large amount of useful information and resources, is expanding rapidly. Collecting domain-specific documents/information from the Web is one of the most important methods of building digital libraries for the scientific community. Focused crawlers can selectively retrieve Web documents relevant to a specific domain to build collections for domain-specific search engines or digital libraries. Traditional focused crawlers, which normally adopt the simple Vector Space Model and local Web search algorithms, typically find relevant Web pages with only low precision. Recall is also often low, since they explore a limited sub-graph of the Web that surrounds the starting URL set and ignore relevant pages outside this sub-graph. In this work, we investigated how to apply an inductive machine learning algorithm and a meta-search technique to the traditional focused crawling process in order to overcome the above-mentioned problems and improve performance. We propose a novel hybrid focused crawling framework based on Genetic Programming (GP) and meta-search. We show that our novel hybrid framework can be applied to traditional focused crawlers to accurately find more relevant Web documents for use in digital libraries and domain-specific search engines. The framework is validated through experiments performed on test documents from the Open Directory Project. Our studies have shown that improvement can be achieved relative to the traditional focused crawler if genetic programming and meta-search methods are introduced into the focused crawling process.
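The crawling loop itself can be sketched independently of the learned scorer: a best-first frontier ordered by a pluggable relevance function. The canned link graph, page texts, and keyword-overlap scorer below are stand-ins invented to keep the sketch offline; in the dissertation the relevance function is evolved by GP and the frontier is enriched via meta-search.

```python
# Skeleton of a best-first focused crawler over a canned link graph.
import heapq

LINKS = {"seed": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
TEXT = {"seed": "digital library search", "a": "digital library metadata",
        "b": "cooking recipes", "c": "library indexing", "d": "sports news"}
TOPIC = {"digital", "library", "indexing"}

def score(page):
    # stand-in for a learned (e.g. GP-evolved) relevance scorer
    words = set(TEXT[page].split())
    return len(words & TOPIC) / len(TOPIC)

frontier, seen = [(-score("seed"), "seed")], set()
while frontier:
    neg, page = heapq.heappop(frontier)   # most relevant page first
    if page in seen:
        continue
    seen.add(page)
    print(f"crawl {page!r}  relevance={-neg:.2f}")
    for nxt in LINKS[page]:
        heapq.heappush(frontier, (-score(nxt), nxt))
```

Because the heap always pops the highest-scoring page, off-topic branches (here, "cooking recipes" and "sports news") are deferred, which is the basic precision mechanism of focused crawling.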
- Online Knowledge Community Mining and Modeling for Effective Knowledge Management. Liu, Xiaomo (Virginia Tech, 2013-05-08). More and more in recent years, activities that people once did in the real world they now do in virtual space. In particular, online communities have become popular and efficient media for people all over the world to seek and share knowledge in domains that interest them. Such communities are called online knowledge communities (OKCs). Large-scale OKCs may comprise thousands of community members and archive many more online messages. As a result, problems such as how to identify and manage the knowledge collected and how to understand people's knowledge-sharing behaviors have become major challenges for leveraging online knowledge to sustain community growth. In this dissertation I examine three important factors in managing knowledge in OKCs. First, I focus on how to build useful profiles for community members that describe their domain expertise. These expertise profiles are potentially important for directing questions to the right people and, thus, can improve the community's overall efficiency and efficacy. To address this issue, I present a comparative study of models of expertise profiling in online communities and identify the model combination that delivers the best results. Next, I investigate how to automatically assess the helpfulness of user postings. Due to the voluntary nature of online participation, there is no guarantee that all user-generated content (UGC) will be helpful. It is also difficult, given the sheer amount of online postings, for knowledge seekers to quickly find information that satisfies their informational needs. Therefore, I propose a theory-driven text classification framework based on the knowledge adoption model (KAM) for predicting the helpfulness of UGC in OKCs. I test the effectiveness of this framework at both the thread level and the post level of online messages. Any given OKC generally has a huge number of individuals participating in online discussions, but exactly what, where, when, and how they seek and share knowledge is still not fully understood or documented. In the last part of the dissertation, I describe a multi-level study of the knowledge-sharing behaviors of users in OKCs. Both exploratory data analysis and network analysis are applied to the thread, forum, and community levels of online data. I present a number of interesting findings on the social dynamics of knowledge sharing and diffusion. These findings potentially have important implications for both the theory and practice of online community knowledge management.
- Political Participation in a Digital Age: An Integrated Perspective on the Impacts of the Internet on Voter Turnout. Carter, Lemuria D. (Virginia Tech, 2006-04-12). E-government is the use of information technology, especially telecommunications, to enable and improve the efficiency with which government services and information are provided to constituents. Internet voting is an emerging e-government initiative. It refers to the submission of votes securely and secretly over the Internet. In the United States, some areas have already used Internet voting systems for local and state elections. Many researchers argue that one of the most important social impacts of Internet voting is the effect it could have on voter participation. Numerous studies have called for research on the impact of technology on voter turnout; however, the existing literature has yet to develop a comprehensive model of the key factors that influence Internet voting adoption. In light of the gradual implementation of I-voting systems and the need for research on I-voting implications, this study combines political science and information systems constructs to present an integrated model of Internet voter participation. The proposed model of Internet voting adoption posits that a combination of technical, political, and demographic factors influences the adoption of I-voting services. The study was conducted by surveying 372 citizens ranging in age from 18 to 75. The findings indicate that an integrated model of I-voting adoption is superior to existing models that explore political science or technology adoption constructs in isolation. Implications of this study for research and practice are presented.