Virginia Tech

    Active Learning with Combinatorial Coverage

    Katragadda_S_T_2022.pdf (7.040Mb)
    Downloads: 33
    Date
    2022-08-04
    Author
    Katragadda, Sai Prathyush
    Abstract
    Active learning is a practical field of machine learning because labeling data, or determining which data to label, can be time-consuming and inefficient. Active learning automates the selection of data to label, but current methods rely heavily on the model. As a result, sampled data often fails to transfer to new models, and sampling bias arises. Both issues are of crucial concern in machine learning deployment. We propose active learning methods that use Combinatorial Coverage to overcome these issues. The proposed methods are data-centric, and our experiments show that including coverage in active learning yields sampled data that tends to transfer best to different models and has competitive sampling bias compared to benchmark methods.
    General Audience Abstract
    Machine learning (ML) models are used frequently in a variety of applications. For a model to learn, data is required. Processing this data is often one of the most time-consuming aspects of using ML, if not the most. One especially burdensome aspect of data processing is data labeling: determining which real-world class each data point corresponds to, for example whether an image contains a plane or a bird. The ML model can then learn from the labeled data. Active learning is a sub-field of machine learning that aims to ease this burden by allowing the model to select which data would be most beneficial to label, so that the entire dataset does not need to be labeled. The issue with current active learning methods is that they are highly model-dependent. In machine learning deployment the model may change while the data stays the same, so data points labeled with respect to one model may not be ideal for another model. This model dependency has led to sampling bias issues as well: the points chosen for labeling may all be similar or not representative of all the data, leaving the ML model less knowledgeable than it could be. Relevant work has focused on the sampling bias issue, and several methods have been proposed to combat it. Few of these methods, however, are applicable to any type of ML model. The issue of sampled points not generalizing to different models has been studied, but no solutions have been proposed. In this work we present active learning methods using Combinatorial Coverage. Combinatorial Coverage is a statistical technique from the field of Design of Experiments and has commonly been used to design test sets. The extension of Combinatorial Coverage to ML is newer and provides a way to focus on the data. We show that this data-focused approach to active learning achieves better performance when the sampled data is used for a different model, and that it achieves a competitive sampling bias compared to benchmark methods.
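To make the idea of coverage-driven, model-free sampling concrete, the following is a minimal sketch and not the thesis's actual method. It assumes every feature is discrete (categorical or already binned), defines coverage as the fraction of the pool's t-way feature-value combinations that also appear in the labeled set, and selects points greedily by how many unseen combinations they add. The names `interactions`, `coverage`, and `greedy_select` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def interactions(rows, t=2):
    """Return the set of t-way interactions (feature indices, values) seen in rows."""
    seen = set()
    for row in rows:
        for idx in combinations(range(len(row)), t):
            seen.add((idx, tuple(row[i] for i in idx)))
    return seen


def coverage(labeled, pool, t=2):
    """Fraction of the pool's t-way interactions that also appear in the labeled set."""
    universe = interactions(pool, t)
    if not universe:
        return 1.0
    return len(interactions(labeled, t) & universe) / len(universe)


def greedy_select(pool, budget, t=2):
    """Pick up to `budget` pool indices; each step takes the row that adds
    the most interactions not yet covered. No model is consulted (assumption:
    this greedy rule is an illustration, not the thesis's selection rule)."""
    covered = set()
    chosen = []
    remaining = list(range(len(pool)))
    for _ in range(min(budget, len(pool))):
        best = max(remaining, key=lambda i: len(interactions([pool[i]], t) - covered))
        chosen.append(best)
        remaining.remove(best)
        covered |= interactions([pool[best]], t)
    return chosen


if __name__ == "__main__":
    # Toy pool: four points with three binary features each.
    pool = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    picked = greedy_select(pool, budget=2)
    print(picked, coverage([pool[i] for i in picked], pool))
```

Because the selection depends only on feature values, the same labeled subset can be handed to any downstream model, which is the property the abstract highlights. Continuous features would need to be binned first, and the choice of bins is an additional assumption.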
    URI
    http://hdl.handle.net/10919/111467
    Collections
    • Masters Theses [21205]

    If you believe that any material in VTechWorks should be removed, please see our policy and procedure for Requesting that Material be Amended or Removed. All takedown requests will be promptly acknowledged and investigated.

    Virginia Tech | University Libraries | Contact Us
     

     
