Active Learning for Microarray-Based Leukemia Classification
In machine learning, data labeling is often assumed to be easy and cheap. However, in real-world cases, especially in the clinical field, labeled data sets are scarce and expensive to obtain. Active learning is an approach that queries the most informative samples for training, which offers a way to address this concern. The sampling method is one of the key parts of active learning because it is what minimizes the training cost of the classifier. Different query methods can produce considerably different models, and these differences can lead to significant differences in training cost and final accuracy. The approaches used in this experiment are uncertainty sampling, diversity sampling, and query by committee. In the experiment, active learning is applied to microarray data with improved results. The classification of two types of leukemia (acute myeloid leukemia and acute lymphoblastic leukemia) shows a boost in accuracy over passive machine learning with the same number of samples. The experiments lead to the conclusion that, in leukemia classification with a small number of samples subject to randomness, active learning produces a more accurate model. Additionally, active learning with query by committee finds the most informative samples in the fewest trials.
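To make two of the query strategies named above concrete, the following is a minimal sketch rather than the implementation used in the experiments. The abstract does not state which uncertainty measure or committee disagreement measure was used, so least confidence and vote entropy are assumptions here, and the function names and toy values are hypothetical. Diversity sampling is left out of the sketch.

```python
import numpy as np


def least_confidence(probs):
    """Uncertainty sampling score (assumed measure): 1 - max class probability.

    probs has shape (n_samples, n_classes); higher score = more uncertain.
    """
    probs = np.asarray(probs, dtype=float)
    return 1.0 - probs.max(axis=1)


def vote_entropy(votes, n_classes):
    """Query-by-committee score (assumed measure): entropy of hard votes.

    votes has shape (n_members, n_samples) and holds integer class labels;
    higher score = more disagreement among committee members.
    """
    votes = np.asarray(votes)
    n_members = votes.shape[0]
    # counts[c, i] = number of committee members voting class c for sample i
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
    frac = counts / n_members
    # Treat 0 * log(0) as 0 by only taking the log where frac > 0
    logs = np.log(frac, where=frac > 0, out=np.zeros_like(frac))
    return -(frac * logs).sum(axis=0)


def pick_query(scores):
    """Return the index of the unlabeled sample with the highest score."""
    return int(np.argmax(scores))


# Hypothetical toy values: 3 unlabeled samples, 2 classes (e.g. AML vs. ALL)
probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.20, 0.80]])
# 4 committee members (rows) voting on the same 3 samples (columns)
votes = np.array([[0, 0, 1], [0, 1, 1], [0, 1, 1], [0, 0, 1]])

uncertainty_choice = pick_query(least_confidence(probs))
committee_choice = pick_query(vote_entropy(votes, n_classes=2))
```

In this toy case both strategies select the second sample: its predicted probabilities are closest to even, and the committee splits two against two on it. In an active learning loop the selected sample would be labeled, added to the training set, and the classifier retrained before the next query.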