Browsing by Author "Wang, Xuan"
Now showing 1 - 7 of 7
- Accepted Tutorials at The Web Conference 2022
  Tommasini, Riccardo; Basu Roy, Senjuti; Wang, Xuan; Wang, Hongwei; Ji, Heng; Han, Jiawei; Nakov, Preslav; Da San Martino, Giovanni; Alam, Firoj; Schedl, Markus; Lex, Elisabeth; Bharadwaj, Akash; Cormode, Graham; Dojchinovski, Milan; Forberg, Jan; Frey, Johannes; Bonte, Pieter; Balduini, Marco; Belcao, Matteo; Della Valle, Emanuele; Yu, Junliang; Yin, Hongzhi; Chen, Tong; Liu, Haochen; Wang, Yiqi; Fan, Wenqi; Liu, Xiaorui; Dacon, Jamell; Lye, Lingjuan; Tang, Jiliang; Gionis, Aristides; Neumann, Stefan; Ordozgoiti, Bruno; Razniewski, Simon; Arnaout, Hiba; Ghosh, Shrestha; Suchanek, Fabian; Wu, Lingfei; Chen, Yu; Li, Yunyao; Liu, Bang; Ilievski, Filip; Garijo, Daniel; Chalupsky, Hans; Szekely, Pedro; Kanellos, Ilias; Sacharidis, Dimitris; Vergoulis, Thanasis; Choudhary, Nurendra; Rao, Nikhil; Subbian, Karthik; Sengamedu, Srinivasan; Reddy, Chandan; Victor, Friedhelm; Haslhofer, Bernhard; Katsogiannis-Meimarakis, George; Koutrika, Georgia; Jin, Shengmin; Koutra, Danai; Zafarani, Reza; Tsvetkov, Yulia; Balachandran, Vidhisha; Kumar, Sachin; Zhao, Xiangyu; Chen, Bo; Guo, Huifeng; Wang, Yejing; Tang, Ruiming; Zhang, Yang; Wang, Wenjie; Wu, Peng; Feng, Fuli; He, Xiangnan (ACM, 2022-04-25)
  This paper summarizes the content of the 20 tutorials given at The Web Conference 2022: 85% of these tutorials are lecture-style, and 15% are hands-on.
- Applications of Machine Learning in Source Attribution and Gene Function Prediction
  Chinnareddy, Sandeep (Virginia Tech, 2024-06-07)
  This research investigates the application of machine learning techniques in computational genomics across two distinct domains: (1) predicting the source of bacterial pathogens using whole genome sequencing data, and (2) the functional annotation of genes using single-cell RNA sequencing data. This work proposes the development of a bioinformatics pipeline tailored for identifying genomic variants, including gene presence/absence and single nucleotide polymorphisms. This methodology is applied to specific strains such as Salmonella enterica serovar Typhimurium and the Ralstonia solanacearum species complex. Phylogenetic analyses, along with pan-genome and positive selection studies, show that genomic variants and evolutionary patterns of S. Typhimurium vary across sources, which suggests that sources can be accurately attributed based on genomic variants with the help of machine learning. We benchmarked seven traditional machine learning algorithms, achieving a notable accuracy of 94.6% in host prediction for S. Typhimurium using the Random Forest model, supported by SHAP value analyses that identified key predictive features. Next, the focus shifts to the prediction of Gene Ontology terms for Arabidopsis genes using single-cell RNA-seq data. This analysis offers a detailed comparison of gene expression in root versus shoot tissues, juxtaposed with insights from bulk RNA-seq data. The integration of regulatory network data from DAP-seq significantly enhances the prediction accuracy of gene functions.
- Identifying sensors-based parameters associated with fall risk in community-dwelling older adults: an investigation and interpretation of discriminatory parameters
  Wang, Xuan; Cao, Junjie; Zhao, Qizheng; Chen, Manting; Luo, Jiajia; Wang, Hailiang; Yu, Lisha; Tsui, Kwok-Leung; Zhao, Yang (2024-02-01)
  Background: Falls pose a severe threat to the health of older adults worldwide. Determining gait and kinematic parameters that are related to an increased risk of falls is essential for developing effective intervention and fall prevention strategies. This study aimed to investigate the discriminatory parameters, which lay an important basis for developing effective clinical screening tools for identifying high-fall-risk older adults. Methods: Forty-one individuals aged 65 years and above living in the community participated in this study. The older adults were classified as high-fall-risk and low-fall-risk individuals based on their BBS scores. The participants wore an inertial measurement unit (IMU) while conducting the Timed Up and Go (TUG) test. Simultaneously, a depth camera acquired images of the participants’ movements during the experiment. After segmenting the data according to subtasks, 142 parameters were extracted from the sensor-based data. A t-test or Mann-Whitney U test was performed on the parameters to identify those that distinguish older adults at high risk of falling. Logistic regression was used to further quantify the role of different parameters in identifying high-fall-risk individuals. Furthermore, we conducted an ablation experiment to explore the complementary information offered by the two sensors. Results: Fifteen participants were defined as high-fall-risk individuals, while twenty-six were defined as low-fall-risk individuals. Seventeen parameters differed significantly between the two groups (p < 0.05). Some of these parameters, such as the usage of walking assistance, maximum angular velocity around the yaw axis during turn-to-sit, and step length, exhibited the greatest discriminatory abilities in identifying high-fall-risk individuals. Additionally, combining features from both devices for fall risk assessment resulted in a higher AUC of 0.882 compared to using each device separately. Conclusions: Utilizing different types of sensors can offer more comprehensive information. Interpreting the parameters in physiological terms provides deeper insights into the identification of high-fall-risk individuals. High-fall-risk individuals typically exhibited a cautious gait, such as larger step width and shorter step length during walking. We also identified some abnormal gait patterns in high-fall-risk individuals compared to low-fall-risk individuals, such as less knee flexion and a tendency to tilt the pelvis forward during turning.
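The fall-risk abstract above reports an AUC for separating high-fall-risk from low-fall-risk participants but does not say how it was computed. As a minimal sketch only, the snippet below uses the standard rank-based definition of AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, which equals the Mann-Whitney U statistic divided by the number of pairs). The function name and the scores are hypothetical and are not data from the study.

```python
def auc_from_scores(positive_scores, negative_scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which
    the positive case scores higher; ties count as half a win."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores, made up for illustration only.
high_risk_scores = [0.9, 0.8, 0.4]
low_risk_scores = [0.7, 0.3, 0.2, 0.1]

auc = auc_from_scores(high_risk_scores, low_risk_scores)
print(f"AUC = {auc:.3f}")
```

With these made-up scores, 11 of the 12 pairs are ordered correctly, so the sketch yields an AUC of 11/12.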
- Modeling geomagnetic induction in submarine cables
  Chakraborty, Shibaji; Boteler, David H.; Shi, Xueling; Murphy, Benjamin S.; Hartinger, Michael D.; Wang, Xuan; Lucas, Greg; Baker, Joseph B. H. (Frontiers, 2022-10)
  Submarine cables have become a vital component of modern infrastructure, but past submarine cable natural hazard studies have mostly focused on potential cable damage from landslides and tsunamis. A handful of studies examine the possibility of space weather effects in submarine cables. The main purpose of this study is to develop a computational model, using Python, of geomagnetic induction on submarine cables. The model is used to estimate the induced voltage in the submarine cables in response to geomagnetic disturbances. It also utilizes newly acquired knowledge from magnetotelluric studies and associated investigations of geomagnetically induced currents in power systems. We describe the Python-based software, its working principle, inputs/outputs based on synthetic geomagnetic field data, and compare its operational capabilities against analytical solutions. We present the results for different model inputs, and find: 1) the seawater layer acts as a shield in the induction process: the greater the ocean depth, the smaller the seafloor geoelectric field; and 2) the model is sensitive to the Ocean-Earth layered conductivity structure.
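The shielding finding in the submarine-cable abstract above can be illustrated with the textbook skin-depth formula, δ = sqrt(2 / (ω μ0 σ)). The sketch below is not the paper's layered-Earth model: it assumes a plane wave decaying in a uniform conductor, and the seawater conductivity of 4 S/m, the 100 s period, and the function names are illustrative assumptions.

```python
import math

MU_0 = 4e-7 * math.pi  # vacuum permeability in H/m


def skin_depth(conductivity_s_per_m, period_s):
    """Skin depth (m) of a plane wave in a uniform conductor."""
    omega = 2.0 * math.pi / period_s  # angular frequency in rad/s
    return math.sqrt(2.0 / (omega * MU_0 * conductivity_s_per_m))


def amplitude_fraction(depth_m, conductivity_s_per_m, period_s):
    """Fraction of the surface field amplitude remaining at depth_m,
    under the uniform-conductor assumption (not a layered model)."""
    return math.exp(-depth_m / skin_depth(conductivity_s_per_m, period_s))


SEAWATER_SIGMA = 4.0  # S/m, assumed illustrative value
PERIOD = 100.0        # s, assumed illustrative disturbance period

print(f"skin depth: {skin_depth(SEAWATER_SIGMA, PERIOD):.0f} m")
for depth in (100.0, 1000.0, 4000.0):
    frac = amplitude_fraction(depth, SEAWATER_SIGMA, PERIOD)
    print(f"depth {depth:6.0f} m -> remaining amplitude fraction {frac:.3f}")
```

Under these assumptions the remaining amplitude falls monotonically with depth, which is qualitatively consistent with the abstract's statement that deeper oceans give smaller seafloor fields. A quantitative estimate would need the layered conductivity structure the paper describes.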
- OntoType: Ontology-Guided and Pre-Trained Language Model Assisted Fine-Grained Entity Typing
  Komarlu, Tanay; Jiang, Minhao; Wang, Xuan; Han, Jiawei (ACM, 2024-08-25)
  Fine-grained entity typing (FET), which assigns context-sensitive, fine-grained semantic types to entities in text, is a basic but important task for knowledge extraction from unstructured text. FET has been studied extensively in natural language processing and typically relies on human-annotated corpora for training, which is costly and difficult to scale. Recent studies explore the utilization of pre-trained language models (PLMs) as a knowledge base to generate rich and context-aware weak supervision for FET. However, PLMs still require direction and guidance to serve as a knowledge base, as they often generate a mixture of rough and fine-grained types, or tokens unsuitable for typing. In this study, we envision that an ontology provides a semantics-rich, hierarchical structure, which will help select the best results generated by multiple PLMs and head words. Specifically, we propose a novel annotation-free, ontology-guided FET method, OntoType, which follows a type ontological structure, from coarse to fine, ensembles multiple PLM prompting results to generate a set of type candidates, and refines its type resolution under the local context with a natural language inference model. Our experiments on the Ontonotes, FIGER, and NYT datasets using their associated ontological structures demonstrate that our method outperforms the state-of-the-art zero-shot fine-grained entity typing methods as well as a typical LLM method, ChatGPT. Our error analysis shows that refinement of the existing ontology structures will further improve fine-grained entity typing.
- Temporal Topic Embeddings with a Compass
  Palamarchuk, Daniel Andrew (Virginia Tech, 2024-05-22)
  Aligning Word2vec word embeddings using a compass in a system of Compass-aligned Distributional Embeddings (CADE) creates stable and accurate temporal word embeddings. This thesis seeks to expand the CADE framework into the area of dynamic topic modeling (DTM), where temporal word2vec embeddings can be used to describe topics that evolve over time in an unsupervised manner. It also seeks to improve upon the CADE framework through a theoretical and experimental exploration of compass parameters, cluster and topic generation techniques, and topic descriptor creation. This method of Temporal Topic Embeddings with a Compass (TTEC) will be compared to other DTM techniques in its ability to create coherent and diverse clusters and will be shown to be competitive with traditional and transformer-aided DTM architectures. In addition to a qualitative discussion of results, there will be a political theoretical overview of the nature of this technique and potential use cases, with interviews from political actors of various backgrounds as to how the technique and machine learning as a whole can be used in the organizational setting.
- Towards Generalizable Information Extraction with Limited Supervision
  Wang, Sijia (Virginia Tech, 2024-09-18)
  Supervised approaches, especially those employing deep neural networks, have showcased impressive performance, relying on a significant volume of manual annotations. However, their effectiveness encounters challenges when attempting to generalize to new languages, domains, or types, particularly in the absence of sufficient annotations. Current methods fall short in effectively addressing information extraction (IE) under limited supervision. In this dissertation, we approach information extraction with limited supervision from three perspectives. Firstly, we refine the previous classification-based extraction paradigm by introducing a query-and-extract framework, which uses target information as natural language queries to extract candidate information from the input text. Additionally, we leverage the excellent generation capability of large language models (LLMs) to produce high-quality annotation data, enriching IE semantics within limited annotation data. We also utilize LLMs' instruction-following capability to iteratively refine and optimize solutions through a debating process. Beyond text-only IE, we define a new multimodal IE task that links an entity mention within heterogeneous information sources to a knowledge base with limited annotation data. We demonstrate that excellent multimodal IE performance can be achieved, even with limited annotation data, by leveraging monomodal external information. These combined efforts aim to make optimal use of limited knowledge, ensuring more robust and generalizable solutions.