Browsing by Author "Mahajan, Yash"
- Integration and Implementation (INT) CS 5604 F2020
  Hicks, Alexander; Thazhath, Mohit; Gupta, Suraj; Long, Xingyu; Poland, Cherie; Hsieh, Hsinhan; Mahajan, Yash (Virginia Tech, 2020-12-18)
  The first major goal of this project is to build a state-of-the-art information storage, retrieval, and analysis system that utilizes the latest technology and industry methods. This system is leveraged to accomplish another major goal, supporting modern search and browse capabilities for a large collection of tweets from the Twitter social media platform, web pages, and electronic theses and dissertations (ETDs). The backbone of the information system is a Docker container cluster running with Rancher and Kubernetes. Information retrieval and visualization is accomplished with containers in a pipelined fashion, whether in the cluster or on virtual machines, for Elasticsearch and Kibana, respectively. In addition to traditional searching and browsing, the system supports full-text and metadata searching. Search results include facets as a modern means of browsing among related documents. The system supports text analysis and machine learning to reveal new properties of collection data. These new properties assist in the generation of available facets. Recommendations are also presented with search results based on associations among documents and with logged user activity. The information system is co-designed by five teams of Virginia Tech graduate students, all members of the same computer science class, CS 5604. Although the project is an academic exercise, it is the practice of the teams to work and interact as though they are groups within a company developing a product. The teams on this project include three collection management groups -- Electronic Theses and Dissertations (ETD), Tweets (TWT), and Web-Pages (WP) -- as well as the Front-end (FE) group and the Integration (INT) group to help provide the overarching structure for the application.
This submission focuses on the work of the Integration (INT) team, which creates and administers Docker containers for each team in addition to administering the cluster infrastructure. Each container is a customized application environment that is specific to the needs of the corresponding team. Each team will have several of these containers set up in a pipeline formation to allow scaling and extension of the current system. The INT team also contributes to a cross-team effort for exploring the use of Elasticsearch and its internally associated database. The INT team administers the integration of the Ceph data storage system into the CS Department Cloud and provides support for interactions between containers and the Ceph filesystem. During formative stages of development, the INT team also has a role in guiding team evaluations of prospective container components and workflows. The INT team is responsible for the overall project architecture and facilitating the tools and tutorials that assist the other teams in deploying containers in a development environment according to mutual specifications agreed upon with each team. The INT team maintains the status of the Kubernetes cluster, deploying new containers and pods as needed by the collection management teams as they expand their workflows. This team is responsible for utilizing a continuous integration process to update existing containers. During the development stage the INT team collaborates specifically with the collection management teams to create the pipeline for the ingestion and processing of new collection documents, crossing services between those teams as needed. The INT team develops a reasoner engine to construct workflows with information goal as input, which are then programmatically authored, scheduled, and monitored using Apache Airflow. 
The INT team is responsible for the flow, management, and logging of system performance data and making any adjustments necessary based on the analysis of testing results. The INT team has established a Gitlab repository for archival code related to the entire project and has provided the other groups with the documentation to deposit their code in the repository. This repository will be expanded using Gitlab CI in order to provide continuous integration and testing once it is available. Finally, the INT team will provide a production distribution that includes all embedded Docker containers and sub-embedded Git source code repositories. The INT team will archive this distribution on the Virginia Tech Docker Container Registry and deploy it on the Virginia Tech CS Cloud. The INT-2020 team owes a sincere debt of gratitude to the work of the INT-2019 team. This is a very large undertaking and the wrangling of all of the products and processes would not have been possible without their guidance in both direct and written form. We have relied heavily on the foundation they and their predecessors have provided for us. We continue their work with systematic improvements, but also want to acknowledge their efforts Ibid. Without them, our progress to date would not have been possible.
- PRADA-TF: Privacy-Diversity-Aware Online Team Formation
  Mahajan, Yash (Virginia Tech, 2021-06-14)
  In this work, we propose a PRivAcy-Diversity-Aware Team Formation framework, namely PRADA-TF, that can be deployed based on the trust relationships between users in online social networks (OSNs). PRADA-TF is mainly designed to reflect team members' domain expertise and privacy-preserving preferences when a task requires a wide range of diverse domain expertise for its successful completion. PRADA-TF aims to form a team that maximizes productivity based on members' characteristics in diversity, privacy preservation, and information sharing. We leverage Mechanism Design, a game-theoretic technique, so that a mechanism designer acting as team leader can select team members who maximize the team's social welfare, which is the sum of all team members' utilities considering team productivity, members' privacy preservation, and potential privacy loss caused by information sharing. To screen a set of candidate teams in the OSN, we built an expert social network based on a real co-authorship dataset (i.e., Netscience) with 1,590 scientists, used semi-synthetic datasets to construct a trust network based on a belief model called Subjective Logic, and identified trustworthy users as candidate team members. Via extensive simulation experiments, we compared seven different TF schemes, including our proposed and existing TF algorithms, and analyzed the key factors that can significantly impact the expected and actual social welfare, the expected and actual potential privacy leakage, and the team diversity of a selected team.
- Privacy-Preserving and Diversity-Aware Trust-based Team Formation in Online Social Networks
  Mahajan, Yash; Cho, Jin-Hee; Chen, Ing-Ray (ACM, 2024-07)
  As online social networks (OSNs) become more prevalent, a new paradigm for problem-solving through crowd-sourcing has emerged. By leveraging the OSN platforms, users can post a problem to be solved and then form a team to collaborate and solve the problem. A common concern in OSNs is how to form effective collaborative teams, as various tasks are completed through online collaborative networks. In the development of team formation (TF) algorithms, a team's diversity in expertise has received considerable attention as a driver of high team performance. However, the effect of team diversity on performance under different types of tasks has not been extensively studied. Another important issue is how to balance the need to preserve individuals' privacy with the need to maximize performance through active collaboration, as these two goals may conflict with each other. This issue has not been actively studied in the literature. In this work, we develop a team formation (TF) algorithm in the context of OSNs that can maximize team performance and preserve team members' privacy under different types of tasks. Our proposed PRivAcy-Diversity-Aware Team Formation framework, called PRADA-TF, is based on trust relationships between users in OSNs where trust is measured based on a user's expertise and privacy preference levels. The PRADA-TF algorithm considers the team members' domain expertise, privacy preferences, and the team's expertise diversity in the process of team formation. Our approach employs the game-theoretic principles of Mechanism Design to motivate self-interested individuals within a team formation context, positioning the mechanism designer as the pivotal team leader responsible for assembling the team.
We use two real-world datasets (i.e., Netscience and IMDb) to generate different semi-synthetic datasets for constructing trust networks using a belief model (i.e., Subjective Logic) and identifying trustworthy users as candidate team members. We evaluate the effectiveness of our proposed PRADA-TF scheme in four variants against three baseline methods in the literature. Our analysis focuses on three performance metrics for studying OSNs: social welfare, privacy loss, and team diversity.
- Privacy-Preserving and Diversity-Aware Trust-based Team Formation in Online Social Networks
  Mahajan, Yash; Guo, Zhen; Cho, Jin-Hee; Chen, Ing-Ray (2023-02)
  As online social networks (OSNs) become more prevalent, a new paradigm for problem solving through crowdsourcing has emerged. By leveraging the OSN platforms, users can post a problem to be solved and then form a team to collaborate and solve the problem. A common concern in OSNs is how to form effective collaborative teams, as various tasks are completed through online collaborative networks. In the development of team formation (TF) algorithms, a team’s diversity in expertise has received considerable attention as a driver of high team performance. However, the effect of team diversity on performance under different types of tasks has not been extensively studied. Another important issue is how to balance the need to preserve individuals’ privacy with the need to maximize performance through active collaboration, as these two goals may conflict with each other. This issue has not been actively studied in the literature. In this work, we develop a team formation (TF) algorithm in the context of OSNs that can maximize team performance and preserve team members’ privacy under different types of tasks. Our proposed PRivAcy-Diversity-Aware Team Formation framework, called PRADA-TF, is based on trust relationships between users in OSNs where trust is measured based on a user’s expertise and privacy preference levels. The PRADA-TF algorithm considers the team members’ domain expertise, privacy preferences, and the team’s expertise diversity in the process of team formation. We leverage Mechanism Design as a game-theoretic technique in which the mechanism designer plays the role of team leader in forming a team.
We use two real-world datasets (i.e., Netscience and IMDb) to generate different semi-synthetic datasets for constructing trust networks using a belief model (i.e., Subjective Logic) and identifying trustworthy users as candidate team members. We evaluate the effectiveness of our proposed PRADA-TF scheme in four variants against three baseline methods in the literature. Our analysis focuses on three performance metrics used in the study of OSNs: social welfare, privacy loss, and team diversity.
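
The PRADA-TF abstracts above mention building trust networks with Subjective Logic and keeping only trustworthy users as candidate team members. As a rough illustration of that idea (not the authors' implementation), the sketch below maps counts of positive and negative evidence to a binomial Subjective Logic opinion and filters users by projected probability, P = b + a·u. The evidence counts, the user names, the 0.6 threshold, and the helper names `opinion_from_evidence` and `trustworthy_candidates` are all hypothetical; the abstracts do not specify how evidence or thresholds are defined.

```python
from dataclasses import dataclass

# Non-informative prior weight; 2 is the conventional default in Subjective Logic.
W = 2.0


@dataclass(frozen=True)
class Opinion:
    """Binomial opinion: belief + disbelief + uncertainty == 1."""
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float = 0.5

    def projected(self) -> float:
        # Projected probability: belief plus the base-rate share of uncertainty.
        return self.belief + self.base_rate * self.uncertainty


def opinion_from_evidence(positive: float, negative: float,
                          base_rate: float = 0.5) -> Opinion:
    """Map evidence counts to an opinion (standard evidence-to-opinion mapping)."""
    total = positive + negative + W
    return Opinion(positive / total, negative / total, W / total, base_rate)


def trustworthy_candidates(evidence: dict, threshold: float = 0.6) -> list:
    """Keep users whose projected trust meets the (hypothetical) threshold."""
    return [
        user
        for user, (pos, neg) in evidence.items()
        if opinion_from_evidence(pos, neg).projected() >= threshold
    ]


# Hypothetical evidence: (positive, negative) interaction counts per user.
evidence = {"alice": (8, 0), "bob": (1, 5), "carol": (0, 0)}
print(trustworthy_candidates(evidence))
```

With no evidence at all, a user's opinion is fully uncertain and the projected probability falls back to the base rate, so such users are excluded whenever the threshold is set above that base rate.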