Browsing by Author "Chen, Yinlin"
- Analyzing WARC on serverless computing. Chen, Yinlin (2021-06-15)
- Architecting a Cloud-native Data Analysis Application for ETDs. Chen, Yinlin; Fox, Edward A. (2018). In this paper, we present a Cloud-native data analysis application and its architecture. This application was developed for librarians to explore useful information from the ETDs preserved in the Virginia Tech digital repository - VTechWorks. We realized the Cloud-native concepts by architecting a serverless architecture with microservices and managed services as backend, and deployed the entire application on Amazon Web Services (AWS). We detail our architecture strategies, decisions we made, and the best practices we followed. Furthermore, we share the lessons learned and cloud benefits we have gained. We believe that our proposed approach could be adopted by other ETD systems, e.g., NDLTD, and could be of benefit to the broader community.
- Are Repositories Impeding Big Data Reuse? Xie, Zhiwu; Galad, Andrej; Chen, Yinlin; Fox, Edward A. (Virginia Tech, 2016-06-14). In this intentionally provocative presentation, we question the scalability of popular digital repositories and whether they are suitable for big data reuse. Are the layers of API these repositories have painted over file system primitives necessary? How essential is it for the repository to insist on being the sole manager of the content, and arranging files in ways to prevent access other than from their own APIs? We explore these questions from the perspective of big data reuse, and describe controlled reuse experiments against Fedora 4 to evaluate the cost of these practices.
- Building a Culture of Reuse: An Analysis of Reusable Software and Policies for Institutional Libraries. Kinnaman, Alex; Chen, Yinlin (The Digital Curation Centre, 2022-06-14). This paper will present findings from a multi-case study on the need for and valuable assets of reusable software and policies for digital library infrastructures. This paper supports the conference theme of reusability. Curating for reuse is a strategy that should not be limited to digital assets, and can extend to digital library software, policy, infrastructure, and implementation. Specifically, we seek to understand how data curators utilize reusable digital library software and policies and how we at Virginia Tech University Libraries can improve the reusability of our resources in order to promote openness, transparency, and reusability.
- Building a full-stack Serverless Web application with React and AWS. Chen, Yinlin (2021-06-10). Serverless computing allows you to build Web applications without managing or maintaining servers. Using AWS, we can build and deploy responsive applications in the cloud with built-in high availability and flexible scaling capabilities. In this workshop, we will learn how to build a full-stack serverless Web application using React and several AWS services, including AWS Amplify, Lambda, AppSync, DynamoDB, etc. We’ll start the workshop with a quick overview of serverless computing and AWS, followed by creating a React application, integrating with AWS managed services and deploying this application in AWS. Workshop Agenda: Introduction to AWS, Serverless, AWS Amplify, and React; Section 1: Create your first React application and set up AWS Amplify; Section 2: Set up access controls for your application; Section 3: Introduction to GraphQL and AWS AppSync; Section 4: Perform data mutations for your application; Section 5: Introduction to multiple development environments; Wrap-up and discussion.
- A Cloud-based Serverless Microservices Application for Digital Preservation. Chen, Yinlin; Kinnaman, Alex; Tuttle, James (2020-11-12). Virginia Tech University Libraries is developing a cloud-based, serverless, microservice application to support digital asset management, preservation, and access. This presentation will outline the balance of cost-effectiveness and creating a trustworthy platform while relying on the cloud.
- Code4AI. Chen, Yinlin (2023-03-14)
- Deep Learning Approach for Cell Nuclear Pore Detection and Quantification over High Resolution 3D Data. He, Chongyu (Virginia Tech, 2023-12-21). The intricate task of segmenting and quantifying cell nuclear pores in high-resolution 3D microscopy data is critical for cellular biology and disease research. This thesis introduces a deep learning pipeline crafted to automate the segmentation and quantification of nuclear pores from high-resolution 3D cell organelle images. Our aim is to refine computational methods capable of handling the data's complexity and size, thus improving accuracy and reducing manual labor in biological image analysis. The developed pipeline incorporates data preprocessing, augmentation strategies, random block sampling, and a three-stage post-processing algorithm. It utilizes a 3D U-Net with a VGG-16 backbone, optimized through cyclical data augmentation and random block sampling to tackle the challenges posed by limited labeled data and the processing of large-scale 3D images. The pipeline has demonstrated its capability to effectively learn and predict nuclear pore structures, achieving improvements in validation metrics compared to baseline models. Our experiments suggest that cyclical augmentation helps prevent overfitting, and random block sampling contributes to managing data imbalance. The post-processing phase successfully automates the quantification of nuclear pores without the need for manual intervention. The proposed pipeline offers an efficient and scalable approach to segmenting and quantifying nuclear pores in 3D microscopy images. Despite the ongoing challenges of computational intensity and data volume, the techniques developed in this study provide insights into the automation of complex biological image analysis tasks, with potential applications extending beyond the detection of nuclear pores.
- Efficient development and deployment of Hydra projects using Vagrant. Chen, Yinlin (2016-10-03)
- End to end Serverless Digital Repository Platform on the Cloud. Chen, Yinlin (2022-06-08)
- Ensemble PDP-8: Eight Principles for Distributed Portals. Fox, Edward A.; Chen, Yinlin; Akbar, Monika; Shaffer, Clifford A.; Edwards, Stephen H.; Brusilovsky, Peter; Garcia, Daniel D.; Delcambre, Lois M. L.; Decker, Felicia; Archer, David W.; Furuta, Richard; Shipman, Frank M., III; Carpenter, B. Stephen, II; Cassel, Lillian N. (2010). Ensemble, the National Science Digital Library (NSDL) Pathways project for Computing, builds upon a diverse group of prior NSDL, DL-I, and other projects. Ensemble has shaped its activities according to principles related to design, development, implementation, and operation of distributed portals. Here we articulate 8 key principles for distributed portals (PDPs). While our focus is on education and pedagogy, we expect that our experiences will generalize to other digital library application domains. These principles inform, facilitate, and enhance the Ensemble R&D and production activities. They allow us to provide a broad range of services, from personalization to coordination across communities. The eight PDPs can be briefly summarized as: (1) Articulation across communities using ontologies. (2) Browsing tailored to collections. (3) Integration across interfaces and virtual environments. (4) Metadata interoperability and integration. (5) Social graph construction using logging and metrics. (6) Superimposed information and annotation integrated across distributed systems. (7) Streamlined user access with IDs. (8) Web 2.0 multiple social network system interconnection.
- Evaluating Cost of Cloud Execution in a Data Repository. Xie, Zhiwu; Chen, Yinlin; Griffin, Julie; Walters, Tyler (ACM, 2016-06). In this paper, we utilize a set of controlled experiments to benchmark the cost associated with the cloud execution of typical repository functions such as ingestion, fixity checking, and heavy data processing. We focus on the repository service pattern where content is explicitly stored away from where it is processed. We measured the processing speed and unit cost of each scenario using a large sensor dataset and Amazon Web Services (AWS). The initial results reveal three distinct cost patterns: 1) spend more to buy up to proportionally faster services; 2) more money does not necessarily buy better performance; and 3) spend less, but faster. Further investigations into these performance and cost patterns will help repositories to form a more effective operation strategy.
- A High-quality Digital Library Supporting Computing Education: The Ensemble Approach. Chen, Yinlin (Virginia Tech, 2017-08-28). Educational Digital Libraries (DLs) are complex information systems which are designed to support individuals' information needs and information seeking behavior. To have a broad impact on the communities in education and to serve for a long period, DLs need to structure and organize the resources in a way that facilitates the dissemination and the reuse of resources. Such a digital library should meet defined quality dimensions in the 5S (Societies, Scenarios, Spaces, Structures, Streams) framework - including completeness, consistency, efficiency, extensibility, and reliability - to ensure that a good quality DL is built. In this research, we addressed both external and internal quality aspects of DLs. For internal qualities, we focused on completeness and consistency of the collection, catalog, and repository. We developed an application pipeline to acquire user-generated computing-related resources from YouTube and SlideShare for an educational DL. We applied machine learning techniques to transfer what we learned from the ACM Digital Library dataset. We built classifiers to catalog resources according to the ACM Computing Classification System from the two new domains that were evaluated using Amazon Mechanical Turk. For external qualities, we focused on efficiency, scalability, and reliability in DL services. We proposed cloud-based designs and applications to ensure and improve these qualities in DL services using cloud computing. The experimental results show that our proposed methods are promising for enhancing and enriching an educational digital library. This work received support from ACM, as well as the National Science Foundation under Grant Numbers DUE-0836940, DUE-0937863, and DUE-0840719, and IMLS LG-71-16-0037-16.
- Introducing AI for LAMs: A Beginner Tutorial for Practical Generative AI Use Cases. Chen, Yinlin (2023-11-15). Generative AI and Large Language Models (LLMs) are transforming various fields, including libraries, archives, and museums (LAMs). This workshop is specifically designed to introduce LAM professionals to the fundamentals of Generative AI and LLMs, utilizing hands-on applications through platforms and frameworks like OpenAI API, Hugging Face, LangChain, and more. Participants will benefit from practical exercises and tutorials, as well as an in-depth demonstration of selected real-world projects that underscore the transformative potential of AI. Moreover, the workshop will include a focused discussion session to foster brainstorming on strategies, methodologies, and the challenges of seamlessly integrating AI into traditional LAM environments. Guided by a University Libraries professor experienced in teaching "Introduction to Artificial Intelligence" in Computer Science courses to over five hundred students, this half-day workshop offers a blend of academic insight and practical expertise. Participants will gain hands-on experience with AI tools, learning to apply these emerging technologies creatively and efficiently. Tailored to LAM professionals curious about AI and its potential applications, the session serves as an insightful introduction and a comprehensive guide for those eager to augment services within the LAM settings.
- Librarian-in-the-Loop Deep Learning to Curate Very Large Biomedical Image Datasets. Xie, Zhiwu; Chen, Yinlin (2024-02-01). We present a research data management project where librarians from University of California, Riverside and Virginia Tech are deeply embedded in a research team at Yale School of Medicine to directly answer specific research questions by applying AI/Deep Learning techniques to very large biomedical images. Leveraging library resources and expertise, we have developed a prototype pipeline that identifies nuclear pores from whole cell images captured at 8 nanometer resolution by a cutting-edge microscope, in the hope of revealing the cellular mechanism of one type of epilepsy and autism. This project exemplifies our data management approach that strives to engage in much earlier stages of research, e.g., even during ideation and data collection, instead of waiting until most research activities are completed to "consult" or "advise" on the very general questions of data storage or preservation. This project also highlights the importance of non-generative AI approaches, which have already been widely used as research tools in a much more mature manner.
- LLMs for Semantic Web Query. Chen, Yinlin (2023-11-09). The emergence of Large Language Models like GPT-4 offers unprecedented capabilities in understanding human intent and generating text. This tutorial explores the intersection of LLMs and semantic web applications, focusing on how these models can automatically generate queries that adhere to metadata standards. Participants will engage in hands-on exercises that demonstrate the integration of LLMs into a sample semantic web application. This session will offer conceptual understanding and practical skills for metadata practitioners, developers, and researchers. The aim is to enable attendees to leverage the capabilities of LLMs in enhancing semantic web applications. Target audience: Metadata practitioners, developers, researchers, and those interested in Large Language Models. Expected learning outcomes: Understand LLMs and their capabilities. Gain hands-on experience and learn to generate metadata-compliant queries using LLMs. Discuss potential applications and limitations of LLMs in the semantic web. Tutorial style: Presentation, demonstration, hands-on practice, discussion, and Q&A. Prior knowledge required: Basic familiarity with semantic web technologies, such as RDF or SPARQL; some basic Python programming skills. Participants are recommended to have: A dual-monitor setup or two computers to more easily follow along with hands-on exercises while also watching the presentation.
- Multi-tenancy Cloud Access and Preservation. Tuttle, James; Chen, Yinlin; Jiang, Tingting; Hunter, Lee; Waldren, Andrea; Ghosh, Soumik; Ingram, William A. (ACM, 2020-08). Virginia Tech Libraries has developed a cloud-native, microservices-based digital libraries platform to consolidate diverse access and preservation infrastructure into a set of flexible, independent microservices in Amazon Web Services. We have been an implementer and contributor to various community digital library and repository projects including DSpace, Fedora, and Samvera. However, the complexity and cost of maintaining disparate application stacks have reduced our capacity to build new infrastructure.
- A Multi-Tenancy Cloud-Native Digital Library Platform. Chen, Yinlin; Ingram, William A.; Tuttle, James (2019-06-11). Virginia Tech Libraries presents our next generation digital library platform. Our design and implementation addresses the maintainability, sustainability, modularity, and scalability of a digital repository using a Cloud-native architecture, in which the entire platform is deployed in a cloud environment - Amazon Web Services (AWS). Our next-gen digital library eschews the old model of multiple siloed systems and embraces a common, sustainable infrastructure. This approach facilitates a more maintainable approach to managing and providing access to collections allowing us to focus on content and user experience. This platform is composed of a suite of microservices and cloud services. Microservices implemented as Lambda functions handle specific tasks and communicate with each other and other cloud services using lightweight asynchronous messaging. Cloud-native application development embodies the future of digital asset management and content delivery. Shared infrastructure throughout the stack and a clear demarcation between front- and back-end makes the platform more generalizable and supports independent replacement of components. We share our experiences and lessons learned developing this digital library platform, including architecture design, microservice implementation, cloud integration, best practices, and practical strategies and directions for developing a Cloud-native repository.
- On-Demand Big Data Analysis in Digital Repositories. Xie, Zhiwu; Chen, Yinlin; Jiang, Tingting; Griffin, Julie; Walters, Tyler; Tarazaga, Pablo Alberto; Kasarda, Mary E. (Springer International Publishing, 2015-12-18). We describe a use and reuse driven digital repository integrated with lightweight data analysis capabilities provided by the Docker framework. Using building sensor data collected from the Virginia Tech Goodwin Hall Living Laboratory, we perform evaluations using Amazon EC2 and Container Service with a Fedora 4 repository backed with storage in Amazon S3. The results confirm the viability and benefits of this approach.
- Protocols. Singh, Ajeet; Chen, Yinlin; Santhanam, Srinivasa; Zhu, Weihua (2009-10-09). This module addresses the concepts, development and implementation of digital library protocols and covers the roles of protocols in information retrieval systems (IR) and Service Oriented Architectures (SOA).