Browsing by Author "Jiang, Tingting"
Now showing 1 - 6 of 6
- ETDseer Concept Paper. Ma, Yufeng; Jiang, Tingting; Shrestha, Chandani (Virginia Tech, 2017-05-03). ETDSeer (electronic thesis and dissertation digital library connected with SeerSuite) will build on 15 years of collaboration between teams at Virginia Tech (VT) and Penn State University (PSU), since both have been leaders in the worldwide digital library (DL) community. VT helped launch the national and international efforts for ETDs more than 20 years ago, which have been led by the Networked Digital Library of Theses and Dissertations (NDLTD, directed by PI Fox); its Union Catalog has increased to 5 million records. PSU hosts CiteSeerX, which co-PI Giles launched almost 20 years ago, and which is connected with a wide variety of research results under the SeerSuite family. ETDs, typically in PDF, are a largely untapped international resource. Digital libraries with advanced services can effectively address the broad needs to discover and utilize ETDs of interest. Our research will leverage SeerSuite methods that have been applied mostly to short documents, plus a variety of exploratory studies at VT, and will yield a “web of graduate research”, rich knowledge bases, and a digital library with effective interfaces. References will be analyzed and converted to canonical forms, figures and tables will be recognized and re-represented for flexible searching, small sections (acknowledgments, biographical sketches) will be mined, and aids for researchers will be built especially from literature reviews and discussions of future work. Entity recognition and disambiguation will facilitate flexible use of a large graph of linked open data.
- Multi-tenancy Cloud Access and Preservation. Tuttle, James; Chen, Yinlin; Jiang, Tingting; Hunter, Lee; Waldren, Andrea; Ghosh, Soumik; Ingram, William A. (ACM, 2020-08). Virginia Tech Libraries has developed a cloud-native, microservices-based digital libraries platform to consolidate diverse access and preservation infrastructure into a set of flexible, independent microservices in Amazon Web Services. We have been an implementer of and contributor to various community digital library and repository projects, including DSpace, Fedora, and Samvera. However, the complexity and cost of maintaining disparate application stacks have reduced our capacity to build new infrastructure.
- On-Demand Big Data Analysis in Digital Repositories. Xie, Zhiwu; Chen, Yinlin; Jiang, Tingting; Griffin, Julie; Walters, Tyler; Tarazaga, Pablo Alberto; Kasarda, Mary E. (Springer International Publishing, 2015-12-18). We describe a use- and reuse-driven digital repository integrated with lightweight data analysis capabilities provided by the Docker framework. Using building sensor data collected from the Virginia Tech Goodwin Hall Living Laboratory, we perform evaluations using Amazon EC2 and Container Service with a Fedora 4 repository backed with storage in Amazon S3. The results confirm the viability and benefits of this approach.
- Scaling IIIF Image Tiling in the Cloud. Chen, Yinlin; Ghosh, Soumik; Jiang, Tingting; Tuttle, James (2020-02-17). The International Archive of Women in Architecture, established at Virginia Tech in 1985, collects books, biographical information, and published materials from nearly 40 countries that are divided into around 450 collections. In order to provide public access to these collections, we built an application using the IIIF APIs to pre-generate image tiles and manifests, which are statically served in the AWS cloud. We established an automatic image processing pipeline using a suite of AWS services to implement microservices in Lambda and Docker. By doing so, we reduced the processing time for terabytes of images from weeks to days. In this article, we describe our serverless architecture design and implementations, elaborate on the technical solution for integrating multiple AWS services with other techniques into the application, and describe our streamlined and scalable approach to handling extremely large image datasets. Finally, we show the significantly improved performance compared to traditional processing architectures, along with a cost evaluation.
- Solr Project with IDEAL, in CS5604 (Information Storage and Retrieval). Xia, Long; Jiang, Tingting; Galad, Andrej; Maharshi, Shivam (2016-05-04). This submission describes the work of the Solr team as part of the IDEAL project, with the main goal of designing and developing a distributed search infrastructure. It includes the project reports, final presentations, and the solutions (configuration files and Java code) developed. The main responsibility of our team was to configure Near Real Time (NRT) Indexing and implement Custom Ranking for tweet and web page collections. The idea behind NRT Indexing is to perform incremental updates from an HBase table into the Solr index, thereby reducing the time and compute resources used. The main motivation behind the Custom Ranking solution is to improve system precision and recall by transforming user queries using the metadata provided by the other teams. The implementation leverages three techniques: Query Expansion, Pseudo Relevance Feedback, and Query Boosting. Throughout the semester we collaborated closely with several other teams, both in gathering requirements and in obtaining the input data.
- VTechData: An Institutional Data Repository. Xie, Zhiwu; Griffin, Julie; Chen, Yinlin; Jiang, Tingting; Brittle, Collin; Mather, Paul (2016-06-14). We introduce VTechData, a Sufia/Fedora-based institutional repository specifically implemented to meet the needs of research data management at Virginia Tech. Despite the rapid maturity of Hydra and Fedora code bases, the gaps between the released packages and a launched production-level service are still many and far from trivial. In this presentation we describe the strategy and efforts through which these gaps were filled and lessons learned in the process of creating our first Hydra/Sufia-based repository.