ETDseer Concept Paper


Virginia Tech


ETDSeer (an electronic thesis and dissertation digital library connected with SeerSuite) will build on 15 years of collaboration between teams at Virginia Tech (VT) and Penn State University (PSU), both of which have been leaders in the worldwide digital library (DL) community. VT helped launch the national and international efforts for ETDs more than 20 years ago; these efforts have been led by the Networked Digital Library of Theses and Dissertations (NDLTD, directed by PI Fox), whose Union Catalog has grown to 5 million records. PSU hosts CiteSeerX, which co-PI Giles launched almost 20 years ago and which is connected with a wide variety of research results under the SeerSuite family.

ETDs, typically in PDF, are a largely untapped international resource. Digital libraries with advanced services can effectively address the broad need to discover and use ETDs of interest. Our research will leverage SeerSuite methods, which have been applied mostly to short documents, together with a variety of exploratory studies at VT, and will yield a “web of graduate research”, rich knowledge bases, and a digital library with effective interfaces. References will be analyzed and converted to canonical forms; figures and tables will be recognized and re-represented for flexible searching; small sections (acknowledgments, biographical sketches) will be mined; and aids for researchers will be built, especially from literature reviews and discussions of future work. Entity recognition and disambiguation will facilitate flexible use of a large graph of linked open data.



deep learning, domain-independent digital library, information extraction (IE), information retrieval, natural language processing (NLP), NDLTD, CiteSeerX