14:17:03 Thank you for coming to my presentation, "Is the IR Storage or Showcase?" I'm Gail McMillan, the director of Scholarly Communication at Virginia Tech's University Libraries. 14:17:14 For a while now I've been thinking about how to show the strength of my repository, VTechWorks. Today I'm going to share with you my ideas and how I put them into practice. 14:17:25 My goal with this presentation is to share my hypothesis with you, as well as my findings, in the hope that some of you will be interested in testing this method at your institutions. 14:17:38 So that we start on common ground, let's use the definition of an IR from Clifford Lynch, the director of the Coalition for Networked Information: a set of services that a university offers to the members of its community for the management and dissemination of 14:17:54 digital materials created by the institution and its community members. He added that a mature and fully realized institutional repository will contain the intellectual works of the faculty and students, both research and teaching materials, and also documentation 14:18:12 of the activities at the institution itself in the form of recordings of events and performance, and the ongoing intellectual life of the institution. 14:18:24 Institutional repositories are often measured by data points such as their size, who deposits, how much has been deposited, or how many times items in the IR have been used. These data, however, do not put the IR in a larger context, and they don't facilitate 14:18:42 comparing repositories across institutions. 14:18:46 I think we may be better stewards of our repositories if we evaluate how well the IR reflects or represents its host institution, that is, put the repository in the context of the institution. 14:19:02 VTechWorks is the institutional repository for Virginia Tech, a PhD-granting institution with about 37,000 students and 2,000 faculty. 
VTechWorks is populated by the community, including faculty deposits, either directly or through integrated systems like 14:19:21 Elements; graduate students' electronic theses and dissertations from the Graduate School's online approval system; and some courses that require students to deposit their final projects. 14:19:34 VTechWorks staff also deposit formally and informally, for example, publications they read about in the daily online news, or by handling SWORD-deposited articles. 14:19:48 I used vocabularies to try to determine if the microcosm created by the vocabulary could show that the IR was a subset of the whole, and an accurate reflection of the university, 14:20:03 its intellectual works as well as its activities. 14:20:07 I compiled three vocabularies from academic and community sources, as well as by consulting with community members. 14:20:14 While I tried to be comprehensive, you can be the judge of that. 14:20:19 Biases are bound to have crept in, but I intentionally did not exclude terms that are now offensive or outdated. A Progressive's Style Guide by SumOfUs was very helpful. 14:20:32 The vocabularies I compiled enabled me to create virtual microcosms of LGBTQ+, Latinx, and Indigenous People. 14:20:41 I searched these vocabularies in VTechWorks and on my university's website in February and March of this year, when there were about 84,500 items in 14:20:51 VTechWorks. 14:20:55 While I was at it, I decided to search a little deeper in the repository to see what I could find about who was using these terms. So I searched each vocabulary in the graduate students' ETD (electronic theses and dissertations) collection 14:21:10 and in collections of faculty research publications. ETDs form the foundation of many institutional repositories, and ETDs can provide an important lens on how the IR presents the work that is being done across the university. 
14:21:27 VTechWorks doesn't have a faculty research collection per se, so for this study I combined three collections to make a virtual faculty research collection: articles supported by the library's open access subvention fund, and articles deposited largely through 14:21:44 the SWORD protocol from BioMed Central, SpringerOpen, Hindawi, and MDPI. 14:21:51 This collection also includes journals published by Virginia Tech Publishing, and works voluntarily deposited by faculty from the Symplectic Elements 14:22:04 Faculty Activity Reporting System. Since there are wide variations in the size of these collections as well as their ages, I decided to compare the percentage of hits in each collection, rather than the number of hits. 14:22:18 A complicating factor was that VTechWorks is included in the search at my institution's website, vt.edu. 14:22:29 To eliminate this duplication, I searched Virginia Tech and VTechWorks indirectly, through Google. 14:22:37 You can see here how I formed my Google search strategies. 14:22:41 However, while Google has indexed VTechWorks for many years, the same strategy did not work to search deeper into the repository collections. Therefore I searched directly within VTechWorks to compare graduate students' ETDs and faculty research. 14:22:58 Yet another complicating factor was the fact that VTechWorks runs on DSpace, which uses the Solr search platform. This means that there can be fuzzy or close matches. 14:23:11 For example, when I searched First Nations, 14:23:15 the results included the First National Bank. 14:23:19 Solr also stems. That is, it expands words with common endings to include plurals, past tenses, and the like. So when I searched, for example, Diné, dining was included. 14:23:32 Now let me share with you the results of some of my searches. The LGBTQ+ vocabulary was drawn from the County of San Mateo, California's LGBTQ Glossary. 
14:23:46 It also drew on the University of Massachusetts Amherst Stonewall Center's LGBTQIA Terminology and the Virginia Tech Safe Zone Training 101 Core Vocabulary. 14:23:58 To give you some context, at Virginia Tech 111 of these terms got about 10,000 hits. In VTechWorks, 130 terms got nearly 8,000 hits. 14:24:12 89.6% of the terms had less than a 1% difference in the percentage of hits between Virginia Tech and VTechWorks, while 96.3% had less than a 2% difference. 14:24:28 Here are the 13 terms that had more than a 1% difference. Denim Day is a special occasion to show support for gay rights, and Ex Lapide is the LGBTQ+ alumni and allies association. The purple terms appeared more frequently in 14:24:45 VTechWorks, and the black terms had more hits at Virginia Tech. 14:24:54 VTechWorks direct searches revealed another problem. Three 14:24:59 terms got thousands of hits each because they were used in the Virginia Cooperative Extension's non-discrimination statement, which appears in many of the nearly 8,000 VCE publications. 14:25:13 Therefore, I removed sexual orientation, gender identity, and gender expression from the comparison in the VTechWorks direct searches. 14:25:22 I did not remove them from the Google searches because, while their hit numbers were among the highest, hits for these terms were not out of line with the other big-hit terms such as lesbian and asexual. 14:25:38 Looking at the ETDs and faculty research collections, 12 broad terms got the most hits in both of these collections: bisexual, gender equity, gender bias, gender equality, gender inequality, gender neutral, heterosexual, homosexual, lesbian, sexism, transgender, 14:26:02 and underrepresented groups. Asexual was a real outlier, as you can see in the word cloud, 14:26:07 getting 8.6% of the hits in faculty research, but only 0.1% in the ETD search. 
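As a rough illustration of the comparison described above, here is a minimal Python sketch. It assumes that "percentage of hits" means each term's share of the total hits within one source, and that the difference is the gap between those shares in percentage points; the talk does not spell out the exact formula. The function names and the hit counts are hypothetical and are not figures from the study.

```python
def hit_shares(hits, exclude=()):
    """Return each term's percentage of the total hits in one source.

    `exclude` drops terms (for example, boilerplate from a
    non-discrimination statement) before the total is computed.
    """
    kept = {term: n for term, n in hits.items() if term not in exclude}
    total = sum(kept.values())
    if total == 0:
        return {term: 0.0 for term in kept}
    return {term: 100.0 * n / total for term, n in kept.items()}


def share_differences(shares_a, shares_b):
    """Percentage-point difference (a minus b) for every term in either source."""
    terms = set(shares_a) | set(shares_b)
    return {t: shares_a.get(t, 0.0) - shares_b.get(t, 0.0) for t in terms}


# Made-up counts, for illustration only.
site_hits = {"lesbian": 400, "transgender": 350, "gender identity": 250}
repo_hits = {"lesbian": 300, "transgender": 300, "gender identity": 400}

site_shares = hit_shares(site_hits)
repo_shares = hit_shares(repo_hits)
diffs = share_differences(site_shares, repo_shares)

# Positive values mean the term takes a larger share on the website,
# negative values mean a larger share in the repository.
for term in sorted(diffs):
    print(f"{term}: {diffs[term]:+.1f} points")

# Shares recomputed after excluding a boilerplate-inflated term.
repo_shares_clean = hit_shares(repo_hits, exclude={"gender identity"})
```

Working in shares rather than raw counts is what lets collections of very different sizes and ages be compared side by side.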
14:26:17 The Indigenous People vocabulary came largely from the Native American Tribes of Virginia website and the Virginia Tech University Council's resolution to observe Indigenous Peoples Day. The Indigenous People vocabulary originally contained 145 terms and 14:26:35 phrases focusing on people, organizations, and activities. 14:26:40 I removed named tribes and nations and, for comparison's sake, just used their tribal names, that is, Powhatan not Powhatan Nation, Mattaponi not Mattaponi Indians, and Chickahominy not Chickahominy Tribe. 14:26:57 This reduced the vocabulary for comparison to these 87 terms. 14:27:03 Here too you will see that some terms are weighted and biased, and some are offensive or outdated. 14:27:10 Those appeared infrequently, getting only 2.1% of the total hits in VTechWorks. 14:27:19 Among the terms with the largest differences were five tribes, but interestingly only two of them, Monacan and Tutelo, are local tribes, and two local initiatives: the American Indian and Indigenous Community Center, a gathering place 14:27:36 in the student union, and Native@VT, an organization dedicated to enhancing the visibility of American Indians and other indigenous people and to raising awareness of the issues that confront them. 14:27:51 Turning to VTechWorks direct searches, you can see that there are striking similarities between the faculty research and ETD collections. 14:28:01 The top two terms with the most hits were also the same for both graduate students and faculty researchers. They were Native Americans and American Indian. 14:28:13 Turning now to the third microcosm, Latinx. The Latinx vocabulary had 75 terms, all of which appeared in VTechWorks, and 70 appeared on the Virginia Tech website. 14:28:28 Some of these terms are also weighted and biased, and some are offensive and outdated. 
14:28:36 From this word cloud you can see the terms with the biggest differences in percentage of hits in the Google searches between Virginia Tech, the black terms, and VTechWorks, the purple terms. But we have not yet determined what constitutes a big difference. 14:28:54 The four terms with more than a 2% difference were Español, immigration policy, Mexican American, and undocumented students. 14:29:04 94.7% of the terms had less than a 2% difference. 14:29:11 In the direct VTechWorks searches, the 14 terms used the most in ETDs got 88% of the total hits. 14:29:19 Thirteen of the top 14 terms were also among the terms most used in faculty research. In the table on the right, you can see that nine out of 13 terms were used more, percentage-wise, by faculty than by graduate students. 14:29:37 We began with a quote from Clifford Lynch, and I thought we'd end with one from him also: 14:29:44 "At the most basic and fundamental level, an institutional repository is a recognition that the intellectual life and scholarship of our universities will increasingly be represented, documented, and shared in digital 14:30:00 form, 14:30:02 and that a primary responsibility of our universities is to exercise stewardship over these riches, both to make them available and to preserve them." 14:30:12 Look at these microcosms and you see how closely matched their vocabularies are when searched at the university website and the digital repository. 14:30:22 I think that an average match of 95%, with less than a 2% difference, speaks very well for VTechWorks accurately reflecting its institutional host, Virginia Tech. 14:30:37 If this were a live presentation, 14:30:39 I'd be conducting a poll now, because I want to know whether you think I've suggested an appropriate way to evaluate our institutional repositories. 
14:30:47 I'd be asking what percentage of match would determine that the IR effectively reflects its host institution, and I'd be asking if you'd be interested in doing a similar study of your IR and sharing your results. Instead, I'll just ask you to think about these questions and email me at gailmac@vt.edu with your reflections. Thank you for listening. I hope to hear from you about your thoughts on this methodology and my findings for these microcosms.
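For anyone who wants to try a similar study, the open question of what counts as a match can be explored with a small Python sketch like the one below. It assumes a "match" is a term whose share of hits differs between the institution's website and the repository by less than a chosen threshold, in percentage points. The function name and the per-term differences are hypothetical and are not results from this study.

```python
def match_rate(differences, threshold):
    """Percentage of terms whose absolute share difference is below `threshold`.

    `differences` maps each term to (website share - repository share),
    in percentage points.
    """
    if not differences:
        raise ValueError("no terms to compare")
    within = sum(1 for d in differences.values() if abs(d) < threshold)
    return 100.0 * within / len(differences)


# Made-up per-term differences, for illustration only.
example_differences = {
    "term A": 0.2,
    "term B": -0.7,
    "term C": 1.4,
    "term D": -2.6,
}

# Try several candidate thresholds to see how sensitive the match rate is.
for threshold in (1.0, 2.0, 3.0):
    rate = match_rate(example_differences, threshold)
    print(f"under {threshold:.0f} point(s): {rate:.1f}% of terms match")
```

Reporting the match rate at more than one threshold makes it easier to compare results across institutions before any single cutoff is agreed on.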