Paleontology Topic Trends

dc.contributor.author: Wilson, James [en]
dc.contributor.author: Martin, Joseph [en]
dc.contributor.author: Cruz, Rudy [en]
dc.contributor.author: Weiler, Eric [en]
dc.date.accessioned: 2018-05-11T01:10:30Z [en]
dc.date.available: 2018-05-11T01:10:30Z [en]
dc.date.issued: 2018-04-03 [en]
dc.description.abstract: The purpose of the project was to apply modern data analysis to abstracts published by the Society of Vertebrate Paleontology. The society holds a yearly convention at which members from all over the world gather and present their studies from that year. Our client, Professor Sterling Nesbit, provided our group with a collection of abstracts dating back to 1987. Our job was to analyze the abstracts from each year to find trends and patterns across all the years in which the society has published abstract collections. The team's method changed over the course of the project. In the beginning, the team planned to use Latent Dirichlet Allocation (LDA) to summarize the abstracts. This would find the topics prevalent in the collection and show the mix of those topics in each abstract. After further discussion with our client, the team decided to provide a more straightforward analysis based on graphing hierarchies in the abstracts. To run the graphing analysis properly, our team first had to scrape the abstracts so that the most useful data was not overlooked. The scraping process began with removing all hypertext markup tags from the abstract text files (which were converted from PDF). Then the team eliminated English stop words from the text files, since these words are not commonly needed for analysis. The next step was to customize the list of stop words, adding words based on yearly differences.
For example, in some years the Society of Vertebrate Paleontology required its members to refer to the United States as “The United States of America,” while in other years they were required to refer to it as “United States.” These slight changes required our team to make stop word elimination specific to each year. Once the scraping was done, the team created graphing scripts to produce graphs based on Vertebrate Paleontology hierarchies. After meeting with our client multiple times to refine our analysis, we created the final version of the analysis script. These graphs helped our client visualize the patterns in findings made by the Society of Vertebrate Paleontology. The project should be further developed to automatically extract abstracts from the convention’s PDF collection, and to update the stop words based on the society’s yearly modifications. [en]
dc.description.notes: Files and Descriptions
1. PaleontologyTopicTrendsReport.pdf: This is our main report.
2. PaleontologyTopicTrendsPresentation.pptx: This is the final presentation of our report.
3. PaleontologyTopicTrendsFigures.zip: This is a zip file containing the figures generated in our project.
4. PaleontologyTopicTrendsOther.zip: This is a zip file containing our script files, the skeleton directory for the abstracts, and some examples of our code and its usage.
5. PaleontologyTopicTrendsReport.docx: Our main report in .docx form.
6. PaleontologyTopicTrendsPresentation.pdf: This is our final presentation in .pdf form. [en]
dc.identifier.uri: http://hdl.handle.net/10919/83210 [en]
dc.language.iso: en_US [en]
dc.publisher: Virginia Tech [en]
dc.rights: Creative Commons CC0 1.0 Universal Public Domain Dedication [en]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [en]
dc.subject: Paleontology [en]
dc.subject: Word Clouds [en]
dc.subject: Topic Analysis [en]
dc.subject: Python [en]
dc.subject: Data Analysis [en]
dc.subject: Graphical Analysis [en]
dc.title: Paleontology Topic Trends [en]
dc.type: Dataset [en]
dc.type: Presentation [en]
dc.type: Report [en]
dc.type: Software [en]
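
The cleaning steps described in the abstract (removing markup tags, eliminating English stop words, and applying year-specific stop phrases) might be sketched in Python roughly as follows. This is only an illustrative sketch and not the project's actual script: the stop-word set, the year-to-phrase mapping, the example years, and the function names clean_abstract and term_counts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (the project's real list is not given here).
BASE_STOP_WORDS = {"a", "an", "and", "from", "in", "is", "of", "the", "to", "was"}

# Hypothetical mapping from year to phrases that should be dropped for that year only.
YEAR_STOP_PHRASES = {
    1990: ["the united states of america"],
    2005: ["united states"],
}

TAG_RE = re.compile(r"<[^>]+>")   # matches markup tags such as <p> or </b>
WORD_RE = re.compile(r"[a-z]+")   # matches runs of lowercase letters


def clean_abstract(text, year):
    """Strip tags, drop year-specific phrases, then drop base stop words."""
    text = TAG_RE.sub(" ", text).lower()
    for phrase in YEAR_STOP_PHRASES.get(year, []):
        text = text.replace(phrase, " ")
    return [word for word in WORD_RE.findall(text) if word not in BASE_STOP_WORDS]


def term_counts(abstracts, year):
    """Count the remaining terms across all abstracts from one year."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(clean_abstract(abstract, year))
    return counts
```

Per-year counts such as term_counts(abstracts, 2005).most_common(10) could then feed a plotting script. Removing phrases before tokenizing keeps a multi-word phrase like "United States" from leaving stray words behind.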

Files

Original bundle
Name: PaleontologyTopicTrendsFigures.zip
Size: 570.22 KB

Name: PaleontologyTopicTrendsPresentation.pptx
Size: 324.07 KB
Format: Microsoft Powerpoint XML

Name: PaleontologyTopicTrendsOther.zip
Size: 9.06 KB

Name: PaleontologyTopicTrendsPresentation.pdf
Size: 335.28 KB
Format: Adobe Portable Document Format

Name: PaleontologyTopicTrendsReport.docx
Size: 2.76 MB
Format: Microsoft Word XML

License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission