Blog and Forum Collection for Trail Study

Abstract

This project focuses on the culture and trends of the Triple Crown Trails (the Appalachian Trail, the Pacific Crest Trail, and the Continental Divide Trail). Its goal is to build a large collection of forum and blog posts about these trails through web crawling and internet searching. One motivation for the project is to assist our client, Abigail Bartolome, with her Master’s thesis, which examines the different trends and ways of life on the Triple Crown Trails. Our tool lets her sift through this material much faster to find what she does and does not need for her thesis, instead of spending time reading countless entries with irrelevant information. Our tagging system also lets her narrow the results to the specific kind of information she wants. We provide the date, title, and author of each post so she can see immediately whether an article is relevant and was posted in an applicable time frame.
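
The filtering described above, where posts are narrowed by tag and by publication date and each result shows its title, author, and date, can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's Django code: the `Post` fields and the `search` and `render` helpers are hypothetical names, and requiring every selected tag (rather than any one of them) is our assumption about how tags combine.

```python
from dataclasses import dataclass, field
from datetime import date
from html import escape


@dataclass
class Post:
    # Hypothetical record shape for illustration; not the project's actual schema.
    title: str
    author: str
    posted: date
    url: str
    source: str
    tags: set[str] = field(default_factory=set)


def search(posts, wanted_tags, start=None, end=None):
    """Return posts carrying every wanted tag (case-insensitive), optionally
    limited to the inclusive date range [start, end], newest first."""
    wanted = {t.lower() for t in wanted_tags}
    hits = [
        p
        for p in posts
        if wanted <= {t.lower() for t in p.tags}
        and (start is None or p.posted >= start)
        and (end is None or p.posted <= end)
    ]
    return sorted(hits, key=lambda p: p.posted, reverse=True)


def render(post):
    """Format one result: the hyperlinked title, then author, date, and source."""
    return (
        f'<a href="{escape(post.url)}">{escape(post.title)}</a> '
        f"by {escape(post.author)}, {post.posted.isoformat()} ({escape(post.source)})"
    )


if __name__ == "__main__":
    # Made-up sample posts, for demonstration only.
    posts = [
        Post("Example post A", "author_a", date(2017, 4, 2), "https://example.com/a",
             "WordPress", {"shelter", "resupply"}),
        Post("Example post B", "author_b", date(2016, 5, 20), "https://example.com/b",
             "BlogSpot", {"water", "resupply"}),
    ]
    for p in search(posts, ["resupply"], start=date(2017, 1, 1)):
        print(render(p))
```

Escaping every field before building the HTML keeps a post title containing markup from breaking the results page.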

The project has two main focuses: the frontend and the backend. The frontend is an easy-to-use interface that lets Abigail search for specific tags, which filter the blog posts according to the information she seeks. The tags are generated automatically from the content of all of the forums and blogs together, which makes them very specific and well suited to finding the kind of content our client wants. Once she has finished adding tags, she can search for blogs or forums related to the tagged topics. The page displays the results in a neat format: each article's title is a hyperlink to the article itself and is followed by the post's author, date, and source.
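
The report does not say which algorithm generates the tags. As one plausible stand-in, the sketch below uses TF-IDF, which fits the idea of deriving tags from all posts together: a term scores highly for a post when it is frequent in that post but rare across the whole collection, so terms shared by every post receive zero weight. The stop-word list, the `generate_tags` function, and its `per_doc` limit are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a much larger one.
STOP_WORDS = {"the", "and", "for", "was", "with", "that", "this", "from", "our", "were", "are", "but"}


def tokenize(text):
    """Lowercase alphabetic tokens of three or more letters, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) >= 3 and w not in STOP_WORDS]


def generate_tags(documents, per_doc=3):
    """Assign each document its top-scoring terms by TF-IDF over the whole corpus."""
    token_lists = [tokenize(d) for d in documents]
    n_docs = len(token_lists)
    # Document frequency: the number of documents each term appears in at least once.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    tags = []
    for tokens in token_lists:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Keep positively weighted terms, highest score first, ties broken alphabetically.
        ranked = sorted((t for t, s in scores.items() if s > 0), key=lambda t: (-scores[t], t))
        tags.append(ranked[:per_doc])
    return tags
```

The limit `per_doc=3` is arbitrary; the totals reported in this abstract (87,618 tags over 3,423 posts) work out to roughly 25.6 tags per post.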

The backend does the heavy lifting and is invisible to the client. It goes through each blog or forum website fed into the web crawler and stores all of the relevant information in our database. The backend also implements the tagging system, generating tags and applying them to blog posts. Sites on WordPress and BlogSpot each follow a largely uniform structure for paging through posts, so our web crawler adapts to whichever platform a site uses and keeps going until there are no more blog posts on that site. All of the blog posts, contents, pictures, tags, URLs, and other data are stored in the backend database, which is linked to the frontend so that the posts can be displayed in a neat, organized way suited to Abigail's needs. From 31 sources, we have collected 3,423 blog posts, to which 87,618 tags have been assigned.
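
The platform-dependent crawl loop described above can be sketched as follows, assuming that each platform marks its "older posts" link with a known CSS class (`blog-pager-older-link` on many Blogger/BlogSpot templates, `next` on WordPress's numbered pagination). Themes differ, so these class names, the host-name heuristic in `platform_of`, and all function names are assumptions for illustration; the project's actual crawler may work differently. The page fetcher is passed in as a parameter so the loop can be exercised without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Assumed CSS class of each platform's "older posts" link; themes vary.
NEXT_LINK_CLASS = {
    "blogspot": "blog-pager-older-link",
    "wordpress": "next",
}


def platform_of(url):
    """Hypothetical heuristic: blogspot.com hosts are BlogSpot; anything else is
    treated as WordPress, since self-hosted WordPress can use any domain."""
    host = urlparse(url).hostname or ""
    return "blogspot" if host.endswith("blogspot.com") else "wordpress"


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> whose class list contains the target class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attrs = dict(attrs)
        if self.css_class in (attrs.get("class") or "").split():
            self.href = attrs.get("href")


def fetch_html(url):
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl_listing_pages(start_url, fetch=fetch_html, max_pages=500):
    """Yield (url, html) for each listing page, following "older posts" links
    until none remain, a page repeats, or max_pages is reached."""
    css_class = NEXT_LINK_CLASS[platform_of(start_url)]
    url, seen = start_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        finder = NextLinkFinder(css_class)
        finder.feed(html)
        # Resolve relative links against the current page; stop when no link is found.
        url = urljoin(url, finder.href) if finder.href else None
```

Tracking visited URLs and capping the page count keeps a misconfigured theme (for example, a "next" link that points back to the first page) from trapping the crawler in a loop.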

Together, the frontend and the backend provide Abigail with a method to both search and view blog post content in an efficient manner.

Keywords

Trail, Blogs, Web Scraping, Django, Python, Presentation, Report, Application, Trail Culture

Citation