Opinion Mining Summarization

Abstract

Opinion Mining Summarization is a Multimedia, Hypertext, and Information Access capstone project proposed by Xuan Zhang to Dr. Edward Fox. The purpose of the project is to build a suite of tools that produces useful data about products sold on the internet. Given a product, the tools scrape the web for reviews of that product and create easily accessible summaries of those reviews, allowing the user to see the general opinion of online consumers on the product.

In our design phase, we divided the overall project into four main deliverables: a web crawler, a database, a web application, and a suite of summarization tools (see inventory for more details). To begin development, we identified open source libraries that provided some of the functionality our tools would need. Building on these libraries, we developed tools specific to the needs we had identified during our research phase.

We practiced test-driven development, frequently testing our tools on example websites and sample data to ensure correctness and identify any needed design changes. For example, as the project progressed, simulated user testing identified the need for a more user-friendly way to interact with the tools. This led us to design a web application to provide a GUI for the program. We planned for this web application to let the user generate and browse product review summaries and start web-crawling requests in real time.

At the conclusion of the project, we have a full, cohesive tool. Through the web application, the user can run the web-crawler Python scripts, run a variety of Python summarization scripts on review sets, and view the results cleanly. Review and summarization data is stored in a MySQL database.
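
As an illustration only, one kind of summarization script that could be run on a review set is sketched below. It is a minimal frequency-based extractive summarizer, not the project's actual code. The function names, the stopword list, and the scoring rule are all assumptions made for this example, and the project's own scripts (the keywords mention LDA and extraction) may work quite differently.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not the project's list).
STOPWORDS = {"the", "a", "an", "and", "is", "it", "this", "to", "of",
             "i", "but", "was", "for", "in", "my", "very"}


def tokenize(text):
    """Lowercase a string and return its non-stopword word tokens."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]


def summarize_reviews(reviews, max_sentences=2):
    """Return the highest-scoring sentences across a set of reviews.

    Hypothetical helper: each sentence is scored by the average
    review-set frequency of its non-stopword tokens.
    """
    # Split every review into sentences after '.', '!' or '?'.
    sentences = [s.strip()
                 for review in reviews
                 for s in re.split(r"(?<=[.!?])\s+", review)
                 if s.strip()]

    # Count word frequencies over the whole review set.
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Keep the top-scoring sentences as the extractive summary.
    return sorted(sentences, key=score, reverse=True)[:max_sentences]
```

Called with a list of review strings, `summarize_reviews` returns the sentences whose wording is most representative of the set, so a point repeated across many reviews tends to surface ahead of a one-off remark.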

Throughout this project, we learned a great deal about full-stack development. Everything we worked with gave us a new opportunity for learning and growth, whether it was Python scripting or the .NET framework. In addition, integrating multiple tools written in different languages posed a new challenge for our team, beyond what we had experienced in previous classes. Overall, completing a major project from start to finish was an excellent learning experience that will serve us well as we approach graduation and our future careers.

Keywords
crawler, summarization, opinion mining, product reviews, LDA, extract
Citation