Downloading patent data for service firms and analyzing the data
Abstract
The primary task was to create a database in Python, using information from either the United States Patent and Trademark Office or Google Patents, that allows efficient lookups by fields such as patent assignee and patent number. Google Patents was chosen because it contains international patent information rather than being limited to the United States. A Jupyter Notebook was written that uses Beautiful Soup to scrape data from Google Patents. The workflow starts with a user-defined comma-separated values (CSV) file that lists the names of the firms, e.g., restaurants and hotel firms, relevant to the analysis the user wants to conduct. The first tasks were to read in the query, create a dictionary of company names with associated patent numbers, scrape websites for lxml data, and write the raw data to JSON and Excel.
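The first steps of that workflow (reading the query file and building lookups by assignee and by patent number) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's notebook code: the function names, the company names, and the patent numbers are hypothetical placeholders, and the scraping step is replaced by a hard-coded list of (assignee, patent number) pairs.

```python
import csv
import io
import json
from collections import defaultdict


def read_company_names(csv_text):
    """Return the company names found in the first column of CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row and row[0].strip()]


def build_patent_index(records):
    """Build two lookup tables from (assignee, patent_number) pairs."""
    by_assignee = defaultdict(list)  # assignee -> list of patent numbers
    by_number = {}                   # patent number -> assignee
    for assignee, number in records:
        by_assignee[assignee].append(number)
        by_number[number] = assignee
    return dict(by_assignee), by_number


# Hypothetical query file contents: one firm name per line.
query_csv = "Example Hotels Inc\nSample Restaurant Group\n"
companies = read_company_names(query_csv)

# Placeholder records standing in for scraped results (not real patents).
records = [
    ("Example Hotels Inc", "US0000001A"),
    ("Example Hotels Inc", "US0000002A"),
    ("Sample Restaurant Group", "US0000003A"),
]
by_assignee, by_number = build_patent_index(records)

# Serialize the raw lookup table to JSON text, as one possible storage step.
raw_json = json.dumps(by_assignee, indent=2, sort_keys=True)
```

Keeping one dictionary per lookup key makes both kinds of query a single dictionary access, at the cost of storing each patent number twice.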
The next task was to analyze the stored information qualitatively or quantitatively. Qualitative analysis was chosen, in the form of Natural Language Processing (NLP), with the goal of classifying the patents. The key preprocessing steps were noise removal, stop word removal, and lemmatization.
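The three steps named above can be illustrated with a small standard-library sketch. It is an assumption-laden toy, not the project's pipeline: the stop word set is a tiny hand-picked sample, and the lemmatizer is a hypothetical lookup table standing in for a real lemmatizer from an NLP library.

```python
import re

# Tiny illustrative stop word set (assumption; real lists are much larger).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "to", "in", "is", "are"}

# Toy lemma table standing in for a real lemmatizer (assumption).
LEMMAS = {
    "systems": "system",
    "methods": "method",
    "ordering": "order",
    "orders": "order",
}


def remove_noise(text):
    """Strip HTML tags, digits, and punctuation, then normalize whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)           # drop HTML tags
    text = re.sub(r"[^a-z\s]", " ", text.lower())  # keep lowercase letters only
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text):
    """Apply noise removal, stop word removal, and table-based lemmatization."""
    tokens = remove_noise(text).split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]


# Hypothetical patent title used only as sample input.
tokens = preprocess(
    "<p>Systems and methods for ordering food in a restaurant (2019).</p>"
)
print(tokens)  # ['system', 'method', 'order', 'food', 'restaurant']
```

The resulting token lists are the kind of normalized input a downstream classifier would consume.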
With this database, we can perform numerous types of analyses to study the effect of patents on the total valuation of companies. It is anticipated that Dr. Zach and future Computer Science students will build upon the current work and conduct additional forms of analysis.