Downloading patent data for service firms and analyzing the data

The primary task was to create a database in Python, using information from either the United States Patent and Trademark Office (USPTO) or Google Patents, that allows efficient lookups by fields such as patent assignee and patent number. Google Patents was chosen because it contains international patent information rather than being limited to the United States. A Jupyter Notebook was written that uses Beautiful Soup to scrape data from Google Patents. The workflow starts with a user-defined comma-separated values (CSV) file that lists the names of firms, e.g., restaurant and hotel firms, relevant to the analysis the user wants to conduct. The first tasks were to read in the query, create a dictionary of company names with associated patent numbers, scrape the patent web pages for lxml-parsed data, and write the raw data to JSON and Excel.
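The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the project's notebook: it assumes a query file with one company name per row and no header, it takes the company-to-patent-number mapping and the page fetcher as inputs instead of contacting Google Patents, and it uses the standard library's html.parser in place of Beautiful Soup so that it runs without extra dependencies. All function names and the meta tag name in the usage example are hypothetical.

```python
import csv
import io
import json
from html.parser import HTMLParser


def read_company_names(csv_text):
    """Return the non-empty first-column values of a CSV query file.

    Assumption: one company name per row, no header row.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0].strip() for row in reader if row and row[0].strip()]


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                # A page may repeat a name (e.g. several inventors), so keep a list.
                self.meta.setdefault(attrs["name"], []).append(attrs["content"])


def parse_patent_page(html):
    """Extract the meta tag fields of one patent page (stand-in for the scraper)."""
    parser = MetaTagParser()
    parser.feed(html)
    return parser.meta


def build_database(companies, patents_by_company, fetch_page):
    """Map company -> patent number -> parsed fields.

    fetch_page is a hypothetical callable that returns the HTML for a
    patent number; in real use it would perform the HTTP request.
    """
    database = {}
    for company in companies:
        database[company] = {
            number: parse_patent_page(fetch_page(number))
            for number in patents_by_company.get(company, [])
        }
    return database
```

A usage example with made-up data: calling read_company_names("Example Hotels Inc\nExample Restaurants LLC\n") gives the two names, and build_database(names, {"Example Hotels Inc": ["X1"]}, pages.__getitem__) with pages = {"X1": '<meta name="title" content="Sample">'} produces a nested dictionary that json.dumps(database, indent=2) can write out as the raw JSON file. Keeping the fetch step as a parameter makes it easy to test the parsing offline and to swap in a rate-limited downloader later.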

The next task was to analyze the stored information qualitatively or quantitatively. Qualitative analysis was chosen, in the form of natural language processing (NLP), with the goal of classifying the patents. The key steps were noise removal, stop word removal, and lemmatization.
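The three preprocessing steps named above can be illustrated with a small sketch. It is an assumption-laden toy, not the project's code: the stop word set is a short hand-written sample, and a tiny lookup table stands in for a real lemmatizer (for example, NLTK's WordNetLemmatizer), which the abstract does not specify. The function and constant names are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop word list; a real run would use a full one.
STOP_WORDS = {"a", "an", "and", "for", "is", "of", "the", "to", "with"}

# Toy lemma table standing in for a dictionary-based lemmatizer.
LEMMAS = {
    "systems": "system",
    "methods": "method",
    "ordering": "order",
    "meals": "meal",
}


def remove_noise(text):
    """Strip HTML tags, lowercase, and drop everything except letters."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text):
    """Noise removal, then stop word removal, then lemmatization."""
    tokens = remove_noise(text).split()
    tokens = [token for token in tokens if token not in STOP_WORDS]
    return [LEMMAS.get(token, token) for token in tokens]
```

For example, preprocess("<p>Systems and methods for ordering 2 meals!</p>") returns ["system", "method", "order", "meal"]. The resulting token lists are the kind of input a downstream classifier would typically be trained on.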

With this database, we can perform numerous types of analyses to study the effect of patents on the total valuation of companies. It is anticipated that Dr. Zach and future Computer Science students will build upon the current work and conduct additional forms of analysis.

Keywords

NLP, Patent, Google Patents, USPTO, United States Patent and Trademark Office, Web Scraping, Patents, Python, Jupyter, Notebook, Jupyter Notebook, BeautifulSoup, Data Science, BS4

Persistent link

http://hdl.handle.net/10919/103276

Collections

CS4624: Multimedia, Hypertext, and Information Access
