Downloading patent data for service firms and analyzing the data

Field | Value | Language
dc.contributor.author | Jha, Abhishek | en
dc.contributor.author | Tian, Boyan | en
dc.contributor.author | Cooper, Matthew | en
dc.contributor.author | Wang, Zifan | en
dc.date.accessioned | 2021-05-14T04:30:12Z | en
dc.date.available | 2021-05-14T04:30:12Z | en
dc.date.issued | 2021-05-14 | en
dc.description.abstract | The primary task was to create a database in Python, using information from either the United States Patent and Trademark Office or Google Patents, which allows efficient lookups using information such as patent assignee and the patent number. Google Patents was chosen because it contained international patent information rather than being limited to just the United States. The Jupyter Notebook was made to use Beautiful Soup to scrape data from Google Patents. The workflow of the code is to start with a user-defined comma-separated values file that specifies names, e.g., of restaurants and hotel firms, that are relevant to the analysis the user wants to conduct. The first tasks were to read in the query, create a dictionary of company names with associated patent numbers, scrape websites for lxml data, and write raw data to JSON and Excel. The next task was to analyze the stored information qualitatively or quantitatively. Here qualitative analysis was chosen in the form of Natural Language Processing (NLP). The goal was to classify the patents using NLP. The key steps included noise removal, stop word removal, and lemmatization. With this database, we can perform numerous types of analyses to study the effect of patents on the total valuation of companies. It is anticipated that Dr. Zach and future Computer Science students will build upon the current work and conduct additional forms of analysis. | en
dc.description.notes | The two versions of the final report are in PatentDataAnalysisReport.docx (Word) and PatentDataAnalysisReport.pdf (PDF). The two versions of the final presentation are in PatentDataAnalysisPresentation.odp (OpenDocument Presentation) and PatentDataAnalysisPresentation.pdf (PDF). | en
dc.identifier.uri | http://hdl.handle.net/10919/103276 | en
dc.language.iso | en_US | en
dc.publisher | Virginia Tech | en
dc.subject | NLP | en
dc.subject | Patent | en
dc.subject | Google Patents | en
dc.subject | USPTO | en
dc.subject | United States Patent and Trademark Office | en
dc.subject | Web Scraping | en
dc.subject | Patents | en
dc.subject | Python | en
dc.subject | Jupyter | en
dc.subject | Notebook | en
dc.subject | Jupyter Notebook | en
dc.subject | BeautifulSoup | en
dc.subject | Data Science | en
dc.subject | BS4 | en
dc.title | Downloading patent data for service firms and analyzing the data | en
dc.type | Presentation | en
dc.type | Report | en
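
The first tasks named in the abstract (read the query file, build a dictionary of company names with associated patent numbers, write raw data to JSON) might look roughly like the sketch below. It is not the project's code: the CSV column name `company`, the company names, the helper names, and the placeholder patent identifiers are all assumptions made for illustration, and the scraping step is replaced by a stub so the example runs without network access.

```python
import csv
import io
import json

# Hypothetical query file contents; the real CSV layout is not given in this record.
QUERY_CSV = "company\nExample Hotel Co\nSample Restaurant Inc\nExample Hotel Co\n"


def read_company_names(csv_text):
    """Return unique, non-empty company names in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = []
    for row in reader:
        name = (row.get("company") or "").strip()
        if name and name not in names:
            names.append(name)
    return names


def stub_lookup(name):
    """Stand-in for the scraping step; returns made-up placeholder identifiers."""
    return [f"PLACEHOLDER-{i:04d}" for i in range(1, 3)]


def build_patent_index(names, lookup):
    """Map each company name to the list of patent numbers that `lookup` returns."""
    return {name: list(lookup(name)) for name in names}


names = read_company_names(QUERY_CSV)
index = build_patent_index(names, stub_lookup)

# Serialize the raw index; a real run would write this string to a .json file.
raw_json = json.dumps(index, indent=2, sort_keys=True)
```

Passing the lookup in as a function keeps the dictionary-building step independent of how patent numbers are obtained, so a scraper could later replace the stub without changing the rest of the sketch.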

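The three NLP preprocessing steps listed in the abstract (noise removal, stop word removal, lemmatization) can be illustrated with a minimal standard-library sketch. The stop-word set and the lemma lookup table here are tiny hand-written stand-ins, not the resources the project used; a dictionary lookup is only the simplest form of lemmatization, and the sample sentence is invented.

```python
import re

# Tiny illustrative stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to", "with"}

# Toy lemma lookup table standing in for a real lemmatizer (assumption).
LEMMAS = {"ordering": "order", "meals": "meal", "rooms": "room", "claims": "claim"}


def remove_noise(text):
    """Drop HTML-like tags, replace non-letter characters with spaces, lowercase."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    return text.lower()


def preprocess(text):
    """Apply noise removal, stop word removal, then lookup-based lemmatization."""
    tokens = remove_noise(text).split()
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [LEMMAS.get(t, t) for t in tokens]


# Invented sample text, not taken from any patent.
sample = "<p>A system for ordering meals in hotel rooms (2 claims).</p>"
tokens = preprocess(sample)
```

Words missing from the lookup table pass through unchanged, which is why a real lemmatizer with a full vocabulary would be needed before any classification step.
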
Files

Original bundle
Name: PatentDataAnalysisReport.docx
Size: 1.21 MB
Format: Microsoft Word XML

Name: PatentDataAnalysisReport.pdf
Size: 885.23 KB
Format: Adobe Portable Document Format

Name: PatentDataAnalysisPresentation.pptx
Size: 1.56 MB
Format: Microsoft Powerpoint XML

Name: PatentDataAnalysisPresentation.pdf
Size: 607.08 KB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission