CTRimages

Hakimov, Kurban; Hartwell, Andrew; Ulmet, Robert

dc.contributor.author	Hakimov, Kurban	en
dc.contributor.author	Hartwell, Andrew	en
dc.contributor.author	Ulmet, Robert	en
dc.date.accessioned	2013-05-18T16:53:13Z	en
dc.date.available	2013-05-18T16:53:13Z	en
dc.date.issued	2013-05-18	en
dc.description	parse_images.py – a Python script that finds all URLs inside HTML image tags and writes two text files: one with the image URLs and another with their ALT text. bannedUrls.txt – a list of URLs from which no images will be downloaded. ctrfilter – a Bash script that runs the script on all .html and .htm files in the current directory and its subdirectories. filter_images.py – a Python script that filters the extracted URLs based on banned URLs, image dimensions, ALT tags, and file types, and downloads the remaining images into a specified folder. CS4624_Documentation.docx – documentation for the project. ImageProperties.xlsx – an Excel spreadsheet with information on all images in the set of webpages provided to us. FinalPresentation.pptx – the final presentation given during class.	en
dc.description.abstract	CTRnet (Crisis, Tragedy, and Recovery network) is an NSF-funded project that focuses on crawling the Internet for content about tragic events and building digital libraries of information on those crises. CTRnet downloads webpages about these events to ensure that this information is preserved. For example, CTRnet has saved over 440 gigabytes of webpages for the Hurricane Sandy event alone. Our group was assigned to create a script that walks through the downloaded webpages, finds relevant images, and downloads them. We also researched gallery modules for building a Drupal gallery of the downloaded images.	en
dc.description.sponsorship	Kiran Chitturi	en
dc.description.sponsorship	Seungwon Yang	en
dc.identifier.uri	http://hdl.handle.net/10919/22062	en
dc.language.iso	en_US	en
dc.rights	Creative Commons CC0 1.0 Universal Public Domain Dedication	en
dc.rights.uri	http://creativecommons.org/publicdomain/zero/1.0/	en
dc.subject	python script	en
dc.subject	image parsing	en
dc.subject	image filtering	en
dc.subject	CTR	en
dc.subject	Drupal gallery	en
dc.subject	Crisis, Tragedy, and Recovery Network Project	en
dc.title	CTRimages	en
dc.type	Software	en

Files

Original bundle

Now showing 1 - 5 of 9

Name:: parse_images.py
Size:: 2.08 KB
Format:: Unknown data format
Description:: Python script that parses a list of HTML files for image tags.
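The source of parse_images.py is not reproduced on this page, so the following is only a minimal sketch of the extraction step it describes, assuming Python's standard-library html.parser. The names ImageTagParser, extract_images, and write_outputs, and the output file names image_urls.txt and alt_tags.txt, are hypothetical. Relative src values are resolved against an optional base URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageTagParser(HTMLParser):
    """Collects (src, alt) pairs from every <img> tag (hypothetical helper)."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.images = []  # list of (absolute_url, alt_text)

    def handle_starttag(self, tag, attrs):
        # Self-closing <img .../> also arrives here via the default handle_startendtag.
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        if src:  # skip <img> tags without a usable src
            alt = attr_map.get("alt") or ""  # a bare "alt" attribute yields None
            self.images.append((urljoin(self.base_url, src), alt))


def extract_images(html_text, base_url=""):
    """Return all (url, alt) pairs found in one HTML document."""
    parser = ImageTagParser(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.images


def write_outputs(images, url_path="image_urls.txt", alt_path="alt_tags.txt"):
    """Write URLs to one file and ALT text to another, aligned line by line."""
    with open(url_path, "w", encoding="utf-8") as urls, \
         open(alt_path, "w", encoding="utf-8") as alts:
        for url, alt in images:
            urls.write(url + "\n")
            alts.write(alt.replace("\n", " ") + "\n")  # keep one entry per line
```

Keeping the two files aligned by line number lets a later filtering step pair each URL with its ALT text without a separate index.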

Name:: bannedUrls.txt
Size:: 193 B
Format:: Plain Text
Description:: List of banned URLs used by the filter.
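The exact format of bannedUrls.txt and how filter_images.py matches against it are not shown here. The sketch below assumes one host or URL per line (blank lines and `#` comments ignored) and treats a candidate image as banned when its host equals a banned host or is a subdomain of one. load_banned and is_banned are hypothetical names.

```python
from urllib.parse import urlparse


def load_banned(lines):
    """Normalize banned entries to lowercase host names (assumed file format)."""
    banned = set()
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        # Accept either a bare host ("ads.example.com") or a full URL.
        host = urlparse(entry if "//" in entry else "//" + entry).hostname
        if host:
            banned.add(host)
    return banned


def is_banned(url, banned_hosts):
    """True if the URL's host is a banned host or a subdomain of one."""
    host = urlparse(url).hostname or ""
    return any(host == b or host.endswith("." + b) for b in banned_hosts)
```

Usage: `with open("bannedUrls.txt") as f: banned = load_banned(f)`. Matching on the host rather than on a raw string prefix avoids false hits such as notdoubleclick.net matching doubleclick.net.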

Name:: ctrfilter
Size:: 200 B
Format:: Unknown data format
Description:: Bash script that runs all of the project's scripts in one step.
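ctrfilter itself is a Bash script whose contents are not listed here. As a rough Python equivalent of its file-discovery step (finding every .html and .htm file in the current directory and its subdirectories), the hypothetical find_html_files below walks the tree with os.walk. This is a sketch, not the project's script.

```python
import os


def find_html_files(root="."):
    """Yield paths of .html and .htm files under root, recursing into subdirectories."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so os.walk descends in a deterministic order
        for name in sorted(filenames):
            # Case-insensitive so that PAGE.HTM is picked up as well.
            if name.lower().endswith((".html", ".htm")):
                yield os.path.join(dirpath, name)
```

Each yielded path could then be read and passed to an extraction step such as parse_images.py.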

Name:: filter_images.py
Size:: 1.46 KB
Format:: Unknown data format
Description:: Python script that filters a list of URLs and downloads the images that pass the filters.
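The record names the four filter criteria (banned URLs, image dimensions, ALT tags, file types) but not the rules filter_images.py applies. The sketch below is therefore only one plausible combination. The allowed extensions, the 100-pixel minimum, and the ALT stopwords are hypothetical values, and keep_image and download are hypothetical names. Where the dimensions are unknown (None), the size check is skipped.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse

# Hypothetical thresholds; the project's actual values are not given in this record.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}
MIN_WIDTH = MIN_HEIGHT = 100  # drop icons, spacers, tracking pixels
ALT_STOPWORDS = {"logo", "icon", "advertisement", "banner"}


def keep_image(url, alt, width, height, banned_hosts):
    """Return True if an image URL passes every filter (a sketch, not the original rules)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if any(host == b or host.endswith("." + b) for b in banned_hosts):
        return False  # banned source
    if os.path.splitext(parsed.path)[1].lower() not in ALLOWED_EXTENSIONS:
        return False  # unsupported file type
    if width is not None and height is not None and (width < MIN_WIDTH or height < MIN_HEIGHT):
        return False  # too small to be a content image
    if set(alt.lower().split()) & ALT_STOPWORDS:
        return False  # ALT text suggests decoration or advertising
    return True


def download(url, folder):
    """Save one image into folder, named after the last path segment of its URL."""
    os.makedirs(folder, exist_ok=True)
    name = os.path.basename(urlparse(url).path) or "image"
    target = os.path.join(folder, name)
    with urllib.request.urlopen(url, timeout=30) as resp, open(target, "wb") as out:
        shutil.copyfileobj(resp, out)
    return target
```

Running the cheap string checks before any network access means rejected URLs are never fetched. One limitation of this sketch is that two URLs ending in the same file name overwrite each other in the target folder.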

Name:: CS4624_Documentation.pdf
Size:: 455.21 KB
Format:: Adobe Portable Document Format
Description:: Documentation and user manual for the CTRimages project in PDF format.

License bundle

Now showing 1 - 1 of 1

Name:: license.txt
Size:: 1.5 KB
Format:: Item-specific license agreed upon to submission
Description::

Collections

CS4624: Multimedia, Hypertext, and Information Access