CTRimages
dc.contributor.author | Hakimov, Kurban | en |
dc.contributor.author | Hartwell, Andrew | en |
dc.contributor.author | Ulmet, Robert | en |
dc.date.accessioned | 2013-05-18T16:53:13Z | en |
dc.date.available | 2013-05-18T16:53:13Z | en |
dc.date.issued | 2013-05-18 | en |
dc.description | parse_images.py – a Python script that finds all URLs inside HTML image tags and writes two text files: one with the image URLs and one with their ALT text. bannedUrls.txt – a list of URLs from which no images will be downloaded. ctrfilter – a Bash script that runs the Python scripts over all .html and .htm files in the current directory and its subdirectories. filter_images.py – a Python script that filters out URLs for download based on banned URLs, image dimensions, ALT text, and file types, and downloads the images that pass the filter into a specified folder. CS4624_Documentation.docx – documentation for the project. ImageProperties.xlsx – an Excel spreadsheet with information on all images in the set of webpages we were given. FinalPresentation.pptx – the final presentation given during class. | en |
dc.description.abstract | CTRnet (Crisis, Tragedy, and Recovery network) is an NSF-funded project that focuses on crawling the Internet for content about tragic events and building digital libraries of information on those crises. CTRnet downloads webpages about these events to ensure that this information is preserved. For example, CTRnet has over 440 gigabytes of webpages saved for the Hurricane Sandy event alone. Our group was tasked with creating a script that walks through the downloaded webpages, finds relevant images, and downloads them. We also researched gallery modules to create a Drupal gallery for the downloaded images. | en |
dc.description.sponsorship | Kiran Chitturi | en |
dc.description.sponsorship | Seungwon Yang | en |
dc.identifier.uri | http://hdl.handle.net/10919/22062 | en |
dc.language.iso | en_US | en |
dc.rights | Creative Commons CC0 1.0 Universal Public Domain Dedication | en |
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | en |
dc.subject | python script | en |
dc.subject | image parsing | en |
dc.subject | image filtering | en |
dc.subject | CTR | en |
dc.subject | Drupal gallery | en |
dc.subject | Crisis, Tragedy, and Recovery Network Project | en |
dc.title | CTRimages | en |
dc.type | Software | en |
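The dc.description field says that parse_images.py collects the URLs and ALT text of HTML image tags, and that ctrfilter runs the processing over every .html and .htm file beneath the current directory. The original scripts are not shown on this page, so the following is only a minimal Python sketch of that step, built on the standard library's html.parser and os.walk. The names ImageTagParser, extract_images, find_html_files and collect_images, and the output file names urls.txt and alts.txt, are hypothetical.

```python
import os
from html.parser import HTMLParser


class ImageTagParser(HTMLParser):
    """Collect (src, alt) pairs from every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.images = []  # (src, alt) tuples in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches too.
        # Self-closing <img ... /> tags reach here via the default handle_startendtag.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = (attributes.get("src") or "").strip()
        if src:  # skip <img> tags without a usable URL
            # A valueless attribute (e.g. <img alt>) arrives as None; treat it as "".
            alt = attributes.get("alt") or ""
            # Collapse whitespace and newlines so each ALT text fits on one line.
            self.images.append((src, " ".join(alt.split())))


def extract_images(html_text):
    """Return the (src, alt) pairs found in one HTML document."""
    parser = ImageTagParser()
    parser.feed(html_text)
    parser.close()
    return parser.images


def find_html_files(root="."):
    """Yield every .html or .htm file under root, descending into subdirectories."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith((".html", ".htm")):
                yield os.path.join(dirpath, name)


def collect_images(root=".", url_path="urls.txt", alt_path="alts.txt"):
    """Write one image URL per line to url_path and the matching ALT text to alt_path."""
    with open(url_path, "w", encoding="utf-8") as urls, \
         open(alt_path, "w", encoding="utf-8") as alts:
        for path in find_html_files(root):
            # Crawled pages often have broken encodings; replace bad bytes instead of failing.
            with open(path, encoding="utf-8", errors="replace") as page:
                for src, alt in extract_images(page.read()):
                    urls.write(src + "\n")
                    alts.write(alt + "\n")  # empty line when the image has no ALT text
```

Writing an empty line for images without ALT text keeps line N of the URL file aligned with line N of the ALT file, which lets a later filtering step pair them up again.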
Files
Original bundle
Showing 5 of 9 files
- Name: parse_images.py
- Size: 2.08 KB
- Format: Unknown data format
- Description: Python script that parses a list of HTML files for image tags.
- Name: bannedUrls.txt
- Size: 193 B
- Format: Plain Text
- Description: List of banned URLs used by the filter.
- Name: ctrfilter
- Size: 200 B
- Format: Unknown data format
- Description: Bash script that runs all the scripts in one step.
- Name: filter_images.py
- Size: 1.46 KB
- Format: Unknown data format
- Description: Python script that filters a list of URLs and downloads the appropriate images.
- Name: CS4624_Documentation.pdf
- Size: 455.21 KB
- Format: Adobe Portable Document Format
- Description: Documentation and user manual for the CTRimages project in PDF format.
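The file listing describes filter_images.py only at a high level, and its source is not reproduced on this page. The following is a minimal Python sketch of such a filter-and-download step under stated assumptions: bannedUrls.txt holds one URL fragment per line and any URL containing one is rejected, only .jpg, .jpeg, .png and .gif files are kept, and images with empty ALT text are skipped. The function names and the ALLOWED_EXTENSIONS set are hypothetical. The sketch does not check image dimensions, which would require fetching or decoding each image, and download_images expects absolute URLs; relative src values would first need to be resolved against the page URL with urllib.parse.urljoin.

```python
import os
import urllib.request
from urllib.parse import urlparse

# Assumed whitelist of image file types; the project's actual list is not documented here.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def load_banned(path="bannedUrls.txt"):
    """Read one banned URL (or URL fragment) per line, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def keep_image(url, alt, banned):
    """Return True if an image passes the banned-URL, file-type and ALT checks."""
    # Reject any URL that contains a banned entry as a substring.
    if any(entry in url for entry in banned):
        return False
    # Judge the file type by the extension of the URL path (query string ignored).
    extension = os.path.splitext(urlparse(url).path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        return False
    # Assumption: images with empty ALT text are decorative and not worth keeping.
    return bool(alt.strip())


def download_images(urls, folder):
    """Save each absolute URL into folder; an index prefix avoids file-name clashes."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        name = os.path.basename(urlparse(url).path) or "image"
        target = os.path.join(folder, f"{index:05d}_{name}")
        urllib.request.urlretrieve(url, target)
        saved.append(target)
    return saved
```

Given a list of (url, alt) pairs from the parsing step, a run might be `banned = load_banned("bannedUrls.txt")` followed by `download_images([u for u, a in pairs if keep_image(u, a, banned)], "images")`, where the folder name images is arbitrary.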
License bundle
- Name: license.txt
- Size: 1.5 KB
- Format: Item-specific license agreed to upon submission