CTRimages

dc.contributor.author: Hakimov, Kurban [en]
dc.contributor.author: Hartwell, Andrew [en]
dc.contributor.author: Ulmet, Robert [en]
dc.date.accessioned: 2013-05-18T16:53:13Z [en]
dc.date.available: 2013-05-18T16:53:13Z [en]
dc.date.issued: 2013-05-18 [en]
dc.description: parse_images.py: a Python script that finds all URLs inside HTML image tags and creates one text document with the URLs and another with the ALT tags. bannedUrls.txt: a list of URLs from which no images will be downloaded. ctrfilter: a bash file that runs the script on all .html and .htm files in the current directory and its subdirectories. filter_images.py: a Python script that filters out URLs for download based on banned URLs, image dimensions, ALT tags, and file types. It also downloads the images into a specified folder. CS4624_Documentation.docx: documentation for the project. ImageProperties.xlsx: an Excel spreadsheet with information on all images in the group of webpages we were provided with. FinalPresentation.pptx: the final presentation given during class. [en]
dc.description.abstract: CTRnet (Crisis, Tragedy, and Recovery network) is an NSF-funded project that focuses on crawling the Internet for information on tragic events and creating digital libraries of information on those crises. CTRnet downloads webpages about these events to ensure that the information is preserved. As an example, CTRnet has over 440 gigabytes of webpages saved for the Hurricane Sandy event alone. Our group was tasked with creating a script that walks through the downloaded webpages, finds relevant images, and downloads them. We also researched gallery modules to create a Drupal gallery for our downloaded images. [en]
dc.description.sponsorship: Kiran Chitturi [en]
dc.description.sponsorship: Seungwon Yang [en]
dc.identifier.uri: http://hdl.handle.net/10919/22062 [en]
dc.language.iso: en_US [en]
dc.rights: Creative Commons CC0 1.0 Universal Public Domain Dedication [en]
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/ [en]
dc.subject: python script [en]
dc.subject: image parsing [en]
dc.subject: image filtering [en]
dc.subject: CTR [en]
dc.subject: Drupal gallery [en]
dc.subject: Crisis, Tragedy, and Recovery Network Project [en]
dc.title: CTRimages [en]
dc.type: Software [en]
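
The description of parse_images.py says it finds the URLs inside HTML image tags and records them together with the ALT tags. The script itself is not reproduced in this record, so the following is only a minimal sketch of that idea using Python's standard html.parser module. The names ImageTagCollector and extract_images are hypothetical and are not taken from the project.

```python
from html.parser import HTMLParser


class ImageTagCollector(HTMLParser):
    """Collects (src, alt) pairs from every <img> tag fed to the parser.

    Hypothetical helper for illustration; not the project's actual code.
    """

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "img":
            return
        attr_map = dict(attrs)
        src = attr_map.get("src")
        if src:
            # A missing or valueless alt attribute is stored as an empty string.
            self.images.append((src, attr_map.get("alt") or ""))


def extract_images(html_text):
    """Return a list of (url, alt_text) tuples for the <img> tags in html_text."""
    collector = ImageTagCollector()
    collector.feed(html_text)
    collector.close()
    return collector.images
```

Writing the first element of each tuple to one text file and the second to another would give the two output documents the description mentions. The file names and output layout are not specified in this record, so they are left out here.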

Files

Original bundle
Now showing 1 - 5 of 9
Name: parse_images.py
Size: 2.08 KB
Format: Unknown data format
Description: Python script that parses a list of HTML files for image tags.

Name: bannedUrls.txt
Size: 193 B
Format: Plain Text
Description: List of banned URLs used by the filter.

Name: ctrfilter
Size: 200 B
Format: Unknown data format
Description: Bash file that runs all scripts in one step.

Name: filter_images.py
Size: 1.46 KB
Format: Unknown data format
Description: Python script that filters list of URLs and downloads appropriate images.

Name: CS4624_Documentation.pdf
Size: 455.21 KB
Format: Adobe Portable Document Format
Description: Documentation and user manual for the CTRimages project in PDF format.
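
The filter_images.py entry above describes filtering a list of URLs against banned URLs, ALT tags, and file types before downloading. The exact rules are not given in this record, so the sketch below rests on stated assumptions: a URL is treated as banned if it contains any banned entry as a substring, only a small set of image extensions is accepted, and images with an empty ALT text are dropped. The dimension check and the download step are omitted. The names ALLOWED_EXTENSIONS, keep_image, and filter_images are hypothetical.

```python
import posixpath
from urllib.parse import urlparse

# Assumed set of acceptable file types; the project's real list is not shown.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def keep_image(url, alt, banned_urls, allowed_extensions=ALLOWED_EXTENSIONS):
    """Return True if the image passes all three (assumed) filter rules."""
    # Rule 1 (assumption): reject URLs containing any banned entry.
    if any(banned in url for banned in banned_urls):
        return False
    # Rule 2: check the extension of the URL path, ignoring any query string.
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    if extension not in allowed_extensions:
        return False
    # Rule 3 (assumption): require a non-empty ALT text.
    return bool(alt.strip())


def filter_images(images, banned_urls):
    """Keep only the (url, alt) pairs that pass keep_image."""
    return [(url, alt) for url, alt in images if keep_image(url, alt, banned_urls)]
```

A caller could read bannedUrls.txt into a list of stripped, non-empty lines and pass it as banned_urls. The actual format of that file is not documented here, so that step is also an assumption.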
License bundle
Name: license.txt
Size: 1.5 KB
Format: Item-specific license agreed upon to submission