LucidWorks Vectorize Module for the Digital Library Curriculum Initiative

dc.contributor.author | Kniphuisen, David | en
dc.contributor.author | Tran, Alan | en
dc.date.accessioned | 2013-05-18T13:59:20Z | en
dc.date.available | 2013-05-18T13:59:20Z | en
dc.date.issued | 2013-05-18 | en
dc.description | This is a module for students who want to learn the LucidWorks Vectorize workflow. It walks them through the steps for using this tool and explains the purpose of the vector space model. It also offers suggestions for discussion after the module is completed. The LucidWorks Overview Module should be completed before starting this module. ModuleIntro.(pdf/pptx/mp4) is an introduction to the module; it gives an overview of the concepts and the learning goals for students. FinalReport.pdf describes how this module was created; it includes user and developer manuals, as well as descriptions of some of the problems we encountered. VectorizeModule.(pdf/docx) is the LucidWorks module itself; it contains all of the module's content, along with prerequisites and references. Users should either watch the introductory video or view the PowerPoint before starting on the Vectorize Module PDF. | en
dc.description.abstract | The goal of our project was to create a learning module for students who are interested in converting large collections of documents into a form usable for machine learning, information retrieval, and related purposes. To do so, we wrote a module that explains how the LucidWorks Big Data software vectorizes documents using a workflow. The module details the approach LucidWorks implements and gives step-by-step instructions for creating a collection, starting the workflow, checking the workflow's status, and accessing the results once the workflow completes. After completing the module, users will be able to test their understanding on the example documents provided with the LucidWorks software and will be familiar with the Hadoop Distributed File System (HDFS). Once they understand how the software works, they will be able to create vectorized representations of their own documents. The module also covers installing the LucidWorks software on a virtual machine, so users without access to the software can set up their own instance. The module will also be available through http://en.wikiversity.org/wiki/Curriculum_on_Digital_Libraries. | en
dc.identifier.uri | http://hdl.handle.net/10919/22061 | en
dc.language.iso | en_US | en
dc.rights | In Copyright | en
dc.rights.uri | http://rightsstatements.org/vocab/InC/1.0/ | en
dc.subject | LucidWorks | en
dc.subject | vectorize | en
dc.subject | module | en
dc.subject | workflow | en
dc.title | LucidWorks Vectorize Module for the Digital Library Curriculum Initiative | en
dc.type | Article | en
dc.type | Presentation | en
dc.type | Video | en
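The description says the module teaches the purpose of the vector space model, and the abstract describes vectorizing documents. As a rough illustration of what a vectorized document looks like, the Python sketch below builds TF-IDF vectors for a few short texts and compares them with cosine similarity. It is not the LucidWorks implementation: the whitespace tokenizer, the weighting (raw term count times log inverse document frequency), and the function names `tokenize`, `tfidf_vectors`, and `cosine` are assumptions chosen for illustration. The LucidWorks Vectorize workflow may tokenize, weight, and normalize documents differently.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace only. A real pipeline
    # would also strip punctuation, drop stop words, and possibly stem.
    return text.lower().split()


def tfidf_vectors(docs: list[str]) -> tuple[list[str], list[list[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per document.

    Weighting (an illustrative choice, one of several common variants):
    weight(t, d) = count of t in d * log(N / df(t)).
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({term for toks in tokenized for term in toks})
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # One dimension per vocabulary term; absent terms get weight 0.
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    docs = [
        "hadoop stores documents",
        "lucidworks vectorizes documents",
        "hadoop runs workflows",
    ]
    vocab, vecs = tfidf_vectors(docs)
    print(vocab)
    print(cosine(vecs[0], vecs[1]))  # shares "documents", so > 0
    print(cosine(vecs[1], vecs[2]))  # no shared terms, so 0.0
```

Documents that share no weighted terms get a similarity of 0. A term that appears in every document gets an IDF of log(1) = 0, so it contributes nothing to any vector. This is why TF-IDF down-weights very common words.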

Files

Original bundle
ModuleIntro.pdf (83.58 KB, Adobe Portable Document Format): Module introduction (PDF)
ModuleIntro.pptx (36.57 KB, Microsoft PowerPoint XML): Module introduction (PowerPoint)
ModuleIntro.mp4 (4.49 MB, MP4 container format for video files): Module introduction (MP4 video)
ModuleIntro.webm (3.85 MB, WebM video container format)
ModuleIntro.mp4-en.vtt (3.65 KB, closed caption or subtitle file for HTML5 video)

License bundle
license.txt (1.5 KB, item-specific license agreed to upon submission)