    • SEDNA XML Database 

      Vijay, Sony; El Meligy Abdelhamid, Sherif; Malayattil, Sarosh (2010-12-09)
      This module introduces the use of the SEDNA XML database for XML retrieval. Its primary focus is the architecture of the SEDNA database and how standard XML queries can be used to retrieve data from it.
    • Text Classification Using Mahout 

      Alam, Maksudul; Arifuzzaman, S. M.; Bhuiyan, Md Hasanuzzaman (2012-11-06)
      This module focuses on classification of text using Apache Mahout. After successful completion of this module, students will be able to explain and apply methods of classification, correctly classify a set of documents ...
    • Text Clustering Using LucidWorks and Apache Mahout 

      Chen, Liangzhe; Lin, Xiao; Wood, Andrew (2012-11-17)
      This module introduces algorithms and evaluation metrics for flat clustering. We focus on the usage of LucidWorks big data analysis software and Apache Mahout, an open-source machine learning library in clustering of ...
    • Web Archiving 

      Lee, Spencer; Kanan, Tarek; Jiao, Jian (2009-10-09)
      This module covers the ideas, approaches, problems, and needs of web archiving for building a static, long-term collection of web pages.
    • Web Publishing 

      Karia, Pratik (2009-09-08)
      This module covers the general principles of web publishing and the various paradigms that can be used for storing and retrieving content within digital libraries. This module introduces various techniques to publish ...
    • Weka 

      Peddi, Bhanu; Xiong, Huijun; ElSherbiny, Noha (2010-12-10)
      This module stresses the methods of text classification used in information retrieval. We focus on the usage of Weka, a data mining toolkit, in data processing with three classification algorithms: Naive Bayes [1], k Nearest ...
    • WordNet 

      Fouh, Eric; Poirel, Christopher (2010-10-25)
      This module covers the use of a thesaurus in several information retrieval (IR) techniques: index construction (e.g., tokenization, stemming, and lemmatization), robustness to query typographical errors (e.g., the use of ...