Labs

Installing and configuring all the software needed for this course on your own machine can be tedious, so we have prepared a virtual machine (VM) that you can download and use instead. You can download the VM from here (size: ~7.46 GB). The username/password is csdeptucy.

Week | Description | Material
2 | Inverted Index and the Boolean Model using NLTK and Apache OpenNLP | LAB01.pdf, lab1.py, OpenNLP.zip
3 | Apache Lucene | LAB02.pdf, dataset.zip, Lucene 1 Solution, Lucene 2 Solution
4 | Apache Solr | LAB03.pdf
5 | ElasticSearch | LAB04.pdf, lab4.zip, elasticJava.zip
6 | Apache Hadoop 1 | LAB05.pdf, Hadoop 1 Source Code, Dataset, Hadoop 1 Solution
7 | Apache Hadoop 2 | LAB06.pdf, Hadoop 2 Source Code -- WordCount.java, Hadoop 2 Solution
8 | Apache Hadoop 3 | LAB07.pdf, Hadoop 3 Source Code, Dataset, Hadoop 3 Solution
9 | Apache Nutch | LAB08.pdf
10 | Apache Tika | LAB09.pdf, LAB09.zip
11 | Text Clustering and Classification in Python | LAB10, Lab10-description.pdf, labeledTrainData.tsv
12 | Apache Spark | LAB11.pdf
13 | Projects Presentations |