Labs

Installing and configuring all the software needed for this course on your own machine can be tedious. We have prepared a virtual machine (VM) with most of the tools preinstalled, which you can download and use.

If you use VMware Workstation Player (recommended, free for personal use from here), you can download the VM from here (size: ~8.7GB).

If you use Oracle VirtualBox (latest version 7.0.4, free from here), you can download the VM from here (size: ~9.3GB).

The username and password are both csdeptucy.

PLEASE INSTALL THE VM BEFORE THE FIRST LAB.

If you want to resize your VM, please follow these instructions.

We kindly ask you to bring your own laptop, with the VM installed, to the lab.

| Week | Description | Useful Links | Material | Exercises to deliver |
|------|-------------|--------------|----------|----------------------|
| 1 | Introduction to Apache Hadoop | | LAB01.pdf, Source Code, Dataset | |
| 2 | Programming with Apache Hadoop | | LAB02.pdf, WordCount.java, SalesJan2009.csv | 🔴 |
| 3 | Introduction to Python | | LAB03.pdf | |
| 4 | Data Manipulation | | LAB04.pdf, Lab04.ipynb, iris_data.csv, iris_data2.csv | |
| 5 | Data Visualization | | LAB05.pdf, Lab05.ipynb, iris.csv, haberman.csv | |
| 6 | Data Preparation I: Cleaning, Encoding, Scaling, Resampling Data | | LAB06, Lab06.ipynb, NFL Play by Play 2009-2016 (v3).zip, house_prices_train.csv, shampoo.csv | |
| 7 | Data Preparation II: Dimensionality Reduction: Feature Selection and Extraction | | LAB07, Lab07.ipynb | |
| 8 | Machine Learning: Regression | | LAB08, Lab8_LinearRegression.ipynb, Lab8_PolynomialRegression.ipynb, Advertising.csv, Boston.csv | |
| 9 | Machine Learning: Regression (cont'd) | | LAB08, Lab08.ipynb, Advertising.csv, Boston.csv | |
| 10 | Machine Learning: Classification and Clustering | | LAB09, Lab09-classification.ipynb, Lab09-clustering.ipynb, telco.csv, wine_data.csv, fleet_data.csv, WineAnalysis.ipynb | 🔴 |
| 11 | Introduction to Apache Spark | | LAB10, kmeans-rdd.py, kmeans-dataframe.py | |
| 12 | Programming with Apache Spark | | LAB11, kmeans-fleet.py | 🔴 |
| 13 | No Lab. Project Presentation Week | | | |