DS 4400: Machine Learning and Data Mining I

Spring 2024

Instructors:

  • Instructor: David Liu (liu.davi@)
  • TAs: Dhanush Akula, Jai Amin, Caleb Lee

Class Schedule:

  • Tuesday and Friday 9:50-11:30am EST
  • Location: Churchill Hall 103

Office Hours: 

  • Jai: Tuesdays 2:00 – 3:30pm
  • Dhanush: Wednesdays 12:00 – 1:30pm
  • David: Mondays 3:00 – 4:30pm, Thursdays 2:00 – 3:30pm
  • Caleb: Fridays 4:00 – 5:30pm

Office hours are held virtually on the Khoury Office Hours Application. If you do not have a Khoury account, please apply for one here.

Class forum: Piazza

Class policies: The academic integrity policy is strictly enforced.

Class description:

Machine learning is a fast-paced and exciting field that is achieving human-level performance in tasks such as image classification, speech recognition, machine translation, precision medicine, and self-driving cars. Machine learning has already greatly impacted our daily lives and has the potential to transform the world even more in the near future. This course provides a broad introduction to machine learning and covers the fundamental algorithms for supervised learning. We will cover topics related to regression, linear classification, non-linear classification, ensemble models, and deep learning. The class also provides an introduction to ethics and fairness concerns in machine learning, as well as to generative AI, such as Large Language Models (LLMs).

Pre-requisites:

  • Probability
  • Calculus
  • Linear algebra

Textbook

[ISL] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning with Applications in Python.

Labs for the ISL textbook are available on GitHub.

Grading

The grade will be based on:

 

  • Assignments – 25%
  • Final Project – 30%
  • Midterm Exam – 20%
  • Final Exam – 20%
  • Class Participation – 5%

All students have five late days to use, free of penalty, across the four assignments. After the five late days have been used, assignments will incur a 20% penalty per late day. Students are asked to keep track of late days themselves; however, please email if you are unsure of the number of late days remaining.
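
For a rough sense of how the grading breakdown and the late-day policy combine, here is a minimal Python sketch. It is not an official grade calculator. The function names, the dictionary keys, and the 0-100 score scale are hypothetical. The syllabus does not say how the 20% penalty is computed, so the sketch assumes each penalized day removes 20% of the assignment's earned score and that the result cannot go below zero.

```python
# Weights (in percent) taken from the grading breakdown above.
# The dictionary keys are illustrative names, not official identifiers.
WEIGHTS = {
    "assignments": 25,
    "final_project": 30,
    "midterm_exam": 20,
    "final_exam": 20,
    "participation": 5,
}

PENALTY_PER_LATE_DAY = 0.20  # 20% per late day once free late days run out


def course_grade(scores):
    """Weighted average of component scores, each assumed to be on a 0-100 scale."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def apply_late_penalty(score, days_late, free_days_left):
    """Return (penalized score, free late days remaining).

    Assumption: free late days are used first; each remaining late day
    removes 20% of the earned score, floored at zero.
    """
    covered = min(days_late, free_days_left)
    penalized_days = days_late - covered
    factor = max(0.0, 1.0 - PENALTY_PER_LATE_DAY * penalized_days)
    return score * factor, free_days_left - covered


if __name__ == "__main__":
    example = {
        "assignments": 92,
        "final_project": 88,
        "midterm_exam": 85,
        "final_exam": 90,
        "participation": 100,
    }
    print(course_grade(example))  # prints 89.4
```

Under these assumptions, an assignment submitted two days late with all five free late days still available keeps its full score and leaves three free days.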

 

Calendar (Tentative)

Slides will be posted online shortly after each in-class lecture.

| Unit | Week | Date | Topic | Readings |
|---|---|---|---|---|
| | 1 | Tues 01/09 | Course outline (syllabus, grading, policies) [slides] | [ISL] Chapters 1 and 2.1 |
| Introduction and Review | | Fri 01/12 | Classification and regression; Bias-variance tradeoff [slides][recording] | [ISL] Chapters 2.2.1 and 2.2.2; Probability review from Stanford; Bias-variance lecture from Cornell CS 4780 |
| | 2 | Tues 01/16 | Probability and linear algebra review [slides] | Linear algebra review from Stanford |
| | | Fri 01/19 | Simple linear regression; Closed form solution. Correlation [slides][recording] | [ISL] Chapter 3.1 |
| Linear regression | 3 | Tues 01/23 | Multiple linear regression; Closed form solution [slides][recording] | [ISL] Chapter 3.2 |
| | | Fri 01/26 | Gradient descent [slides][recording] | Lecture notes from Stanford on linear regression, part 1.1 |
| Regularization and cross-validation | 4 | Tues 01/30 | Regularization. Lasso and ridge regression; Homework 1 Due; [slides][recording] | [ISL] Chapter 6.2 |
| | | Fri 02/02 | k-Nearest Neighbors (kNN). Cross-validation; Linear classification. Logistic regression; [slides][recording] | [ISL] Chapter 5.1; [ISL] Chapter 4.1, 4.2, and 4.3 (except 4.3.5) |
| Linear Classification | 5 | Tues 02/06 | Logistic regression; [Lab for cross validation]; [slides][recording] | Lecture notes from Stanford on linear regression, part 2 |
| | | Fri 02/09 | Gradient descent for logistic regression; [Lab for logistic regression]; [slides][recording] | |
| Generative Models | 6 | Tues 02/13 | Evaluation of ML; Midterm Prep; [slides][recording] | |
| Ethics in AI | | Fri 02/16 | Generative Models; LDA; [slides][recording]; Homework 2 Due. *Class projector experienced technical difficulties, so slides are not fully synced in the recording. | [ISL] Chapter 4.4.1 LDA |
| | 7 | Tues 02/20 | Ethics in AI; [Overview]; [Hands-on Activity led by Samantha Dies]. *Overview slides created by Irene Y. Chen. | |
| | | Fri 02/23 | Midterm Exam | |
| Tree and Ensemble Classification | 8 | Tues 02/27 | Naïve Bayes; Decision trees; [slides][recording] | Chapter 8.1.2 |
| | | Fri 03/01 | Decision trees; Information Gain; Ensemble learning; [slides][recording]; Project proposal due | |
| | | Tues 03/05 | Spring break | |
| | | Fri 03/08 | Spring break | |
| SVM | 9 | Tues 03/12 | Ensemble learning; Bagging; Boosting; [slides][recording] | Chapter 8.2 |
| | | Fri 03/15 | Ensemble learning; Boosting; Deep learning introduction; [slides][recording] | MIT Introduction to Deep Learning Lab |
| | | Mon 03/18 | Homework 3 Due | |
| Deep learning | 10 | Tues 03/19 | Deep learning; Feed-Forward Networks; [slides][recording] | Stanford notes on deep learning, parts 1 and 2; Optional: Chapter 4 from Dive into Deep Learning |
| | | Fri 03/22 | Midterm Solution Review; Feed-Forward Networks; Convolutional Neural Networks; [slides][recording] | Optional: Chapter 6 from Dive into Deep Learning; Midterm solution review posted to Canvas under Modules >> Exams |
| | 11 | Tues 03/26 | Convolutional Neural Networks; Backpropagation; [slides][recording] | |
| | | Fri 03/29 | Backpropagation; Regularization in Neural Networks; Slides: refer to slides listed under March 26; [recording] | MIT Deep Learning Facial Recognition Demo with TensorFlow |
| | 12 | Tues 04/02 | Backpropagation Review; Slides: refer to slides listed under March 26; [recording] | |
| | | Fri 04/05 | No in-person class. Online office hours for final projects. | |
| | | Sun 04/07 | Homework 4 Due | |
| Supplemental Lectures | 13 | Tues 04/09 | Panel on careers in machine learning; Session not recorded | |
| | | Fri 04/12 | Final Exam Review; [recording][slides] | |
| Projects | 14 | Tues 04/16 | Project Presentation Videos | |

Additional reading

 

Other resources

 

Books: