Data Mining in Engineering
IE7275 • Summer 2020 • Northeastern University • Boston
This course covers the theory and applications of data mining in engineering. It reviews the fundamentals and key concepts of data mining, discusses important data mining techniques, and presents algorithms for implementing them. Specifically, the course covers data mining techniques for data preprocessing, association rule extraction, classification, prediction, clustering, and complex data exploration. Applications of data mining in several areas, including manufacturing, healthcare, medicine, business, and other service sectors, are also discussed.
- Class: Monday – Thursday, 1:20pm – 3:00pm (EDT)
- Office hours: Monday 3:30pm – 4:30pm on Zoom
- Location: Zoom
- Dates: 05/04/2020 – 06/25/2020
- Administration: Questions and discussion about classes, homework, and the project should be posted only on Piazza.
- Prerequisites: IE6200 (familiarity with R and common packages/libraries, e.g., tidyverse)
- Preparation for the course:
  - Linear Algebra: G. Strang. Introduction to Linear Algebra. Wellesley-Cambridge Press, 2009. Ch. 1-4.
  - Calculus: G. Strang. Calculus. Wellesley-Cambridge Press, 2010. An older edition of the book (1991) is also available.
- Instructor: Zhenyuan Lu
  - Email:
  - Office hours: Tuesday 1:00pm – 2:00pm on Zoom
- TA: Dongning Li
  - Email:
  - Office hours:
Table of contents
- Course goals
- Textbooks
- Policies
- Accommodations for Students with Disabilities
- Take care of yourself
- Course Evaluation
- Schedule
Course goals
Textbooks
The required textbook:
- [DMBA] Galit Shmueli, Peter C. Bruce, Inbal Yahav, Nitin R. Patel, Kenneth C. Lichtendahl Jr. Data Mining for Business Analytics: Concepts, Techniques, and Applications in R. Wiley, 1st Edition. ISBN-10: 1118879368, ISBN-13: 978-1118879368.
Additional textbooks:
R:
- [R4DS] Wickham, Hadley, and Garrett Grolemund. R for Data Science.
- [RMD] Xie, Yihui, et al. R Markdown.
Data Mining:
- [DMA] Zaki, Mohammed J., and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press, May 2014.
- [DM] Tan, Pang-Ning, et al. Introduction to Data Mining. Pearson Education, 2006.
Statistical Modeling:
- [ISLR] James, Gareth, et al. An Introduction to Statistical Learning: with Applications in R. Springer, 2017. Free online version available.
- [ESL] Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer, 2017. Free online version available.
Machine learning and Deep learning:
- [ML] Mitchell, Tom M. Machine Learning.
- [DL] Goodfellow, Ian, et al. Deep Learning. Free online version available.
Policies
Please post questions and discussion only on Piazza, and make your posts visible to the entire class. Feel free to email the instructor or TAs about personal or other private issues and concerns. All students are expected to attend class throughout the entire semester. If you will miss a class for medical or emergency reasons, please email me at least 24 hours before that class. You are granted one homework extension of two calendar days, to be used at your discretion without having to ask. Plagiarism, cheating, and any form of unauthorized collaboration will not be tolerated and will be handled in accordance with the University policies described in the Student Handbook. For additional information, see Northeastern University’s Academic Integrity Policy.
Accommodations for Students with Disabilities
If you have a disability, I encourage you to contact the Disability Resource Center to register and request accommodations. Please also discuss your needs with me as early in the semester as possible.
Take care of yourself
Eat healthy food, exercise regularly, avoid alcohol and drugs, get adequate sleep, and take time to relax. These habits will help you achieve your goals and manage stress.
If you have difficulty keeping up with the material or homework for personal reasons, please let me know early. If you, a friend, or a classmate appear to be struggling or having trouble coping with stress, we strongly encourage you to seek support through the We Care program at Northeastern. At Northeastern, a student is never alone when struggling with a demanding situation.
Course Evaluation
- Homework 35%
- Midterm Exam 20%
- Final Exam 20%
- Project 15%
- Class Participation 10%
Schedule
(subject to change)
Date | Lecture | Logistics
---|---|---
Week 1 | |
5/4 | Introduction |
5/5 | Basics of R |
5/6, 5/7 | Exploratory Data Analysis and Data Transformation |
Week 2 | |
5/11, 5/12 | Dimension Reduction |
5/13, 5/14 | Evaluating Predictive Performance | HW1 due 5/14, 11:59pm
Week 3 | |
5/18, 5/19 | Multiple Linear Regression |
5/20 | Midterm Review |
5/21 | k-Nearest Neighbors | HW2 due 5/21, 11:59pm
Week 4 | |
5/25 | No class, enjoy! |
5/26, 5/27 | Naïve Bayes Classifier |
5/28 | No class | HW3 due 5/28, 11:59pm ET; Midterm due 5/29, 11:59pm. Good luck!
Week 5 | |
6/1, 6/2 | Logistic Regression; Generative vs. Discriminative Models |
6/3, 6/4 | Decision Tree |
Week 6 | |
6/8 | No class (University's decision) |
6/9 | Linear Discriminant Analysis |
6/10, 6/11, 6/15 | Neural Networks and Deep Learning | HW4 due 6/13, 11:59pm
Week 7 | |
6/16 | Support Vector Machine |
6/17 | Association Rule and Clustering Analysis: Q&A Workshop (Guest Lecturers: Jinhui Zhao, Qibin Tan) | Project slides and files due 6/17, 11:59pm
6/18 | Neural Networks and Deep Learning | HW5 due 6/20, 11:59pm
Final week | No class |
6/25 | | Final Exam due 6/25, 11:59pm; Project Report due 6/25, 11:59pm