Preprocessing and Selection
Week 06, Fall 2023
- Start: Monday, September 25
- End: Friday, September 29
Summary
This week we will focus on model selection. In particular, we will finally introduce cross-validation as one of our go-to tools for investigating possible values of tuning parameters for our models. Additionally, we will spend some time discussing preprocessing, including how to handle categorical data. Importantly, we will use sklearn pipelines to efficiently apply these processes to real data.
Learning Objectives
After completing this week, you are expected to be able to:
- Use cross-validation to select values of tuning parameters.
- Understand the effects of preprocessing data.
- Prepare data using pandas that can then be passed to numpy and sklearn.
- Use feature engineering to create new and useful features.
- Use dummy or one-hot encoding to handle categorical feature variables.
- Use sklearn pipelines to streamline preprocessing and selection while avoiding data leakage.
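To make these objectives concrete, here is a minimal sketch of how they might fit together. It is not taken from the course notebook: the toy data, the column names (size, kind, log_size), the choice of a k-nearest neighbors regressor, and the candidate values of k are all assumptions made for illustration. The idea is that preprocessing lives inside the pipeline, so each cross-validation fold fits the scaler and encoder on its own training portion only, which is how pipelines help avoid data leakage.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: one numeric feature and one categorical feature.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "size": rng.uniform(50, 250, n),
    "kind": rng.choice(["condo", "house", "townhome"], n),
})

# Feature engineering: derive a new feature from an existing column.
df["log_size"] = np.log(df["size"])

# Hypothetical response, generated only so the example runs end to end.
y = 2.0 * df["size"] + 30.0 * (df["kind"] == "house") + rng.normal(0, 10, n)

# Hold out test data before any preprocessing is fit.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=42
)

# Scale numeric columns and one-hot encode the categorical column.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["size", "log_size"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["kind"]),
])

# Preprocessing and model are bundled, so both are refit within each fold.
pipe = Pipeline([
    ("preprocess", preprocessor),
    ("model", KNeighborsRegressor()),
])

# Use 5-fold cross-validation to select the tuning parameter k.
param_grid = {"model__n_neighbors": [1, 5, 10, 25]}
search = GridSearchCV(
    pipe, param_grid, cv=5, scoring="neg_root_mean_squared_error"
)
search.fit(X_train, y_train)

print(search.best_params_)           # selected value of k
print(-search.best_score_)           # cross-validated RMSE for that k
print(-search.score(X_test, y_test)) # test RMSE of the refit pipeline
```

Because GridSearchCV refits the best pipeline on all of the training data by default, the fitted search object can be used directly for prediction on new data frames with the same columns.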
Reading
Link | Source |
---|---|
Week 06 Concept Scribbles | Course Website |
Week 06 Notebook [ Rendered Notebook ] | Course Website |
Video
Head to ClassTranscribe to watch lecture recordings. They are arranged by date in the Lecture Capture Recordings playlist.
Assignments
No new assignments this week as we prepare for the exam! Use this time to get caught up on previous assignments! Also, be aware that each day leading to the exam, we will release an Additional Practice assignment on PrairieLearn. These assignments will have no effect on your grade, but may be used to practice for the exam.
Office Hours
Staff | Day | Time | Location |
---|---|---|---|
David | Monday | 11:00 AM - 12:00 PM | 2328 Siebel Center |
Lahari | Wednesday | 4:00 PM - 5:00 PM | Siebel Center, Second Floor [ Queue ] |
David | Wednesday | 5:00 PM - 6:00 PM | Zoom |
Eunice | Thursday | 3:00 PM - 4:00 PM | Siebel Center, Second Floor [ Queue ] |