Preprocessing and Selection

Week 06, Fall 2023

Summary

This week we will focus on model selection. In particular, we will finally introduce cross-validation as one of our go-to tools for investigating possible values of tuning parameters for our models. Additionally, we will spend some time discussing preprocessing, including how to handle categorical data. Importantly, we will use sklearn pipelines to efficiently apply these processes to real data.

Learning Objectives

After completing this week, you are expected to be able to:

  • Use cross-validation to select values of tuning parameters.
  • Understand the effects of preprocessing data.
  • Prepare data using pandas that can then be passed to numpy and sklearn.
  • Use feature engineering to create new and useful features.
  • Use dummy or one-hot encoding to handle categorical feature variables.
  • Use sklearn pipelines to streamline preprocessing and selection while avoiding data leakage.
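
As a rough illustration of how these objectives fit together, the sketch below builds a small synthetic dataset in pandas, engineers one new feature, one-hot encodes a categorical column, and uses cross-validation to pick a tuning parameter inside a pipeline. The column names (height, weight, color, bmi), the k-nearest neighbors model, and the grid of candidate values are all assumptions made for illustration; they are not taken from the course notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data with hypothetical columns (illustration only).
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "height": rng.normal(170, 10, n),
    "weight": rng.normal(70, 15, n),
    "color": rng.choice(["red", "green", "blue"], n),
})

# Feature engineering: derive a new feature from existing columns.
df["bmi"] = df["weight"] / (df["height"] / 100) ** 2

# Hypothetical binary target: a noisy version of "bmi above the median".
y = (df["bmi"] + rng.normal(0, 1, n) > df["bmi"].median()).astype(int)
X = df[["height", "weight", "bmi", "color"]]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Scale numeric columns and one-hot encode the categorical column.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["height", "weight", "bmi"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
])

# Because preprocessing lives inside the pipeline, it is re-fit on the
# training portion of each fold, which avoids leaking validation data.
pipe = Pipeline([
    ("preprocess", preprocessor),
    ("knn", KNeighborsClassifier()),
])

# 5-fold cross-validation over candidate values of the tuning parameter.
param_grid = {"knn__n_neighbors": [1, 5, 10, 25]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

best_k = search.best_params_["knn__n_neighbors"]
test_accuracy = search.score(X_test, y_test)
print("Selected k:", best_k)
print("Test accuracy:", test_accuracy)
```

The test set is split off before any fitting and is only touched once, after the search has selected a value of k. The same pattern should carry over to other models and grids by swapping the final pipeline step and the keys of param_grid.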

Reading

Link                                      Source
Week 06 Concept Scribbles                 Course Website
Week 06 Notebook [ Rendered Notebook ]    Course Website

Video

Head to ClassTranscribe to watch lecture recordings. They are arranged by date in the Lecture Capture Recordings playlist.

Assignments

No new assignments this week as we prepare for the exam! Use this time to get caught up on previous assignments! Also, be aware that each day leading up to the exam, we will release an Additional Practice assignment on PrairieLearn. These assignments will have no effect on your grade, but you can use them to practice for the exam.

Office Hours

Staff    Day        Time                 Location
David    Monday     11:00 AM - 12:00 PM  2328 Siebel Center
Lahari   Wednesday  4:00 PM - 5:00 PM    Siebel Center, Second Floor [ Queue ]
David    Wednesday  5:00 PM - 6:00 PM    Zoom
Eunice   Thursday   3:00 PM - 4:00 PM    Siebel Center, Second Floor [ Queue ]