CS 135 Intro to ML
1. Overview
Intermediate deadlines for Problem 1 and Problem 2 code/experimentation and writeup
Team Formation
Encouraged to work in teams of 2, but can work individually
Fill out ProjectA Team Formation Form by 10/6
If you need help finding a teammate, post on Piazza
Work to Complete
One semi-open problem (Problem 1) and one completely open problem (Problem 2)
Practice ML development cycle for both problems
Submit test set predictions to the leaderboards on Gradescope
2. What to Turn In
PDF Report
One report covering all problems, 4-6 pages
Manually graded
Mark subproblems via Gradescope annotation tool
ZIP Files of Predictions
One ZIP file for Problem 1 and one for Problem 2
Each contains a single plain text file with float probabilities for test set predictions
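
A minimal sketch of packaging predictions as described above: one float probability per line in a plain text file, stored inside a ZIP. The function name `write_prediction_zip` and the inner file name `predictions.txt` are hypothetical placeholders; use whatever file name the leaderboard actually expects.

```python
import zipfile
from pathlib import Path


def write_prediction_zip(probas, zip_path, txt_name="predictions.txt"):
    """Write one probability per line to a text file inside a ZIP archive.

    txt_name is a hypothetical placeholder, not the officially required name.
    """
    lines = []
    for p in probas:
        p = float(p)
        # Probabilities must lie in [0, 1]; fail early on anything else.
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        lines.append(f"{p:.6f}")
    body = "\n".join(lines) + "\n"
    # The archive holds exactly one member: the plain text file.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(txt_name, body)
    return Path(zip_path)
```

Checking that the number of lines equals the number of test examples (600 here) before uploading is a cheap safeguard.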
Reflection Form
Each individual turns in a reflection form after completing the report
3. Starter Code and Code Restrictions
Starter Code Repo
https://github.com/tufts-ml-courses/cs135-24f-assignments/tree/main/projectA
Provides scripts to load data, but no other code
Code Usage
Can use any Python package
Understand and cite third-party code
4. Background
Dataset
Drawn from research published in a KDD 2015 paper
Thousands of single-sentence reviews from imdb.com, amazon.com, yelp.com
Training set of 2400 examples, test set of 600 examples in CSV format
Binary labels indicating sentiment
Performance Metric
Area under the ROC curve (AUROC)
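
As an illustration of the metric, AUROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition. It assumes labels are coded 0/1 with 1 as the positive class (the coding is an assumption here), and it is quadratic in the number of examples, which is acceptable at this dataset size. In practice scikit-learn's `roc_auc_score` would be the usual choice.

```python
def auroc(y_true, y_score):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    Assumes y_true contains 0/1 labels, with 1 as the positive class.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

Because only the ranking of scores matters, AUROC does not depend on any particular decision threshold.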
5. Problem 1: Bag-of-Words Feature Representation
Background on Bag-of-Words
Represent documents as count vectors of a fixed vocabulary
Many design decisions involved
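
A minimal sketch of the idea, using only the standard library. The tokenizer (lowercasing, keeping runs of letters, digits, and apostrophes) and the `min_count` cutoff are example design decisions, not choices prescribed by the assignment, and all function names are hypothetical. scikit-learn's `CountVectorizer` offers a comparable, more configurable implementation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase, then keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_vocabulary(docs, min_count=1):
    """Map each token seen at least min_count times to a fixed column index."""
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def to_count_vector(doc, vocab):
    """Count how often each vocabulary word appears in one document."""
    vec = [0] * len(vocab)
    for tok in tokenize(doc):
        idx = vocab.get(tok)
        if idx is not None:  # out-of-vocabulary words are dropped
            vec[idx] += 1
    return vec


docs = ["Good movie, good cast.", "Bad movie!"]
vocab = build_vocabulary(docs)          # built from training text only
features = [to_count_vector(d, vocab) for d in docs]
```

The vocabulary should be built from training data only, so that test sentences are encoded with the same fixed columns and unseen words are simply ignored.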
Goals and Tasks
Develop BoW representation and binary classifier pipeline
Experiment with preprocessing
Use LogisticRegression classifier
Use hyperparameter selection techniques with cross-validation
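
One way to organize hyperparameter selection with cross-validation is sketched below in plain Python. `fit_and_score` is a hypothetical callable supplied by the caller: it would train a classifier (for example `LogisticRegression` with a given `C`) on the training indices and return a validation score such as AUROC. The number of folds, the seed, and the candidate grid are all assumptions. scikit-learn's `GridSearchCV` covers the same ground with less code.

```python
import random


def kfold_indices(n_examples, n_folds=5, seed=0):
    """Yield (train_idx, valid_idx) pairs for shuffled K-fold cross-validation."""
    order = list(range(n_examples))
    random.Random(seed).shuffle(order)
    # Deal shuffled indices into n_folds nearly equal groups.
    folds = [order[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        valid = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, valid


def select_hyperparameter(candidates, n_examples, fit_and_score, n_folds=5, seed=0):
    """Return (best_value, mean_scores), scoring each candidate on the same folds.

    fit_and_score(value, train_idx, valid_idx) is a hypothetical callable that
    returns a validation score where higher is better.
    """
    splits = list(kfold_indices(n_examples, n_folds, seed))
    mean_scores = {}
    for value in candidates:
        scores = [fit_and_score(value, tr, va) for tr, va in splits]
        mean_scores[value] = sum(scores) / len(scores)
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores
```

Reusing identical folds for every candidate keeps the comparison fair, and the per-candidate mean scores are the natural numbers to plot or tabulate in the report.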
Report Sections
1A: Describe BoW design decisions
1B: Describe cross-validation design
1C: Describe hyperparameter selection for classifier
1D: Analyze predictions of best classifier
1E: Report test set performance on leaderboard
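
For the prediction analysis in 1D, a small helper along these lines can surface representative mistakes on held-out data. It is a sketch under assumptions: labels coded 0/1 with 1 meaning positive sentiment, a decision threshold of 0.5, and a hypothetical function name `split_errors`.

```python
def split_errors(texts, y_true, y_proba, threshold=0.5):
    """Return (false_positives, false_negatives) as lists of (probability, text).

    Each list is sorted so the most confident mistakes come first.
    Assumes 0/1 labels; the 0.5 threshold is an illustrative default.
    """
    false_pos, false_neg = [], []
    for text, y, p in zip(texts, y_true, y_proba):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 0:
            false_pos.append((p, text))
        elif pred == 0 and y == 1:
            false_neg.append((p, text))
    false_pos.sort(key=lambda item: item[0], reverse=True)  # highest wrong probability first
    false_neg.sort(key=lambda item: item[0])                # lowest wrong probability first
    return false_pos, false_neg
```

Reading through the top few entries of each list often reveals patterns worth discussing, such as negation or sentences from one particular website.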
6. Problem 2: Open-ended challenge
Goals and Tasks
Use any feature representation, classifier, and hyperparameter selection procedure
Try various methods to improve performance
Report Sections
2A: Describe feature representation
2B: Describe cross-validation or equivalent process
2C: Describe classifier and hyperparameter search
2D: Analyze errors of best classifier
2E: Report test set performance on leaderboard
7. Grading
Overall Grade Breakdown
87%: Report performance
10%: Leaderboard submissions
3%: Completion of reflection