CS 135 Intro to ML

1. Overview

Intermediate deadlines for Problem 1 and Problem 2 code/experimentation and writeup

Team Formation

Teams of 2 are encouraged, but working individually is allowed

Fill out ProjectA Team Formation Form by 10/6

If you need help finding a teammate, post on Piazza

Work to Complete

One semi-open problem (Problem 1) and one completely open problem (Problem 2)

Practice ML development cycle for both problems

Leaderboards for both problems are maintained on Gradescope

2. What to Turn In

PDF Report

One report covering all problems, 4-6 pages

Manually graded

Mark subproblems via Gradescope annotation tool

ZIP Files of Predictions

One ZIP file for Problem 1 and one for Problem 2

Each contains a single plain text file with float probabilities for test set predictions
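
A minimal sketch of packaging predictions this way, using only the Python standard library. The file names predictions.txt and problem1.zip, the six-decimal formatting, and the helper write_prediction_zip are assumptions for illustration; the names Gradescope actually expects should be taken from the assignment instructions.

```python
import os
import tempfile
import zipfile


def write_prediction_zip(probas, zip_path, txt_name="predictions.txt"):
    """Write one float probability per line into a single text file inside a ZIP.

    txt_name is a hypothetical file name, not one given by the assignment.
    """
    lines = []
    for p in probas:
        p = float(p)
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        lines.append(f"{p:.6f}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(txt_name, "\n".join(lines) + "\n")
    return zip_path


# Example usage with three made-up probabilities.
demo_dir = tempfile.mkdtemp()
demo_zip = write_prediction_zip([0.91, 0.08, 0.5], os.path.join(demo_dir, "problem1.zip"))

# Read the archive back to confirm it holds exactly one plain text file.
with zipfile.ZipFile(demo_zip) as zf:
    demo_names = zf.namelist()
    demo_text = zf.read(demo_names[0]).decode("utf-8")
```

Reading the archive back before uploading is a cheap check that it contains one file and that the number of lines matches the 600 test examples.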

Reflection Form

Each individual turns in a reflection form after completing the report

3. Starter Code and Code Restrictions

Starter Code Repo

https://github.com/tufts-ml-courses/cs135-24f-assignments/tree/main/projectA

Provides scripts to load data, but no other code

Code Usage

Can use any Python package

Must understand and cite any third-party code used

4. Background

Dataset

Drawn from research published in a KDD 2015 paper

Thousands of single-sentence reviews from imdb.com, amazon.com, yelp.com

Training set of 2400 examples, test set of 600 examples in CSV format

Binary labels indicating sentiment

Performance Metric

Area under the ROC curve (AUROC)
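
AUROC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the function name auroc and the toy labels and scores are made up for illustration. In practice sklearn.metrics.roc_auc_score computes the same quantity more efficiently.

```python
def auroc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ordered correctly.
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.1, 0.4, 0.6]
demo_auc = auroc(y_true, y_score)  # 0.75
```

Because only the ordering of scores matters, AUROC does not depend on the decision threshold, which is why the submission format asks for probabilities rather than hard labels.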

5. Problem 1: Bag-of-Words Feature Representation

Background on Bag-of-Words

Represent documents as count vectors of a fixed vocabulary

Many design decisions involved
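
To make the count-vector idea concrete, here is a small standard-library sketch. The tokenizer (lowercase, split on non-alphanumeric characters) and the min_doc_count cutoff are just two of the many possible design decisions, chosen here as assumptions; the function names and example sentences are illustrative only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase, then split on runs of non-alphanumeric characters (one possible choice)."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_vocabulary(docs, min_doc_count=1):
    """Keep words appearing in at least min_doc_count documents, sorted for a stable column order."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))
    return sorted(w for w, c in doc_freq.items() if c >= min_doc_count)


def to_count_vector(doc, vocab):
    """Map one document to a list of counts, one entry per vocabulary word."""
    counts = Counter(tokenize(doc))
    return [counts[w] for w in vocab]


docs = ["Good food, good service.", "The service was bad."]
vocab = build_vocabulary(docs)
vectors = [to_count_vector(d, vocab) for d in docs]
# vocab   -> ['bad', 'food', 'good', 'service', 'the', 'was']
# vectors -> [[0, 1, 2, 1, 0, 0], [1, 0, 0, 1, 1, 1]]
```

The vocabulary should be built from training data only; words that appear only at test time are simply ignored by to_count_vector.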

Goals and Tasks

Develop BoW representation and binary classifier pipeline

Experiment with preprocessing

Use LogisticRegression classifier

Use hyperparameter selection techniques with cross-validation
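
One way these pieces might fit together is a scikit-learn Pipeline searched with GridSearchCV and scored by AUROC. This is a sketch under assumptions, not the required setup: the toy sentences, the grid values for min_df and C, and the choice of 3 stratified folds are all made up so the example runs quickly.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Tiny made-up dataset, only to make the sketch runnable.
texts = [
    "great movie loved it", "excellent food and service", "really good phone",
    "works great very happy", "best purchase ever", "wonderful acting good story",
    "terrible movie hated it", "bad food and slow service", "really poor phone",
    "broke quickly very unhappy", "worst purchase ever", "awful acting bad story",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Fitting the vectorizer inside the pipeline means the vocabulary is rebuilt
# on each training fold, so no information leaks from the validation fold.
pipeline = Pipeline([
    ("bow", CountVectorizer(lowercase=True)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid: one BoW setting and the inverse regularization strength C.
param_grid = {
    "bow__min_df": [1, 2],
    "clf__C": [0.01, 1.0, 100.0],
}

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(texts, labels)

best_params = search.best_params_
best_cv_auroc = search.best_score_  # mean AUROC across validation folds
```

After the search, search.predict_proba(test_texts)[:, 1] would give the positive-class probabilities for a held-out list of sentences (test_texts is a placeholder name here).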

Report Sections

1A: Describe BoW design decisions

1B: Describe cross-validation design

1C: Describe hyperparameter selection for classifier

1D: Analyze predictions of best classifier

1E: Report test set performance on leaderboard
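
For the prediction analysis in 1D, one possible starting point is to list the held-out examples the classifier got most confidently wrong, split into false positives and false negatives. The sketch below assumes a 0.5 decision threshold and uses made-up validation sentences and probabilities; the helper name most_confident_errors is hypothetical.

```python
def most_confident_errors(texts, y_true, y_proba, threshold=0.5, top_k=3):
    """Return (false_positives, false_negatives), each sorted by how confidently wrong the model was."""
    false_pos, false_neg = [], []
    for text, y, p in zip(texts, y_true, y_proba):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 0:
            false_pos.append((p, text))
        elif predicted == 0 and y == 1:
            false_neg.append((p, text))
    false_pos.sort(key=lambda pair: pair[0], reverse=True)  # highest probability first
    false_neg.sort(key=lambda pair: pair[0])                # lowest probability first
    return false_pos[:top_k], false_neg[:top_k]


# Made-up validation examples and predicted probabilities.
val_texts = ["not bad at all", "I wanted to love it", "great", "awful"]
val_labels = [1, 0, 1, 0]
val_proba = [0.2, 0.8, 0.95, 0.05]
fp, fn = most_confident_errors(val_texts, val_labels, val_proba)
# fp -> [(0.8, 'I wanted to love it')]
# fn -> [(0.2, 'not bad at all')]
```

Reading such lists by hand can reveal patterns, for example negation or sentence length, that a count-based representation tends to miss.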

6. Problem 2: Open-ended challenge

Goals and Tasks

Use any feature representation, classifier, and hyperparameter selection procedure

Try various methods to improve performance

Report Sections

2A: Describe feature representation

2B: Describe cross-validation or equivalent process

2C: Describe classifier and hyperparameter search

2D: Analyze errors of best classifier

2E: Report test set performance on leaderboard

7. Grading

Overall Grade Breakdown

87%: Report performance

10%: Leaderboard submissions

3%: Completion of reflection