

University of Toronto

CSC420 Intro to Image Understanding

Part II: Implementation Tasks (80 marks)
In this question, we train (or fine-tune) a few different neural network models to classify
dog breeds. We also investigate their dataset bias and cross-dataset performances. All the
tasks should be implemented using Python with a deep learning package of your choice, e.g.
PyTorch or TensorFlow.
We use two datasets in this assignment.
1. Stanford Dogs Dataset
2. Dog Breed Images
The Stanford Dogs Dataset (SDD) contains over 20,000 images of 120 different dog breeds.
The annotations available for this dataset include class labels (i.e. dog breed name) and
bounding boxes. In this assignment, we’ll only be using the class labels. Further, we will
only use a small portion of the dataset (as described below) so you can train your models on
Colab. Dog Breed Images (DBI) is a smaller dataset containing images of 10 different dog breeds.
To prepare the data for the implementation tasks, follow these steps:
1- Download both datasets and unzip them. There are 7 dog breeds that appear in both
datasets:
Bernese mountain dog
Border collie
Chihuahua
Golden retriever
Labrador retriever
Pug
Siberian husky
2- Delete the folders associated with the remaining dog breeds in both datasets. You can
also delete the folders associated with the bounding boxes in the SDD.
3- For the 7 breeds that are present in both datasets, the names might be written slightly
differently (e.g. Labrador Retriever vs. Labrador). Manually rename the folders so the
names match (e.g. make them both labrador retriever ).
4- Rename the folders to indicate that they are subsets of the original datasets (to avoid
potential confusion if you later want to use them for another project). For example, SDDsubset and DBIsubset. Each of these should now contain 7 subfolders (e.g. border collie, pug,
etc.) and the names should match.
5- Zip the two folders (e.g. SDDsubset.zip and DBIsubset.zip) and upload them to your
Google Drive (if you want to use Google Colab).
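If you work on Colab, a minimal setup sketch might look like the following; the paths are assumptions (they presume the two zip files sit at the top level of your Drive), so adjust them to match your layout:

```python
# Minimal Colab data setup sketch. Paths are assumptions: adjust
# 'MyDrive/...' to wherever you uploaded SDDsubset.zip and DBIsubset.zip.
from google.colab import drive
import zipfile

drive.mount('/content/drive')

for name in ['SDDsubset', 'DBIsubset']:
    with zipfile.ZipFile(f'/content/drive/MyDrive/{name}.zip') as zf:
        zf.extractall('/content/data')  # yields /content/data/SDDsubset, etc.
```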
You can find sample code working with the SDD on the internet. You are welcome to look at these examples and use them as your starting code, or to reuse code snippets from them. You will need to modify the code, since our questions ask you to do different tasks than the ones in these online examples, but using and copying code snippets from these resources is fine. If you choose to use one of these online examples as your starting code, please acknowledge it in your submission. We also suggest that, before you start modifying the starting code, you run it as is on your data (e.g. DBIsubset) to 1) make sure your dataset setup is correct and 2) make sure you fully understand the starter code before you start modifying it.

Task I - Inspection (5 marks):
Look at the images in both datasets, and briefly explain if you observe any systematic differences between images in one dataset vs. the other.

Task II - Simple CNN Training on the DBI (10 marks):
Construct a simple convolutional neural network (CNN) for classifying the images in DBI.
For example, you can construct a network as follows (a PyTorch sketch of this specification is given after the list):
convolutional layer - 16 filters of size 3×3
batch normalization
convolutional layer - 16 filters of size 3×3
max pooling (2×2)
convolutional layer - 8 filters of size 3×3
batch normalization
convolutional layer - 8 filters of size 3×3
max pooling (2×2)
dropout (e.g. 0.5)
fully connected (32)
dropout (0.5)
softmax
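One possible PyTorch rendering of this specification is sketched below. The 128×128 input resolution, the padding, and the placement of the ReLU activations are assumptions, and the final softmax is folded into nn.CrossEntropyLoss, which expects raw logits.

```python
# Sketch of the suggested architecture, assuming 3-channel 128x128 inputs
# and the 7 shared breeds as classes; adjust the flattened size (8*32*32)
# if you use a different input resolution.
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 8, 3, padding=1), nn.BatchNorm2d(8), nn.ReLU(),
            nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5),
            nn.Linear(8 * 32 * 32, 32), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(32, num_classes),  # logits; CrossEntropyLoss adds softmax
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```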
If you want, you can change these specifications; but if you do so, please specify them in your
submission. Use ReLU as your activation function, and cross-entropy as your cost function.
Train the model with the optimizer of your choice, e.g., SGD, Adam, RMSProp, etc. Use
random cropping, random horizontal flipping, random colour jitter, and random rotations
for augmentation. Make sure to tune the parameters of your optimizer for getting the best
performance on the validation set.
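With torchvision, the requested augmentations could be sketched roughly as follows; the crop size, jitter strengths, and rotation range are illustrative values to tune, not prescribed ones:

```python
# Augmentation sketch for the training data; values are illustrative.
from torchvision import transforms

train_tf = transforms.Compose([
    transforms.Resize((144, 144)),
    transforms.RandomCrop(128),              # random cropping
    transforms.RandomHorizontalFlip(),       # random horizontal flipping
    transforms.ColorJitter(0.2, 0.2, 0.2),   # random colour jitter
    transforms.RandomRotation(15),           # random rotations (up to ±15°)
    transforms.ToTensor(),
])
# For validation/test data, use a plain Resize + ToTensor pipeline instead.
```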
Plot the training and test accuracy over the first 10 epochs. Note that the accuracy is
different from the loss function; the accuracy is defined as the percentage of images classified correctly.
Train the same CNN model again; this time, without dropout. Plot the training and test
accuracy over the first 10 epochs; and compare them with the model trained with dropout.
Report the impact of dropout on the training and its generalization to the test set.

Task III - ResNet Training on the DBI (15 marks):
[III.a] (10 marks) ResNet models were proposed in the “Deep Residual Learning for Image
Recognition” paper. These models have had great success in image recognition on benchmark
datasets. In this task, we use the ResNet-18 model for the classification of the images in the
DBI dataset. To do so, use the ResNet-18 model from PyTorch, modify the input/output
layers to match your dataset, and train the model from scratch; i.e., do not use the pre-trained ResNet. Plot the training, validation, and testing accuracy, and compare those with
the results of your CNN model.
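A sketch of the from-scratch setup is below; in recent torchvision, weights=None gives random initialization (older versions use pretrained=False instead), and the 7-class output is an assumption based on the subset built above.

```python
# ResNet-18 with randomly initialized weights; only the output layer
# needs replacing, since the input layer already takes 3-channel images.
from torchvision import models
import torch.nn as nn

model = models.resnet18(weights=None)          # no pre-trained weights
model.fc = nn.Linear(model.fc.in_features, 7)  # 7 shared dog breeds
```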
[III.b] (5 marks) Run the trained model on the entire SDD dataset and report the accuracy.
Compare the accuracy obtained on the (test set of) DBI, vs. the accuracy obtained on the
SDD. Which is higher? Why do you think that might be? Explain very briefly, in one or two sentences.
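Note that torchvision's ImageFolder assigns class indices alphabetically by folder name, so keeping the folder names identical in both subsets (step 3 above) also keeps the label indices aligned across datasets. A generic accuracy sketch, assuming `loader` is a DataLoader over whichever split you are evaluating:

```python
# Evaluation sketch: percentage of correctly classified images.
import torch

@torch.no_grad()
def accuracy(model, loader, device='cuda'):
    model.eval()
    correct = total = 0
    for images, labels in loader:
        preds = model(images.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.size(0)
    return 100.0 * correct / total
```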

Task IV - Fine-tuning on the DBI (20 marks):
Similar to the previous task, use the following models from PyTorch (within torchvision):
ResNet18, ResNet34, ResNeXt50, SwinTransformer (tiny), and a fifth model of your choosing
from torchvision or timm. ResNet18, ResNet34, and ResNeXt50 are convolutional networks
and Swin is a transformer-based architecture. For fine-tuning, you will need to replace the
final layer so the output matches the number of classes in your dataset. Hint: The final
layer might have a different name in each model.
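For instance, in torchvision the ResNet and ResNeXt head is called fc, while the Swin head is called head. A sketch of the replacement (attribute and weight-enum names as in recent torchvision, with the 7-class setup assumed):

```python
# Head replacement sketch for the named torchvision models.
from torchvision import models
import torch.nn as nn

num_classes = 7

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = nn.Linear(resnet.fc.in_features, num_classes)    # ResNet18/34

resnext = models.resnext50_32x4d(weights=models.ResNeXt50_32X4D_Weights.DEFAULT)
resnext.fc = nn.Linear(resnext.fc.in_features, num_classes)  # ResNeXt50

swin = models.swin_t(weights=models.Swin_T_Weights.DEFAULT)
swin.head = nn.Linear(swin.head.in_features, num_classes)    # Swin (tiny)
```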
This time, use the pre-trained models and fine-tune the input/output layers on the DBI training data. Report the accuracy of these fine-tuned models on the DBI test set, and also on the entire SDD dataset.
Discuss the cross-dataset performance of these trained models. Which models generalized to
the new dataset better? For example, are there cases in which two different models perform
equally well on the test portion of the DBI but have significant performance differences when
evaluated on the SDD? Are there models for which the performance gap between the SDD and the test portion of the DBI is very small?

Task V - Dataset detection (15 marks):
Train a model that – instead of classifying dog breeds – can distinguish whether a given
image is more likely to belong to SDD or DBI. To do so, first, you need to divide your data
into training and test sets (and possibly a validation set if you need one for tuning the hyperparameters of your model). You need to either reorganize the datasets (to load the images
using torchvision.datasets.ImageFolder ) or write your own data loader function. You can
start from a pre-trained model (of your choice) and fine-tune it on the training portion of the
dataset. Include your network model specifications in the report, and make sure to include
your justifications for that choice. Report your model’s accuracy on the test portion of the dataset.
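One way to obtain SDD-vs-DBI labels without reorganizing folders is to wrap each ImageFolder so it returns the dataset of origin instead of the breed; the sketch below is one option, and the paths, transform, and 80/20 split are all assumptions to adapt.

```python
# Sketch: binary SDD-vs-DBI dataset built from the two existing folders.
import torch
from torch.utils.data import ConcatDataset, Dataset, random_split
from torchvision import datasets, transforms

tf = transforms.Compose([transforms.Resize((128, 128)), transforms.ToTensor()])

class OriginLabel(Dataset):
    """Wraps an ImageFolder, replacing breed labels with an origin label."""
    def __init__(self, root, origin):
        self.folder = datasets.ImageFolder(root, transform=tf)
        self.origin = origin                  # 0 = SDD, 1 = DBI
    def __len__(self):
        return len(self.folder)
    def __getitem__(self, i):
        image, _ = self.folder[i]             # discard the breed label
        return image, self.origin

full = ConcatDataset([OriginLabel('data/SDDsubset', 0),
                      OriginLabel('data/DBIsubset', 1)])
train_set, test_set = random_split(full, [0.8, 0.2])  # torch >= 1.13
```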

Task VI - How to improve performance on SDD? (10 marks):
If our goal were to have good performance on the SDD dataset, briefly discuss how to work
towards this goal in each of the following cases: (you don’t need to implement these, just
briefly discuss each case in 2-3 sentences)
1. At training time, we have access to the entire DBI dataset, but none of the SDD dataset. All we know is a high-level description of the SDD and its differences from the DBI (similar to the answer you provided for Task I of this question).
2. At training time, we have access to the entire DBI dataset and a small portion (e.g. 10%) of the SDD dataset.
3. At training time, we have access to the entire DBI dataset and a small portion (e.g. 10%) of the SDD dataset, but without the SDD labels for this subset.

Task VII - Discussion (5 marks):
Briefly discuss how some of the issues examined in this exercise can have implications in real applications, e.g. as related to bias or performance. For example, consider the
case where available training datasets are collected in one setting (e.g. a university) and the
goal is to deploy trained models in another setting (e.g. a retirement home).
 




