

Case Western Reserve University

CSDS 435/335 Data Mining


1. (Total 5 pts) We try to build a decision tree using the same training data in the table on page one.

A) (1pt) What is the Gini index of the root node?

B) (3pts) For each of the three attributes (Gender, Car Type, Shirt Size), calculate the Gini index of the split, that is, the weighted sum of the Gini indices of the child nodes. What are the values when Gender, Car Type, and Shirt Size are used as the splitting attribute, respectively?
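As a minimal sketch of the two quantities asked for in question 1, the Python below computes the Gini index of a node and the weighted Gini index of a multiway split. The function names `gini` and `weighted_gini` and the toy records are assumptions made for illustration; the toy records are not the training table from page one.

```python
from collections import Counter

def gini(labels):
    """Gini index of one node: 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(attribute_values, labels):
    """Weighted sum of the child-node Gini indices for a multiway split."""
    n = len(labels)
    children = {}
    for value, label in zip(attribute_values, labels):
        children.setdefault(value, []).append(label)
    return sum(len(child) / n * gini(child) for child in children.values())

# Hypothetical toy records, NOT the table from page one.
toy_gender = ["M", "M", "F", "F"]
toy_labels = ["C0", "C0", "C0", "C1"]

root = gini(toy_labels)                        # 1 - (0.75^2 + 0.25^2) = 0.375
split = weighted_gini(toy_gender, toy_labels)  # 0.5 * 0.0 + 0.5 * 0.5 = 0.25
print(root, split)
```

The same two calls can be applied to each attribute column of the real table; under the Gini criterion, the attribute with the smallest weighted value gives the purest split.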

2. (4pts) Consider the task of building a classifier from random data, where the attribute values are generated randomly irrespective of the class labels. Assume the data set contains records from two classes, “+” and “−.” Half of the data set is used for training while the remaining half is used for testing.

A. Suppose there are an equal number of positive and negative records in the data and the decision tree classifier predicts every test record to be positive. What is the expected error rate of the classifier on the test data?

B. Repeat the previous analysis in A, assuming that the classifier predicts each test record to be positive class with probability 0.8 and negative class with probability 0.2.

C. Suppose two-thirds of the data belong to the positive class and the remaining one-third belong to the negative class. What is the expected error of a classifier that predicts every test record to be positive?

D. Repeat the previous analysis in C, assuming that the classifier predicts each test record to be positive class with probability 2/3 and negative class with probability 1/3.

Assume the total number of samples is 2n (so the number of test records is n). For each of the above, please provide the (expected) confusion matrix on the test data, from which you can calculate the error rate.
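One way to organise the expected confusion matrix in question 2 is sketched below. It assumes, as the question states, that the attributes carry no information about the class, so the prediction is independent of the true label. The names `expected_confusion` and `expected_error`, and the parameters `p_pos` (fraction of truly positive test records) and `q_pos` (probability of predicting positive), are hypothetical. Entries are expressed as fractions of the n test records; multiply by n to obtain expected counts.

```python
from fractions import Fraction

def expected_confusion(p_pos, q_pos):
    """Expected confusion matrix as fractions of the n test records.

    p_pos: fraction of test records that are truly positive.
    q_pos: probability that the classifier predicts positive.
    Assumes predictions are independent of the true labels.
    """
    p, q = Fraction(p_pos), Fraction(q_pos)
    return {
        "TP": p * q,              # actual +, predicted +
        "FN": p * (1 - q),        # actual +, predicted -
        "FP": (1 - p) * q,        # actual -, predicted +
        "TN": (1 - p) * (1 - q),  # actual -, predicted -
    }

def expected_error(p_pos, q_pos):
    """Expected error rate: the two off-diagonal entries added together."""
    matrix = expected_confusion(p_pos, q_pos)
    return matrix["FN"] + matrix["FP"]

# Balanced classes, classifier predicts "+" with probability 0.8.
print(expected_confusion(Fraction(1, 2), Fraction(4, 5)))
print(expected_error(Fraction(1, 2), Fraction(4, 5)))
```

Exact fractions are used instead of floats so that values such as 2/3 are not rounded.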

3. (total 11pts) You are asked to evaluate the performance of two classification models, M1 and M2. The test set you have chosen contains 26 binary attributes, labeled as A through Z. The table below shows the posterior probabilities obtained by applying the models to the test set. (Only the posterior probabilities for the positive class are shown). As this is a two-class problem, P(−) = 1 − P(+) and P(−|A, . . ., Z) = 1 − P(+|A, . . . , Z). Assume that we are mostly interested in detecting instances from the positive class.

A. (5pts) Plot the ROC curve for both M1 and M2. 
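The points of an ROC curve can be obtained by sorting the instances by posterior probability and sweeping the threshold from high to low. The sketch below is one possible way to do this; the function names `roc_points` and `auc` and the toy scores are assumptions for illustration, not the M1 or M2 values from the table. It assumes both classes are present in the test set.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points from sweeping the threshold over the scores.

    scores: posterior probabilities P(+|x); labels: "+" or "-".
    Assumes at least one positive and one negative instance.
    """
    pos = sum(1 for y in labels if y == "+")
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == "+":
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last instance sharing a tied score.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores, NOT the values from the exercise table.
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.2]
toy_labels = ["+", "-", "+", "+", "-"]
curve = roc_points(toy_scores, toy_labels)
print(curve)
print(auc(curve))
```

The returned points can be passed to any plotting tool, with the false positive rate on the x-axis and the true positive rate on the y-axis.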

B. (2pts) For model M1, suppose you choose the cutoff threshold to be t = 0.5. In other words, any test instance whose posterior probability is greater than t is classified as positive. Compute the precision, recall, and F-measure for the model at this threshold value.
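The threshold-based metrics can be sketched as follows. The function name `precision_recall_f1` and the toy scores are hypothetical and are not taken from the exercise table. The F-measure is assumed here to be F1, the harmonic mean of precision and recall. An instance counts as positive only when its score is strictly greater than t, matching the wording of the question.

```python
def precision_recall_f1(scores, labels, t):
    """Precision, recall, and F1 when score > t is predicted positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s > t and y == "+")
    fp = sum(1 for s, y in pairs if s > t and y == "-")
    fn = sum(1 for s, y in pairs if s <= t and y == "+")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical scores, NOT the values from the exercise table.
toy_scores = [0.9, 0.8, 0.6, 0.4, 0.2]
toy_labels = ["+", "-", "+", "+", "-"]
p, r, f = precision_recall_f1(toy_scores, toy_labels, t=0.5)
print(p, r, f)
```

Calling the same function with a different `t` or with another model's scores allows thresholds and models to be compared on the same footing.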

C. (2pts) Repeat the analysis for part (B) using the same cutoff threshold on model M2. Compare the F-measure results for both models. Which model is better? Are the results consistent with what you expect from the ROC curve?

D. (2pts) Repeat part (B) for model M1 using the threshold t = 0.1. Which threshold do you prefer, t = 0.5 or t = 0.1? Are the results consistent with what you expect from the ROC curve?

have taken their assigned medication. At the end of the visits, the robot reports to the doctor which patients were not in the room, and which did not take their medication.

Important Note

Every implementation point described below is associated with a number of marks. In previous years, we noticed that students frequently try to farm partial marks by writing some amount of code for every section, even though none of it can even be executed. This is not the way to develop any complex system, and we intend to disincentivise it.

For the reason described above, if a node implementing a piece of functionality does not execute, at most half the marks for that ROS node can be awarded. Marks are rounded up. Therefore, if an item is worth 5 marks, the most that non-executable code can get is 3.

By non-executable, we mean that the code immediately terminates with an error due to syntactic issues in the file, or wrong import statements. Runtime exceptions or bugs that do not happen in the early stages of the execution will not be considered as non-executable, and therefore will not incur the penalty.

Initialization

• Create a package called "resit_coursework". Remember to maintain the correct dependencies in package.xml and CMakeLists.txt during development



