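Dataset introduction:
mpg (miles per gallon) is a fuel-economy measure. The dataset comes from the StatLib library maintained at Carnegie Mellon University and was used in the 1983 American Statistical Association Exposition. It is a slightly modified version of the StatLib dataset: following Ross Quinlan's (1993) use of it to predict the attribute "mpg", 8 of the original instances were removed because their "mpg" values were unknown. The original dataset is in the file "auto-mpg.data-original".
The dataset has 9 columns (mpg plus 8 attributes) and 398 samples, and is used for regression. "The data concerns city-cycle fuel consumption in miles per gallon, to be predicted in terms of 3 multivalued discrete and 5 continuous attributes." (Quinlan, 1993)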
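No. | Name | Meaning | Type | Notes |
---|---|---|---|---|
1 | mpg | fuel economy, miles per gallon | continuous | |
2 | cylinders | number of cylinders | multi-valued discrete | |
3 | displacement | engine displacement | continuous | |
4 | horsepower | horsepower | continuous | 6 missing values |
5 | weight | vehicle weight | continuous | |
6 | acceleration | acceleration | continuous | time from 0 to 60 mph in seconds, per the ISLR documentation |
7 | model_year | model year | multi-valued discrete | named year in ISLR's Auto |
8 | origin | place of origin | multi-valued discrete | coded 1 = American, 2 = European, 3 = Japanese in ISLR's Auto |
9 | name | car name (make and model), e.g. bmw 320i | string | UCI lists it as unique per instance, but some names repeat |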
Dataset download
Downloading the mpg car fuel-economy dataset
Download link: https://github.com/mwaskom/seaborn-data/blob/master/mpg.csv
How to use the mpg dataset
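The R code below works with ISLR's `Auto`, but the downloaded CSV can also be loaded directly. This is a minimal sketch that assumes a local copy named `mpg.csv`. `read.csv()` also accepts a URL, and the raw-file address `https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv` is assumed to serve the same file. The helper `load_mpg` is hypothetical. It drops rows with a missing `horsepower`, so a 398-row file with 6 missing values leaves 392 rows, the same count as `Auto`.

```r
# Hypothetical helper: read a local copy of mpg.csv and keep complete rows.
load_mpg <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  # Blank horsepower fields become NA in a numeric column; drop those rows
  df <- df[!is.na(df$horsepower), ]
  # origin is categorical, so store it as a factor
  df$origin <- factor(df$origin)
  rownames(df) <- NULL
  df
}

# Usage (assumed file name):
# mpg <- load_mpg("mpg.csv")
# dim(mpg)
```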
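Related articles
ML PFI (eli5): ranking feature importance for model interpretability on the mpg car fuel-economy dataset with a random forest (RF) and permutation feature importance (PFI)
Experiment: predicting high vs. low fuel economy in R with logistic regression, LDA, QDA and KNN
Auto (car dataset): build models that predict whether a car's gas mileage is high or low. The R code uses the Auto data frame from the ISLR package, which has 392 complete rows and names the model-year column year.
a: Create a binary variable mpg01 that is 1 when mpg is above its median and 0 otherwise.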
{r}
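library(ISLR)
summary(Auto)
attach(Auto)  # later chunks refer to mpg, year, cylinders, ... directly
# mpg01 = 1 when mpg is above its median, 0 otherwise
mpg01 = rep(0, length(mpg))
mpg01[mpg > median(mpg)] = 1
Auto = data.frame(Auto, mpg01)  # add mpg01 as a column of Auto
Output: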
{r}
mpg cylinders displacement horsepower
Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0
1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0
Median :22.75 Median :4.000 Median :151.0 Median : 93.5
Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5
3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0
Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0
weight acceleration year origin
Min. :1613 Min. : 8.00 Min. :70.00 Min. :1.000
1st Qu.:2225 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000
Median :2804 Median :15.50 Median :76.00 Median :1.000
Mean :2978 Mean :15.54 Mean :75.98 Mean :1.577
3rd Qu.:3615 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000
Max. :5140 Max. :24.80 Max. :82.00 Max. :3.000
name
amc matador : 5
ford pinto : 5
toyota corolla : 5
amc gremlin : 4
amc hornet : 4
chevrolet chevette: 4
(Other) :365
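The two lines that build mpg01 amount to a strict `>` comparison with the median, so a car sitting exactly at the median would get 0. Below is a compact equivalent written as a hypothetical helper, `above_median`. In `Auto` the median is 22.75, halfway between two recorded values, so no car sits exactly on it. As a result, `table(above_median(Auto$mpg))` should show 196 cars in each class.

```r
# Hypothetical helper: 1 if strictly above the median, 0 otherwise
above_median <- function(x) {
  as.integer(x > median(x, na.rm = TRUE))
}

above_median(c(9, 17, 22.75, 29, 46.6))   # -> 0 0 0 1 1
# Usage on the ISLR data (assumed loaded): table(above_median(Auto$mpg))
```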
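b: Explore the relationship between mpg01 and the other features
{r}
cor(Auto[, -9])  # drop column 9 (name), which is not numeric
pairs(Auto)      # scatterplot matrix; of limited use for mpg01, which is only 0 or 1
Output of cor(Auto[, -9]):
{r}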
mpg cylinders displacement horsepower weight
mpg 1.0000000 -0.7776175 -0.8051269 -0.7784268 -0.8322442
cylinders -0.7776175 1.0000000 0.9508233 0.8429834 0.8975273
displacement -0.8051269 0.9508233 1.0000000 0.8972570 0.9329944
horsepower -0.7784268 0.8429834 0.8972570 1.0000000 0.8645377
weight -0.8322442 0.8975273 0.9329944 0.8645377 1.0000000
acceleration 0.4233285 -0.5046834 -0.5438005 -0.6891955 -0.4168392
year 0.5805410 -0.3456474 -0.3698552 -0.4163615 -0.3091199
origin 0.5652088 -0.5689316 -0.6145351 -0.4551715 -0.5850054
mpg01 0.8369392 -0.7591939 -0.7534766 -0.6670526 -0.7577566
acceleration year origin mpg01
mpg 0.4233285 0.5805410 0.5652088 0.8369392
cylinders -0.5046834 -0.3456474 -0.5689316 -0.7591939
displacement -0.5438005 -0.3698552 -0.6145351 -0.7534766
horsepower -0.6891955 -0.4163615 -0.4551715 -0.6670526
weight -0.4168392 -0.3091199 -0.5850054 -0.7577566
acceleration 1.0000000 0.2903161 0.2127458 0.3468215
year 0.2903161 1.0000000 0.1815277 0.4299042
origin 0.2127458 0.1815277 1.0000000 0.5136984
mpg01 0.3468215 0.4299042 0.5136984 1.0000000
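[Figure: scatterplot matrix produced by pairs(Auto)]
Analysis: mpg01 is negatively correlated with cylinders (-0.76), weight (-0.76), displacement (-0.75) and horsepower (-0.67), the four predictors used in the models below. Its strongest correlation is with mpg itself (0.84). This is expected, because mpg01 is derived from mpg.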
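Reading the mpg01 column of the matrix by hand works. Alternatively, a small hypothetical helper, `rank_by_cor`, sorts every numeric column by the absolute size of its correlation with a chosen target. It is shown here on a toy data frame. Applied as `rank_by_cor(Auto, "mpg01")`, it should list mpg first, then cylinders, weight, displacement and horsepower, matching the matrix above.

```r
# Hypothetical helper: correlations with `target`, sorted by absolute value
rank_by_cor <- function(df, target) {
  is_num <- vapply(df, is.numeric, logical(1))
  r <- cor(df[, is_num, drop = FALSE])[, target]
  r <- r[names(r) != target]          # drop the target's correlation with itself
  r[order(abs(r), decreasing = TRUE)]
}

toy <- data.frame(
  y     = c(0, 0, 1, 1),
  up    = c(1, 2, 3, 5),
  down  = c(9, 7, 2, 1),
  noise = c(1, 3, 2, 2),
  label = c("a", "b", "c", "d")       # non-numeric, ignored
)
rank_by_cor(toy, "y")
```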
c: Split the data into a training set and a test set (even model years for training, odd years for testing)
{r}
train = (year %% 2 == 0) # if the year is even
test = !train
Auto.train = Auto[train,]
Auto.test = Auto[test,]
mpg01.test = mpg01[test]
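The split is deterministic: every car from an even model year goes to training and every odd-year car goes to testing. No random seed is needed. On `Auto` this leaves 182 test cars, which is the length of the prediction vectors below. The hypothetical helper below does the same for any data frame that has a year column, shown on toy data:

```r
# Hypothetical helper: even model years -> train, odd model years -> test
split_even_years <- function(df, year_col = "year") {
  is_train <- df[[year_col]] %% 2 == 0
  list(train = df[is_train, , drop = FALSE],
       test  = df[!is_train, , drop = FALSE])
}

toy <- data.frame(year = 70:75, mpg = c(18, 22, 20, 25, 24, 30))
parts <- split_even_years(toy)
parts$train$year   # 70 72 74
parts$test$year    # 71 73 75
```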
d: LDA prediction
{r}
# LDA
library(MASS)
lda.fit = lda(mpg01~cylinders+weight+displacement+horsepower,
data=Auto, subset=train)
lda.pred = predict(lda.fit, Auto.test)
mean(lda.pred$class != mpg01.test)
Output:
{r}
[1] 0.1263736
Analysis: the LDA test error rate is 12.6% (23 of the 182 test cars misclassified).
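The error rate alone hides which way the mistakes go. The hypothetical helper `error_summary` returns a confusion matrix, the error count and the rate. Here, 0.1263736 corresponds to 23 of 182 test cars. Applied to the objects above, the call would be `error_summary(lda.pred$class, mpg01.test)`. The example below uses toy vectors:

```r
# Hypothetical helper: confusion matrix, error count and error rate
error_summary <- function(pred, truth) {
  pred  <- factor(pred,  levels = c(0, 1))
  truth <- factor(truth, levels = c(0, 1))
  list(table  = table(predicted = pred, actual = truth),
       errors = sum(pred != truth),
       rate   = mean(pred != truth))
}

s <- error_summary(pred = c(1, 0, 1, 1, 0), truth = c(1, 0, 0, 1, 1))
s$table
s$errors   # 2
s$rate     # 0.4
```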
Predictions
{r}
# Full LDA output: $class, $posterior (class probabilities) and $x (LD1 scores)
lda.pred
Output
{r}
$class
[1] 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
[35] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 0 0 0
[69] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1
[103] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0
[137] 0 0 1 1 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1
[171] 1 1 1 1 1 0 0 0 0 0 0 0
Levels: 0 1
$posterior
0 1
30 0.0045910561 9.954089e-01
31 0.0065617312 9.934383e-01
32 0.0055306751 9.944693e-01
34 0.4700231377 5.299769e-01
35 0.9246391446 7.536086e-02
36 0.8994586798 1.005413e-01
37 0.9095449455 9.045505e-02
38 0.8886590688 1.113409e-01
39 0.9996075013 3.924987e-04
40 0.9997884093 2.115907e-04
41 0.9996158412 3.841588e-04
42 0.9995756128 4.243872e-04
43 0.9999566445 4.335551e-05
44 0.9999247439 7.525613e-05
45 0.9999781595 2.184047e-05
46 0.6845523698 3.154476e-01
47 0.0144177336 9.855823e-01
48 0.8842517121 1.157483e-01
49 0.8532696513 1.467303e-01
50 0.0062127971 9.937872e-01
51 0.0042266707 9.957733e-01
52 0.0053134766 9.946865e-01
53 0.0045957219 9.954043e-01
54 0.0021365551 9.978634e-01
55 0.0011643117 9.988357e-01
56 0.0027617173 9.972383e-01
57 0.0035131295 9.964869e-01
86 0.9993272140 6.727860e-04
87 0.9982707887 1.729211e-03
88 0.9994166130 5.833870e-04
89 0.9996020974 3.979026e-04
90 0.9987616453 1.238355e-03
91 0.9999364862 6.351381e-05
92 0.9998626849 1.373151e-04
93 0.9997924091 2.075909e-04
94 0.9997356947 2.643053e-04
95 0.9998207680 1.792320e-04
96 0.9998948108 1.051892e-04
97 0.9982614357 1.738564e-03
98 0.8082637585 1.917362e-01
99 0.8828691717 1.171308e-01
100 0.7160149184 2.839851e-01
101 0.7964168372 2.035832e-01
102 0.7146692448 2.853308e-01
103 0.0051825938 9.948174e-01
104 0.9999770863 2.291369e-05
105 0.9999582582 4.174182e-05
106 0.9999027355 9.726451e-05
107 0.9998079177 1.920823e-04
108 0.5988520215 4.011480e-01
109 0.0075512257 9.924488e-01
110 0.0140873470 9.859127e-01
111 0.0093708282 9.906292e-01
112 0.0006271827 9.993728e-01
113 0.0085329437 9.914671e-01
114 0.3353060101 6.646940e-01
115 0.0069543538 9.930456e-01
116 0.9995745243 4.254757e-04
117 0.9989773896 1.022610e-03
118 0.0038727359 9.961273e-01
119 0.0061505155 9.938495e-01
120 0.0191761103 9.808239e-01
121 0.0340153619 9.659846e-01
122 0.9956046786 4.395321e-03
123 0.0178011055 9.821989e-01
124 0.5449767863 4.550232e-01
125 0.9968340831 3.165917e-03
153 0.8901396280 1.098604e-01
154 0.9269990700 7.300093e-02
155 0.9535406793 4.645932e-02
156 0.8910079960 1.089920e-01
157 0.9999022002 9.779979e-05
158 0.9998721607 1.278393e-04
159 0.9998900080 1.099920e-04
160 0.9999349592 6.504077e-05
161 0.9817475104 1.825249e-02
162 0.9822415939 1.775841e-02
163 0.9662660098 3.373399e-02
164 0.9790103253 2.098967e-02
165 0.7443902358 2.556098e-01
166 0.9962798160 3.720184e-03
167 0.9935318245 6.468176e-03
168 0.0065845245 9.934155e-01
169 0.0256060393 9.743940e-01
170 0.6943705183 3.056295e-01
171 0.0238834085 9.761166e-01
172 0.0254731042 9.745269e-01
173 0.0084647415 9.915353e-01
174 0.0152263317 9.847737e-01
175 0.7663463969 2.336536e-01
176 0.0033119914 9.966880e-01
177 0.8798920740 1.201079e-01
178 0.0258574413 9.741426e-01
179 0.0671780329 9.328220e-01
180 0.0549156423 9.450844e-01
181 0.0169554063 9.830446e-01
182 0.0027560511 9.972439e-01
217 0.0048691169 9.951309e-01
218 0.0056241068 9.943759e-01
219 0.0028399014 9.971601e-01
220 0.0068330682 9.931669e-01
221 0.0034242383 9.965758e-01
222 0.9992097175 7.902825e-04
223 0.9997777693 2.222307e-04
224 0.9996642311 3.357689e-04
225 0.9998492845 1.507155e-04
226 0.9346200911 6.537991e-02
227 0.9207223981 7.927760e-02
228 0.9621407741 3.785923e-02
229 0.9470647910 5.293521e-02
230 0.9994764827 5.235173e-04
231 0.9995039180 4.960820e-04
232 0.9995625862 4.374138e-04
233 0.9998048110 1.951890e-04
234 0.0028877526 9.971122e-01
235 0.0322766192 9.677234e-01
236 0.0090077095 9.909923e-01
237 0.0337949867 9.662050e-01
238 0.0054141298 9.945859e-01
239 0.0041576296 9.958424e-01
240 0.0040582954 9.959417e-01
241 0.0066630758 9.933369e-01
242 0.6576528019 3.423472e-01
243 0.0145988174 9.854012e-01
244 0.0032352022 9.967648e-01
281 0.8421745935 1.578254e-01
282 0.7985809380 2.014191e-01
283 0.0530394374 9.469606e-01
284 0.8977826055 1.022174e-01
285 0.8961554200 1.038446e-01
286 0.9993025736 6.974264e-04
287 0.9989955407 1.004459e-03
288 0.9994217829 5.782171e-04
289 0.9992001292 7.998708e-04
290 0.9998011701 1.988299e-04
291 0.9995556216 4.443784e-04
292 0.9986607432 1.339257e-03
293 0.9992428529 7.571471e-04
294 0.0031312461 9.968688e-01
295 0.0041211977 9.958788e-01
296 0.0025623025 9.974377e-01
297 0.0305312353 9.694688e-01
298 0.7985134354 2.014866e-01
299 0.9994452247 5.547753e-04
300 0.1705698782 8.294301e-01
301 0.9986514799 1.348520e-03
302 0.0078216014 9.921784e-01
303 0.0066201649 9.933798e-01
304 0.0047968503 9.952031e-01
305 0.0064161291 9.935839e-01
306 0.0248345809 9.751654e-01
307 0.3933926435 6.066074e-01
308 0.4799314963 5.200685e-01
309 0.0170675289 9.829325e-01
339 0.0155152020 9.844848e-01
340 0.0245026156 9.754974e-01
341 0.0202427011 9.797573e-01
342 0.5225253302 4.774747e-01
343 0.0109538014 9.890462e-01
344 0.0022460963 9.977539e-01
345 0.0029999889 9.970000e-01
346 0.0022007869 9.977992e-01
347 0.0053030344 9.946970e-01
348 0.0041265903 9.958734e-01
349 0.0055546032 9.944454e-01
350 0.0040204006 9.959796e-01
351 0.0092711452 9.907289e-01
352 0.0051271359 9.948729e-01
353 0.0156331189 9.843669e-01
354 0.0070629716 9.929370e-01
356 0.0074023063 9.925977e-01
357 0.0117799660 9.882200e-01
358 0.0182332197 9.817668e-01
359 0.0301638639 9.698361e-01
360 0.1675777688 8.324222e-01
361 0.8981173412 1.018827e-01
362 0.6412669001 3.587331e-01
363 0.6550586441 3.449414e-01
364 0.9115022704 8.849773e-02
365 0.9992934606 7.065394e-04
366 0.8264574559 1.735425e-01
367 0.9498089617 5.019104e-02
$x
LD1
30 1.65155995
31 1.52899567
32 1.58769048
34 -0.14325405
35 -1.03986617
36 -0.93205729
37 -0.97194222
38 -0.89311410
39 -2.86078063
40 -3.07171525
41 -2.86811332
42 -2.83411096
43 -3.61278415
44 -3.42456709
45 -3.84679996
46 -0.44864383
47 1.25762519
48 -0.87816827
49 -0.78505050
50 1.54776443
51 1.67990756
52 1.60143808
53 1.65121168
54 1.91345441
55 2.12096870
56 1.82564709
57 1.74325810
86 -2.67676857
87 -2.35423701
88 -2.72545831
89 -2.85611202
90 -2.46835353
91 -3.48246665
92 -3.21930370
93 -3.07822981
94 -2.99577803
95 -3.12837027
96 -3.31027311
97 -2.35239282
98 -0.67525532
99 -0.87358197
100 -0.49983888
101 -0.64975465
102 -0.49758349
103 1.60999486
104 -3.83042818
105 -3.62572971
106 -3.33700749
107 -3.10473431
108 -0.32097318
109 1.48072027
110 1.26565121
111 1.40641345
112 2.33228716
113 1.43866914
114 0.04930967
115 1.50902753
116 -2.83323636
117 -2.53375773
118 1.70987536
119 1.55122437
120 1.15863691
121 0.95782768
122 -2.03494818
123 1.18450820
124 -0.24579176
125 -2.14734418
153 -0.89825092
154 -1.05159436
155 -1.21545358
156 -0.90129203
157 -3.33513424
158 -3.24371021
159 -3.29503411
160 -3.47435824
161 -1.54425679
162 -1.55379421
163 -1.32921343
164 -1.49561650
165 -0.54902968
166 -2.09209475
167 -1.90238034
168 1.52780438
169 1.05770527
170 -0.46429505
171 1.08207661
172 1.05952825
173 1.44143140
174 1.23872207
175 -0.58960200
176 1.76344832
177 -0.86386321
178 1.05428279
179 0.71364946
180 0.78689187
181 1.20141357
182 1.82634996
217 1.63139620
218 1.58194112
219 1.81609281
220 1.51507381
221 1.75203505
222 -2.62179417
223 -3.05496745
224 -2.91407795
225 -3.18752026
226 -1.09201774
227 -1.02112532
228 -1.28837968
229 -1.16859307
230 -2.76243326
231 -2.78081364
232 -2.82378826
233 -3.09925755
234 1.81037382
235 0.97634834
236 1.42002633
237 0.96012380
238 1.59499906
239 1.68555201
240 1.69383903
241 1.52373007
242 -0.40703413
243 1.25330270
244 1.77148053
281 -0.75570644
282 -0.65432806
283 0.79943264
284 -0.92577826
285 -0.91976905
286 -2.66448423
287 -2.53987604
288 -2.72849796
289 -2.61767507
290 -3.09294883
291 -2.81839474
292 -2.44158583
293 -2.63642369
294 1.78266262
295 1.68856825
296 1.85129340
297 0.99593632
298 -0.65418485
299 -2.74263046
300 0.35554813
301 -2.43923020
302 1.46862108
303 1.52594983
304 1.63652424
305 1.53670393
306 1.06841567
307 -0.03642437
308 -0.15681475
309 1.19912522
339 1.23220783
340 1.07312457
341 1.13979216
342 -0.21499681
343 1.35259820
344 1.89635306
345 1.79732221
346 1.90332352
347 1.60211303
348 1.68812011
349 1.58620891
350 1.69705377
351 1.41009769
352 1.61368559
353 1.22958296
354 1.50370098
356 1.48756928
357 1.32749691
358 1.17617243
359 1.00019709
360 0.36281697
361 -0.92702493
362 -0.38246687
363 -0.40310889
364 -0.98014188
365 -2.66005059
366 -0.71687769
367 -1.18774787
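Here is how to read the three parts of `lda.pred`. `$class` is the class with the larger posterior probability in `$posterior`, and `$x` holds each car's score on the single discriminant direction LD1 (two classes give one direction). In the output above, positive LD1 scores go with class 1. For example, observation 30 has LD1 = 1.65 and a posterior of 0.995 for class 1. The sketch below uses simulated two-class data (not Auto) and checks these properties with MASS:

```r
library(MASS)

set.seed(42)
n <- 100
# Two simulated classes with different means and a shared covariance (LDA's assumption)
sim <- data.frame(
  x1 = c(rnorm(n, mean = 0), rnorm(n, mean = 2)),
  x2 = c(rnorm(n, mean = 0), rnorm(n, mean = 1)),
  y  = factor(rep(c(0, 1), each = n))
)

fit  <- lda(y ~ x1 + x2, data = sim)
pred <- predict(fit, sim)

# $class is the column of $posterior with the highest probability
argmax <- colnames(pred$posterior)[max.col(pred$posterior, ties.method = "first")]
all(argmax == as.character(pred$class))

# Posterior probabilities sum to 1 in every row
range(rowSums(pred$posterior))

# Two classes -> one discriminant direction, so $x has a single column
colnames(pred$x)
```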
e: QDA prediction
{r}
# QDA
qda.fit = qda(mpg01~cylinders+weight+displacement+horsepower,
data=Auto, subset=train)
qda.pred = predict(qda.fit, Auto.test)
mean(qda.pred$class != mpg01.test)
Output:
{r}
[1] 0.1318681
Analysis: the QDA test error rate is 13.2% (24 of the 182 test cars misclassified).
Predictions
{r}
qda.pred
Output (predictions)
{r}
$class
[1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
[35] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 0 1 0 0 1 1 1 1 0 1 0 0 0
[69] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1
[103] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0
[137] 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1
[171] 1 1 1 1 0 0 0 0 0 0 0 0
Levels: 0 1
$posterior
0 1
30 0.003141092 9.968589e-01
31 0.056171335 9.438287e-01
32 0.006103666 9.938963e-01
34 0.999999765 2.349852e-07
35 0.999937089 6.291146e-05
36 0.999989341 1.065852e-05
37 0.999994439 5.561196e-06
38 0.999932571 6.742860e-05
39 1.000000000 4.597849e-24
40 1.000000000 4.731213e-28
41 1.000000000 2.434787e-23
42 1.000000000 4.183000e-22
43 1.000000000 5.316091e-25
44 1.000000000 8.913898e-26
45 1.000000000 5.587745e-25
46 0.999999979 2.146565e-08
47 0.029185430 9.708146e-01
48 0.999991832 8.168062e-06
49 0.999998050 1.950150e-06
50 0.007101010 9.928990e-01
51 0.007801693 9.921983e-01
52 0.003922023 9.960780e-01
53 0.002240972 9.977590e-01
54 0.001454622 9.985454e-01
55 0.001415413 9.985846e-01
56 0.004134561 9.958654e-01
57 0.001838639 9.981614e-01
86 1.000000000 1.209888e-25
87 1.000000000 1.790860e-22
88 1.000000000 1.742816e-23
89 1.000000000 1.976818e-21
90 1.000000000 1.133940e-22
91 1.000000000 2.730634e-30
92 1.000000000 3.121346e-26
93 1.000000000 4.402256e-23
94 1.000000000 3.363587e-22
95 1.000000000 1.451230e-36
96 1.000000000 3.483441e-38
97 1.000000000 9.817658e-29
98 0.999920434 7.956635e-05
99 0.999992035 7.965200e-06
100 0.999987187 1.281309e-05
101 0.999999392 6.082812e-07
102 0.999647660 3.523397e-04
103 0.012946481 9.870535e-01
104 1.000000000 4.866865e-24
105 1.000000000 6.183913e-25
106 1.000000000 6.926682e-24
107 1.000000000 7.420757e-25
108 0.999997681 2.318836e-06
109 0.004780299 9.952197e-01
110 0.029895878 9.701041e-01
111 0.005948967 9.940510e-01
112 0.037590570 9.624094e-01
113 0.005320933 9.946791e-01
114 0.999878312 1.216880e-04
115 0.004682893 9.953171e-01
116 1.000000000 4.821044e-23
117 1.000000000 7.075589e-39
118 0.006647799 9.933522e-01
119 0.004926801 9.950732e-01
120 0.010969836 9.890302e-01
121 0.138803726 8.611963e-01
122 1.000000000 9.558337e-25
123 0.031977981 9.680220e-01
124 0.999997323 2.677441e-06
125 1.000000000 9.234608e-30
153 0.999906344 9.365611e-05
154 0.999981634 1.836561e-05
155 0.999998898 1.101972e-06
156 0.999999495 5.051863e-07
157 1.000000000 3.429869e-26
158 1.000000000 3.196808e-22
159 1.000000000 6.209005e-23
160 1.000000000 1.727539e-22
161 0.999998645 1.354847e-06
162 0.999994889 5.111083e-06
163 0.999988418 1.158182e-05
164 0.999998415 1.584656e-06
165 0.999979973 2.002731e-05
166 1.000000000 6.484026e-20
167 1.000000000 4.932829e-23
168 0.002669363 9.973306e-01
169 0.012996380 9.870036e-01
170 0.999990433 9.566578e-06
171 0.015692737 9.843073e-01
172 0.011621765 9.883782e-01
173 0.005173678 9.948263e-01
174 0.008846562 9.911534e-01
175 0.999952071 4.792901e-05
176 0.001784978 9.982150e-01
177 0.999952865 4.713531e-05
178 0.025293225 9.747068e-01
179 0.239723948 7.602761e-01
180 0.164092158 8.359078e-01
181 0.056211513 9.437885e-01
182 0.004362813 9.956372e-01
217 0.002439806 9.975602e-01
218 0.003456614 9.965434e-01
219 0.001980983 9.980190e-01
220 0.008572793 9.914272e-01
221 0.001726118 9.982739e-01
222 1.000000000 9.491998e-22
223 1.000000000 8.835569e-24
224 1.000000000 8.567283e-22
225 1.000000000 4.246660e-22
226 0.999979582 2.041844e-05
227 0.999936669 6.333074e-05
228 0.999985848 1.415207e-05
229 0.999983425 1.657542e-05
230 1.000000000 6.180664e-31
231 1.000000000 9.456270e-25
232 1.000000000 4.140062e-31
233 1.000000000 1.452212e-22
234 0.002454168 9.975458e-01
235 0.022538609 9.774614e-01
236 0.004102509 9.958975e-01
237 0.014068151 9.859318e-01
238 0.003106094 9.968939e-01
239 0.002472400 9.975276e-01
240 0.002394298 9.976057e-01
241 0.002785093 9.972149e-01
242 0.999994247 5.752927e-06
243 0.025967101 9.740329e-01
244 0.833456636 1.665434e-01
281 0.999957877 4.212339e-05
282 0.999748165 2.518354e-04
283 0.027650266 9.723497e-01
284 0.999950875 4.912456e-05
285 0.999924718 7.528187e-05
286 1.000000000 6.627081e-21
287 1.000000000 7.103683e-21
288 1.000000000 2.351777e-23
289 1.000000000 1.745734e-21
290 1.000000000 7.634047e-23
291 1.000000000 4.492817e-23
292 1.000000000 6.719297e-21
293 1.000000000 4.012801e-25
294 0.001726392 9.982736e-01
295 0.002100703 9.978993e-01
296 0.003000331 9.969997e-01
297 0.016873672 9.831263e-01
298 0.997122381 2.877619e-03
299 1.000000000 4.593746e-23
300 0.790761916 2.092381e-01
301 1.000000000 4.280388e-20
302 0.003344246 9.966558e-01
303 0.003079015 9.969210e-01
304 0.002530531 9.974695e-01
305 0.003025689 9.969743e-01
306 0.024014889 9.759851e-01
307 0.999930210 6.979030e-05
308 0.999929412 7.058841e-05
309 0.036310870 9.636891e-01
339 0.009558216 9.904418e-01
340 0.029247669 9.707523e-01
341 0.049853045 9.501470e-01
342 0.999872127 1.278727e-04
343 0.012435098 9.875649e-01
344 0.001794496 9.982055e-01
345 0.001803231 9.981968e-01
346 0.001749372 9.982506e-01
347 0.002538754 9.974612e-01
348 0.002122605 9.978774e-01
349 0.003111854 9.968881e-01
350 0.001971181 9.980288e-01
351 0.005340345 9.946597e-01
352 0.002765281 9.972347e-01
353 0.015826666 9.841733e-01
354 0.002875981 9.971240e-01
356 0.003013826 9.969862e-01
357 0.004543917 9.954561e-01
358 0.013761229 9.862388e-01
359 0.019417090 9.805829e-01
360 0.676874483 3.231255e-01
361 0.999999997 3.064300e-09
362 0.999982918 1.708187e-05
363 0.999999878 1.220934e-07
364 0.999939795 6.020489e-05
365 1.000000000 6.847870e-24
366 0.999787966 2.120335e-04
367 0.999979705 2.029473e-05
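QDA differs from LDA in one respect: it estimates a separate covariance matrix for each class instead of a pooled one. For the same reason, `predict()` on a qda fit returns only `$class` and `$posterior`, with no `$x`, as in the output above. On Auto the two methods give similar errors (12.6% vs 13.2%). The sketch below uses simulated data in which both classes share a mean but differ in spread, which is the case QDA is designed for. The generator `make_data` is hypothetical.

```r
library(MASS)

set.seed(7)
# Hypothetical generator: both classes centred at 0, class 1 spread 4x wider
make_data <- function(n) {
  data.frame(
    x1 = c(rnorm(n, 0, 0.5), rnorm(n, 0, 2)),
    x2 = c(rnorm(n, 0, 0.5), rnorm(n, 0, 2)),
    y  = factor(rep(c(0, 1), each = n))
  )
}
train_sim <- make_data(200)
test_sim  <- make_data(200)

lda_fit <- lda(y ~ x1 + x2, data = train_sim)
qda_fit <- qda(y ~ x1 + x2, data = train_sim)

lda_err <- mean(predict(lda_fit, test_sim)$class != test_sim$y)
qda_err <- mean(predict(qda_fit, test_sim)$class != test_sim$y)
c(LDA = lda_err, QDA = qda_err)
```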
f: Logistic regression prediction
{r}
# Logistic regression
glm.fit = glm(mpg01~cylinders+weight+displacement+horsepower,
data=Auto,
family=binomial,
subset=train)
glm.probs = predict(glm.fit, Auto.test, type="response")
glm.pred = rep(0, length(glm.probs))
glm.pred[glm.probs > 0.5] = 1
mean(glm.pred != mpg01.test)
Output:
{r}
[1] 0.1208791
Analysis: the logistic regression test error rate is 12.1% (22 of the 182 test cars misclassified).
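The 0.5 cutoff on `glm.probs` is equivalent to classifying by the sign of the fitted log-odds, because the logistic function equals 0.5 exactly at 0. The sketch below checks this on simulated data (not Auto). It also shows `as.integer(p > 0.5)` as a one-line alternative to the two `glm.pred` lines.

```r
set.seed(3)
n <- 300
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 2 * x))   # simulated 0/1 outcome
fit <- glm(y ~ x, family = binomial)

p   <- predict(fit, type = "response")    # fitted probabilities
eta <- predict(fit, type = "link")        # fitted log-odds

pred_prob <- as.integer(p > 0.5)          # same rule as glm.pred above
pred_link <- as.integer(eta > 0)
identical(pred_prob, pred_link)
```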
Predictions
{r}
glm.pred
Output
{r}
[1] 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
[35] 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 0 1 0 0 1 1 1 0 0 1 0 0 0
[69] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1
[103] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0
[137] 0 0 1 1 1 1 0 0 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1
[171] 1 1 1 1 1 0 0 0 0 0 0 0
g: KNN prediction (k = 1, 10 and 100, then 9 and 99)
g.a k = 1:
{r}
library(class)
# Predictor matrices for knn(), with the four features on their raw scales
train.X = cbind(cylinders, weight, displacement, horsepower)[train,]
test.X = cbind(cylinders, weight, displacement, horsepower)[test,]
train.mpg01 = mpg01[train]
{r}
# Set the random seed: knn() breaks voting ties at random, so this keeps results reproducible
set.seed(1)
{r}
# KNN(k=1)
knn.pred = knn(train.X, test.X, train.mpg01, k=1)
mean(knn.pred != mpg01.test)
Output 1 (k = 1):
{r}
[1] 0.1538462
Analysis: with k = 1, the test error rate is 15.4% (28 of 182).
Predictions
{r}
knn.pred
Output
{r}
[1] 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0
[35] 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 1 1 1 1 0 0 1 1 1 1 0 1 1 0 0
[69] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1
[103] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0
[137] 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[171] 1 1 1 1 1 0 1 0 0 0 0 0
Levels: 0 1
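The matrices passed to `knn()` are unscaled, and `knn()` finds neighbors by Euclidean distance. As a result, weight (1613 to 5140 in the summary) swamps cylinders (3 to 8). A common remedy is to standardize each column with the training set's mean and SD and apply the same transformation to the test set. The sketch below uses made-up data, and the helper `scale_train_test` is hypothetical. Whether standardizing lowers the error on Auto is not shown here; on Auto the call would be `scale_train_test(train.X, test.X)`.

```r
library(class)

# Hypothetical helper: standardize with the training set's means and SDs,
# then apply the same centring and scaling to the test set
scale_train_test <- function(train_x, test_x) {
  train_s <- scale(train_x)
  test_s  <- scale(test_x,
                   center = attr(train_s, "scaled:center"),
                   scale  = attr(train_s, "scaled:scale"))
  list(train = train_s, test = test_s)
}

set.seed(1)
# Made-up features on very different scales (stand-ins for cylinders and weight);
# the label depends only on the small-scale feature
make_x <- function(n) cbind(small = sample(3:8, n, replace = TRUE),
                            big   = round(rnorm(n, 3000, 800)))
train_x <- make_x(100)
test_x  <- make_x(50)
label <- function(x) factor(as.integer(x[, "small"] <= 5), levels = c(0, 1))
train_y <- label(train_x)
test_y  <- label(test_x)

s <- scale_train_test(train_x, test_x)
raw_err    <- mean(knn(train_x, test_x, train_y, k = 5) != test_y)
scaled_err <- mean(knn(s$train, s$test, train_y, k = 5) != test_y)
c(raw = raw_err, scaled = scaled_err)
```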
g.b k = 10:
{r}
# KNN(k=10)
knn2.pred = knn(train.X, test.X, train.mpg01, k=10)
mean(knn2.pred != mpg01.test)
Output 2 (k = 10):
{r}
[1] 0.1648352
Analysis: with k = 10, the test error rate is 16.5% (30 of 182).
Predictions
{r}
knn2.pred
Output
{r}
[1] 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
[35] 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 0 0
[69] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1
[103] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0
[137] 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[171] 1 1 1 1 0 0 1 1 0 0 0 0
Levels: 0 1
g.c k = 100:
{r}
# KNN(k=100)
knn3.pred = knn(train.X, test.X, train.mpg01, k=100)
mean(knn3.pred != mpg01.test)
Output 3 (k = 100):
{r}
[1] 0.1428571
Analysis: with k = 100, the test error rate is 14.3% (26 of 182).
Predictions
{r}
knn3.pred
Output
{r}
[1] 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
[35] 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 0 0
[69] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1
[103] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0
[137] 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[171] 1 1 1 1 0 0 1 1 0 0 0 0
Levels: 0 1
g.d k = 9:
{r}
# KNN(k=9)
knn4.pred = knn(train.X, test.X, train.mpg01, k=9)
mean(knn4.pred != mpg01.test)
Output 4 (k = 9):
{r}
[1] 0.1593407
Analysis: with k = 9, the test error rate is 15.9% (29 of 182).
Predictions
{r}
knn4.pred
Output
{r}
[1] 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
[35] 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 0 0
[69] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1
[103] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0 0 0 0 0 0
[137] 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[171] 1 1 1 1 0 0 1 1 0 0 0 0
Levels: 0 1
g.e k = 99:
{r}
# KNN(k=99)
knn5.pred = knn(train.X, test.X, train.mpg01, k=99)
mean(knn5.pred != mpg01.test)
Output 5 (k = 99):
{r}
[1] 0.1428571
Analysis: with k = 99, the test error rate is 14.3% (26 of 182). The predictions are identical to those for k = 100.
Predictions
{r}
knn5.pred
Output
{r}
[1] 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0
[35] 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 1 1 0 0
[69] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1
[103] 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0
[137] 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[171] 1 1 1 1 0 0 1 1 0 0 0 0
Levels: 0 1
Plot the test error rate against k
{r}
# Test error rate for each k from 1 to 100
knn.error = rep(0, 100)
for (i in 1:100) {
knn.pred = knn(train.X, test.X, train.mpg01, k=i)
knn.error[i] = mean(knn.pred != mpg01.test)
}
plot(1:100, knn.error, type="l")
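[Figure: test error rate versus k, for k = 1 to 100]
Comparison of test error rates:
Among the k values tried individually, k = 99 and k = 100 tie for the lowest KNN test error (14.3%), and k = 100 (100 nearest neighbors) seems to perform best. Even so, every KNN fit does worse on this split than logistic regression (12.1%), LDA (12.6%) and QDA (13.2%).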
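The loop above chooses k by the lowest test error (`which.min(knn.error)` returns that k). This tunes k on the test set itself, so the resulting error is likely optimistic. One alternative is to pick k by leave-one-out cross-validation on the training set only, using `class::knn.cv()`, and then score that single k on the test set once. The sketch below uses simulated data, and `loocv_error_by_k` is a hypothetical helper. On Auto the call would look like `loocv_error_by_k(train.X, train.mpg01, 1:100)`.

```r
library(class)

# Hypothetical helper: leave-one-out CV error on the training set for each k
loocv_error_by_k <- function(train_x, train_y, ks) {
  train_y <- factor(train_y)
  vapply(ks, function(k) {
    mean(knn.cv(train_x, train_y, k = k) != train_y)
  }, numeric(1))
}

set.seed(1)
# Simulated training data: two features, label from a noisy linear rule
train_x <- matrix(rnorm(200 * 2), ncol = 2)
train_y <- as.integer(train_x[, 1] + train_x[, 2] + rnorm(200, sd = 0.5) > 0)

ks <- seq(1, 51, by = 2)
cv_err <- loocv_error_by_k(train_x, train_y, ks)
best_k <- ks[which.min(cv_err)]
best_k
```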