1. The COCO128 Dataset
Taking the recently popular YOLOv8 as an example, let us first recap the installation steps from earlier:
python
%pip install ultralytics
import ultralytics
ultralytics.checks()
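The dataset used for training here is COCO128.
COCO128 is a small tutorial dataset made up of the first 128 images of COCO train2017.
The coco128.yaml file that ships with YOLO specifies:
1) an optional download command/URL for automatic download,
2) the path to the training image directory (or to a *.txt file listing the training images),
3) the same for the validation images,
4) the number of classes,
5) the list of class names: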
python
# download command/URL (optional)
download: https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip
# train and val data as 1) directory: path/images/, 2) file: path/images.txt, or 3) list: [path1/images/, path2/images/]
train: ../coco128/images/train2017/
val: ../coco128/images/train2017/
# number of classes
nc: 80
# class names
names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
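One cheap sanity check on a config like this is that `nc` agrees with the length of `names`. The sketch below assumes the YAML has already been loaded into a plain dict (for instance with PyYAML's `yaml.safe_load`). `check_dataset_config`, `REQUIRED_KEYS` and the three-class `example` dict are hypothetical names used only for illustration and are not part of Ultralytics.

```python
# Hypothetical helper: sanity-check a dataset config that has already been
# loaded into a dict (e.g. via yaml.safe_load on coco128.yaml).

REQUIRED_KEYS = ("train", "val", "nc", "names")


def check_dataset_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in cfg:
            problems.append(f"missing key: {key}")
    # Only compare nc with names when both are present.
    if "nc" in cfg and "names" in cfg and cfg["nc"] != len(cfg["names"]):
        problems.append(
            f"nc={cfg['nc']} but names has {len(cfg['names'])} entries"
        )
    return problems


# Illustrative three-class config, not the real 80-class coco128.yaml.
example = {
    "train": "../coco128/images/train2017/",
    "val": "../coco128/images/train2017/",
    "nc": 3,
    "names": ["person", "bicycle", "car"],
}
print(check_dataset_config(example))  # []
print(check_dataset_config({**example, "nc": 80}))
```

Note that in coco128.yaml `train` and `val` point to the same directory, so every validation metric reported later is computed on the training images themselves.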
2. Training
python
!yolo train model=yolov8n.pt data=coco128.yaml epochs=10 imgsz=640
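The same run can also be started from Python instead of the CLI. This is a minimal sketch using the `YOLO` class from the `ultralytics` package. The wrapper function `train_coco128` is a hypothetical name introduced here, and the import is placed inside it only so that defining the function does not require the package to be installed.

```python
def train_coco128(epochs: int = 10, imgsz: int = 640):
    """Fine-tune the pretrained YOLOv8n weights on COCO128."""
    # Imported lazily so that defining this function does not require
    # ultralytics to be installed.
    from ultralytics import YOLO

    # Pretrained nano checkpoint, the same one named in the CLI command.
    model = YOLO("yolov8n.pt")
    # Same dataset config, epoch count and image size as the CLI command.
    return model.train(data="coco128.yaml", epochs=epochs, imgsz=imgsz)
```

Calling `train_coco128()` in a notebook cell should behave like the CLI command and write its results under `runs/detect/`.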
The training output is:
python
from n params module arguments
0 -1 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2]
1 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
2 -1 1 7360 ultralytics.nn.modules.block.C2f [32, 32, 1, True]
3 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
4 -1 2 49664 ultralytics.nn.modules.block.C2f [64, 64, 2, True]
5 -1 1 73984 ultralytics.nn.modules.conv.Conv [64, 128, 3, 2]
6 -1 2 197632 ultralytics.nn.modules.block.C2f [128, 128, 2, True]
7 -1 1 295424 ultralytics.nn.modules.conv.Conv [128, 256, 3, 2]
8 -1 1 460288 ultralytics.nn.modules.block.C2f [256, 256, 1, True]
9 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5]
10 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
11 [-1, 6] 1 0 ultralytics.nn.modules.conv.Concat [1]
12 -1 1 148224 ultralytics.nn.modules.block.C2f [384, 128, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 [-1, 4] 1 0 ultralytics.nn.modules.conv.Concat [1]
15 -1 1 37248 ultralytics.nn.modules.block.C2f [192, 64, 1]
16 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2]
17 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
18 -1 1 123648 ultralytics.nn.modules.block.C2f [192, 128, 1]
19 -1 1 147712 ultralytics.nn.modules.conv.Conv [128, 128, 3, 2]
20 [-1, 9] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 1 493056 ultralytics.nn.modules.block.C2f [384, 256, 1]
22 [15, 18, 21] 1 897664 ultralytics.nn.modules.head.Detect [80, [64, 128, 256]]
Model summary: 225 layers, 3157200 parameters, 3157184 gradients
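The `params` column can be checked by hand. The sketch below assumes that the `Conv` module is a bias-free k x k convolution followed by batch normalization; this is an assumption about the library internals, although it reproduces the reported numbers. It then sums the column to recover the total in the summary line. `conv_params` is a hypothetical helper name.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Parameters of a k x k convolution without bias followed by BatchNorm."""
    conv_weights = c_in * c_out * k * k
    batchnorm = 2 * c_out  # learnable scale and shift per output channel
    return conv_weights + batchnorm


# (c_in, c_out, kernel) -> params reported in the summary for layers 0, 1, 3
reported = {(3, 16, 3): 464, (16, 32, 3): 4672, (32, 64, 3): 18560}
for (c_in, c_out, k), expected in reported.items():
    assert conv_params(c_in, c_out, k) == expected

# The params column for layers 0-22, copied from the summary above.
layer_params = [
    464, 4672, 7360, 18560, 49664, 73984, 197632, 295424, 460288, 164608,
    0, 0, 148224, 0, 0, 37248, 36992, 0, 123648, 147712, 0, 493056, 897664,
]
print(sum(layer_params))  # 3157200
```

The `Upsample` and `Concat` layers contribute 0 because they have no learnable weights, and the `Detect` head alone accounts for 897664 of the parameters.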
python
Transferred 355/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs/detect/train', view at http://localhost:6006/
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks passed ✅
train: Scanning /kaggle/working/datasets/coco128/labels/train2017.cache... 126 i
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
val: Scanning /kaggle/working/datasets/coco128/labels/train2017.cache... 126 ima
Plotting labels to runs/detect/train/labels.jpg...
optimizer: AdamW(lr=0.000119, momentum=0.9) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 2 dataloader workers
Logging results to runs/detect/train
Starting training for 10 epochs...
Closing dataloader mosaic
albumentations: Blur(p=0.01, blur_limit=(3, 7)), MedianBlur(p=0.01, blur_limit=(3, 7)), ToGray(p=0.01), CLAHE(p=0.01, clip_limit=(1, 4.0), tile_grid_size=(8, 8))
python
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 2.61G 1.153 1.398 1.192 81 640: 1
Class Images Instances Box(P R mAP50 m
all 128 929 0.688 0.506 0.61 0.446
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
2/10 2.56G 1.142 1.345 1.202 121 640: 1
Class Images Instances Box(P R mAP50 m
all 128 929 0.678 0.525 0.63 0.456
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
3/10 2.57G 1.147 1.25 1.175 108 640: 1
Class Images Instances Box(P R mAP50 m
all 128 929 0.656 0.548 0.64 0.466
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
4/10 2.57G 1.149 1.287 1.177 116 640: 1
Class Images Instances Box(P R mAP50 m
all 128 929 0.684 0.568 0.654 0.482
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
5/10 2.57G 1.169 1.233 1.207 68 640: 1
Class Images Instances Box(P R mAP50 m
all 128 929 0.664 0.586 0.668 0.491
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
6/10 2.57G 1.139 1.231 1.177 95 640: 1
Class Images Instances Box(P R mAP50 m
all 128 929 0.66 0.613 0.677 0.5
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
7/10 2.57G 1.134 1.211 1.181 115 640: 1
Class Images Instances Box(P R mAP50 m
all 128 929 0.649 0.631 0.683 0.504
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
8/10 2.57G 1.114 1.194 1.178 71 640: 1
Class Images Instances Box(P R mAP50 m
all 128 929 0.664 0.634 0.69 0.513
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
9/10 2.57G 1.117 1.127 1.148 142 640: 1
Class Images Instances Box(P R mAP50 m
all 128 929 0.624 0.671 0.697 0.52
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
10/10 2.57G 1.085 1.133 1.172 104 640: 1
Class Images Instances Box(P R mAP50 m
all 128 929 0.631 0.676 0.704 0.522
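To read the trend out of the per-epoch log, the "all" rows can be collected into a small table. The snippet below copies precision, recall and mAP50 from the ten epochs above (the truncated fourth metric column is left out) and checks how they evolve. `history` and `strictly_increasing` are hypothetical names used for illustration.

```python
# (epoch, precision, recall, mAP50) for the "all" row of each epoch above.
history = [
    (1, 0.688, 0.506, 0.610),
    (2, 0.678, 0.525, 0.630),
    (3, 0.656, 0.548, 0.640),
    (4, 0.684, 0.568, 0.654),
    (5, 0.664, 0.586, 0.668),
    (6, 0.660, 0.613, 0.677),
    (7, 0.649, 0.631, 0.683),
    (8, 0.664, 0.634, 0.690),
    (9, 0.624, 0.671, 0.697),
    (10, 0.631, 0.676, 0.704),
]

precision = [row[1] for row in history]
recall = [row[2] for row in history]
map50 = [row[3] for row in history]


def strictly_increasing(values):
    """True when every value improves on the previous one."""
    return all(b > a for a, b in zip(values, values[1:]))


print(strictly_increasing(map50))              # True
print(strictly_increasing(recall))             # True
print(round(map50[-1] - map50[0], 3))          # 0.094
print(round(precision[-1] - precision[0], 3))  # -0.057
```

Over these ten epochs recall and mAP50 rise at every step while precision drifts down slightly, so the gain comes from finding more objects rather than from fewer false positives.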
python
10 epochs completed in 0.018 hours.
Optimizer stripped from runs/detect/train/weights/last.pt, 6.5MB
Optimizer stripped from runs/detect/train/weights/best.pt, 6.5MB
Validating runs/detect/train/weights/best.pt...
Ultralytics YOLOv8.0.128 🚀 Python-3.10.10 torch-2.0.0 CUDA:0 (Tesla P100-PCIE-16GB, 16281MiB)
Model summary (fused): 168 layers, 3151904 parameters, 0 gradients
python
Class Images Instances Box(P R mAP50 m
all 128 929 0.629 0.677 0.704 0.523
person 128 254 0.763 0.721 0.778 0.569
bicycle 128 6 0.765 0.333 0.391 0.321
car 128 46 0.487 0.217 0.322 0.192
motorcycle 128 5 0.613 0.8 0.906 0.732
airplane 128 6 0.842 1 0.972 0.809
bus 128 7 0.832 0.714 0.712 0.61
train 128 3 0.52 1 0.995 0.858
truck 128 12 0.597 0.5 0.547 0.373
boat 128 6 0.526 0.167 0.448 0.328
traffic light 128 14 0.471 0.214 0.184 0.145
stop sign 128 2 0.671 1 0.995 0.647
bench 128 9 0.675 0.695 0.72 0.489
bird 128 16 0.936 0.921 0.961 0.67
cat 128 4 0.818 1 0.995 0.772
dog 128 9 0.68 0.889 0.908 0.722
horse 128 2 0.441 1 0.828 0.497
elephant 128 17 0.742 0.848 0.933 0.71
bear 128 1 0.461 1 0.995 0.995
zebra 128 4 0.85 1 0.995 0.972
giraffe 128 9 0.824 1 0.995 0.772
backpack 128 6 0.596 0.333 0.394 0.257
umbrella 128 18 0.564 0.722 0.681 0.429
handbag 128 19 0.635 0.185 0.326 0.178
tie 128 7 0.671 0.714 0.758 0.522
suitcase 128 4 0.687 1 0.945 0.603
frisbee 128 5 0.52 0.8 0.799 0.689
skis 128 1 0.694 1 0.995 0.497
snowboard 128 7 0.499 0.714 0.732 0.589
sports ball 128 6 0.747 0.494 0.573 0.342
kite 128 10 0.539 0.5 0.504 0.181
baseball bat 128 4 0.595 0.5 0.509 0.253
baseball glove 128 7 0.808 0.429 0.431 0.318
skateboard 128 5 0.493 0.6 0.609 0.465
tennis racket 128 7 0.451 0.286 0.446 0.274
bottle 128 18 0.4 0.389 0.365 0.257
wine glass 128 16 0.597 0.557 0.675 0.366
cup 128 36 0.586 0.389 0.465 0.338
fork 128 6 0.582 0.167 0.306 0.234
knife 128 16 0.621 0.625 0.669 0.405
spoon 128 22 0.525 0.364 0.41 0.227
bowl 128 28 0.657 0.714 0.719 0.584
banana 128 1 0.319 1 0.497 0.0622
sandwich 128 2 0.812 1 0.995 0.995
orange 128 4 0.784 1 0.895 0.594
broccoli 128 11 0.431 0.273 0.339 0.26
carrot 128 24 0.553 0.833 0.801 0.504
hot dog 128 2 0.474 1 0.995 0.946
pizza 128 5 0.736 1 0.995 0.882
donut 128 14 0.574 1 0.929 0.85
cake 128 4 0.769 1 0.995 0.89
chair 128 35 0.503 0.571 0.542 0.307
couch 128 6 0.526 0.667 0.805 0.612
potted plant 128 14 0.479 0.786 0.784 0.545
bed 128 3 0.714 1 0.995 0.83
dining table 128 13 0.451 0.615 0.552 0.437
toilet 128 2 1 0.942 0.995 0.946
tv 128 2 0.622 1 0.995 0.846
laptop 128 3 1 0.452 0.863 0.738
mouse 128 2 1 0 0.0459 0.00459
remote 128 8 0.736 0.5 0.62 0.527
cell phone 128 8 0.0541 0.027 0.0731 0.043
microwave 128 3 0.773 0.667 0.913 0.807
oven 128 5 0.442 0.483 0.433 0.336
sink 128 6 0.378 0.167 0.336 0.231
refrigerator 128 5 0.662 0.786 0.778 0.616
book 128 29 0.47 0.336 0.402 0.23
clock 128 9 0.76 0.778 0.884 0.762
vase 128 2 0.428 1 0.828 0.745
scissors 128 1 0.911 1 0.995 0.256
teddy bear 128 21 0.551 0.667 0.805 0.515
toothbrush 128 5 0.768 1 0.995 0.65
Speed: 3.4ms preprocess, 1.9ms inference, 0.0ms loss, 2.4ms postprocess per image
Results saved to runs/detect/train
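The "Speed:" line can be turned into a rough throughput figure. This is simple arithmetic on the logged values and assumes the stages run one after another for each image, which ignores batching and any overlap between stages, so treat the result as an estimate only.

```python
# Per-image timings in milliseconds, copied from the "Speed:" line above.
speed_ms = {"preprocess": 3.4, "inference": 1.9, "loss": 0.0, "postprocess": 2.4}

total_ms = sum(speed_ms.values())
images_per_second = 1000.0 / total_ms

print(f"{total_ms:.1f} ms per image")       # 7.7 ms per image
print(f"{images_per_second:.0f} images/s")  # 130 images/s
```

In this run pre- and postprocessing together (5.8 ms) take longer than the network's forward pass (1.9 ms).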
3. Validation
python
!yolo val model=yolov8n.pt data=coco128.yaml
The output is:
python
Class Images Instances Box(P R mAP50 m
all 128 929 0.64 0.537 0.605 0.446
person 128 254 0.797 0.677 0.764 0.538
bicycle 128 6 0.514 0.333 0.315 0.264
car 128 46 0.813 0.217 0.273 0.168
motorcycle 128 5 0.687 0.887 0.898 0.685
airplane 128 6 0.82 0.833 0.927 0.675
bus 128 7 0.491 0.714 0.728 0.671
train 128 3 0.534 0.667 0.706 0.604
truck 128 12 1 0.332 0.473 0.297
boat 128 6 0.226 0.167 0.316 0.134
traffic light 128 14 0.734 0.2 0.202 0.139
stop sign 128 2 1 0.992 0.995 0.701
bench 128 9 0.839 0.582 0.62 0.365
bird 128 16 0.921 0.728 0.864 0.51
cat 128 4 0.875 1 0.995 0.791
dog 128 9 0.603 0.889 0.785 0.585
horse 128 2 0.597 1 0.995 0.518
elephant 128 17 0.849 0.765 0.9 0.679
bear 128 1 0.593 1 0.995 0.995
zebra 128 4 0.848 1 0.995 0.965
giraffe 128 9 0.72 1 0.951 0.722
backpack 128 6 0.589 0.333 0.376 0.232
umbrella 128 18 0.804 0.5 0.643 0.414
handbag 128 19 0.424 0.0526 0.165 0.0889
tie 128 7 0.804 0.714 0.674 0.476
suitcase 128 4 0.635 0.883 0.745 0.534
frisbee 128 5 0.675 0.8 0.759 0.688
skis 128 1 0.567 1 0.995 0.497
snowboard 128 7 0.742 0.714 0.747 0.5
sports ball 128 6 0.716 0.433 0.485 0.278
kite 128 10 0.817 0.45 0.569 0.184
baseball bat 128 4 0.551 0.25 0.353 0.175
baseball glove 128 7 0.624 0.429 0.429 0.293
skateboard 128 5 0.846 0.6 0.6 0.41
tennis racket 128 7 0.726 0.387 0.487 0.33
bottle 128 18 0.448 0.389 0.376 0.208
wine glass 128 16 0.743 0.362 0.584 0.333
cup 128 36 0.58 0.278 0.404 0.29
fork 128 6 0.527 0.167 0.246 0.184
knife 128 16 0.564 0.5 0.59 0.36
spoon 128 22 0.597 0.182 0.328 0.19
bowl 128 28 0.648 0.643 0.618 0.491
banana 128 1 0 0 0.124 0.0379
sandwich 128 2 0.249 0.5 0.308 0.308
orange 128 4 1 0.31 0.995 0.623
broccoli 128 11 0.374 0.182 0.249 0.203
carrot 128 24 0.648 0.458 0.572 0.362
hot dog 128 2 0.351 0.553 0.745 0.721
pizza 128 5 0.644 1 0.995 0.843
donut 128 14 0.657 1 0.94 0.864
cake 128 4 0.618 1 0.945 0.845
chair 128 35 0.506 0.514 0.442 0.239
couch 128 6 0.463 0.5 0.706 0.555
potted plant 128 14 0.65 0.643 0.711 0.472
bed 128 3 0.698 0.667 0.789 0.625
dining table 128 13 0.432 0.615 0.485 0.366
toilet 128 2 0.615 0.5 0.695 0.676
tv 128 2 0.373 0.62 0.745 0.696
laptop 128 3 1 0 0.451 0.361
mouse 128 2 1 0 0.0625 0.00625
remote 128 8 0.843 0.5 0.605 0.529
cell phone 128 8 0 0 0.0549 0.0393
microwave 128 3 0.435 0.667 0.806 0.718
oven 128 5 0.412 0.4 0.339 0.27
sink 128 6 0.35 0.167 0.182 0.129
refrigerator 128 5 0.589 0.4 0.604 0.452
book 128 29 0.629 0.103 0.346 0.178
clock 128 9 0.788 0.83 0.875 0.74
vase 128 2 0.376 1 0.828 0.795
scissors 128 1 1 0 0.249 0.0746
teddy bear 128 21 0.877 0.333 0.591 0.394
toothbrush 128 5 0.743 0.6 0.638 0.374
Speed: 1.0ms preprocess, 8.5ms inference, 0.0ms loss, 1.6ms postprocess per image
Results saved to runs/detect/val
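Because the validation command loads `yolov8n.pt`, the table above describes the pretrained checkpoint, whereas the table at the end of training describes `best.pt` after 10 epochs. The sketch below puts a few mAP50 values from the two tables side by side. The selection of classes is arbitrary, and `map50` and `gains` are hypothetical names.

```python
# mAP50 copied from the two tables: (pretrained yolov8n.pt, best.pt after 10 epochs)
map50 = {
    "all": (0.605, 0.704),
    "person": (0.764, 0.778),
    "car": (0.273, 0.322),
    "train": (0.706, 0.995),
    "banana": (0.124, 0.497),
    "mouse": (0.0625, 0.0459),
}

# Gain per class, rounded to hide floating-point noise.
gains = {name: round(after - before, 4) for name, (before, after) in map50.items()}

# Largest improvement first.
for name, gain in sorted(gains.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<8} {gain:+.4f}")
```

Two caveats apply when reading these gains. The config uses the same 128 images for training and validation, so the improvement shows how well the model fits images it has seen, not how well it generalizes. Classes with only one or two instances, such as banana and mouse, also swing widely between runs. To evaluate the fine-tuned weights directly, the same `yolo val` command can presumably be pointed at `runs/detect/train/weights/best.pt`, the path reported in the training log.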
The visualized results are: