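1. Loading data
- Dataset
  - Provides a way to fetch each data sample and its label
- DataLoader
  - Packs the data into the different forms that the network needs later on
Dataset
1. How to fetch each data sample and its label
2. Tells us how many samples there are in total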
python
from torch.utils.data import Dataset
from PIL import Image
import os
class myData(Dataset):
    def __init__(self, root_dir, label_dir):
        self.root_dir = root_dir
        self.label_dir = label_dir
        # Folder that holds all images of one class
        self.path = os.path.join(self.root_dir, self.label_dir)
        # File names of every image in that folder
        self.img_path = os.listdir(self.path)

    def __getitem__(self, idx):
        img_name = self.img_path[idx]
        img_item_path = os.path.join(self.root_dir, self.label_dir, img_name)
        img = Image.open(img_item_path)
        # The folder name doubles as the label
        label = self.label_dir
        return img, label

    def __len__(self):
        return len(self.img_path)

root_dir="hymenoptera_data/train"
ants_label_dir="ants"
bees_label_dir="bees"
ants_dataset=myData(root_dir,ants_label_dir)
bees_dataset=myData(root_dir,bees_label_dir)
train_dataset=ants_dataset+bees_dataset
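To make the last line concrete: in PyTorch, adding two Dataset objects returns a ConcatDataset, so train_dataset yields all ants samples first, then all bees samples, and its length is the sum of both. The sketch below imitates that index mapping with plain Python classes so it runs without PyTorch or the image folders. ToyDataset, ToyConcat and the file names are hypothetical stand-ins, not part of torch.

```python
class ToyDataset:
    """Stand-in that follows the same protocol as myData (no torch needed)."""

    def __init__(self, names, label):
        self.names = names
        self.label = label

    def __getitem__(self, idx):
        # Return one sample together with its label
        return self.names[idx], self.label

    def __len__(self):
        return len(self.names)


class ToyConcat:
    """Chains two datasets, imitating what `ants_dataset + bees_dataset` gives."""

    def __init__(self, first, second):
        self.first = first
        self.second = second

    def __getitem__(self, idx):
        # Indices below len(first) go to the first dataset, the rest to the second
        if idx < len(self.first):
            return self.first[idx]
        return self.second[idx - len(self.first)]

    def __len__(self):
        return len(self.first) + len(self.second)


ants = ToyDataset(["ant_0.jpg", "ant_1.jpg"], "ants")
bees = ToyDataset(["bee_0.jpg"], "bees")
train = ToyConcat(ants, bees)

print(len(train))  # 3
print(train[1])    # ('ant_1.jpg', 'ants')
print(train[2])    # ('bee_0.jpg', 'bees')
```

The sketch only handles non-negative indices, which is enough to show the ordering.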
1.1 Creating label files in bulk
Write each image's label to a .txt file named after the image.
python
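import os

root_dir = 'hymenoptera_data/train'
target_dir = 'bees_images'  # or 'ants_images'
img_path = os.listdir(os.path.join(root_dir, target_dir))
label = target_dir.split('_')[0]  # 'bees_images' -> 'bees'
out_dir = 'bees_labels'  # or 'ants_labels'
# Make sure the output folder exists before writing into it
os.makedirs(os.path.join(root_dir, out_dir), exist_ok=True)
for i in img_path:
    file_name = i.split('.jpg')[0]  # strip the .jpg extension
    with open(os.path.join(root_dir, out_dir, "{}.txt".format(file_name)), 'w') as f:
        f.write(label)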
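To try the same idea without touching the real dataset, here is a minimal sketch that runs in a temporary directory. The helper name write_label_files and the two placeholder file names are assumptions for illustration. It uses os.path.splitext instead of split('.jpg'), so other extensions are handled as well.

```python
import os
import tempfile


def write_label_files(img_dir, out_dir, label):
    """Write one <image name>.txt per file in img_dir, each containing label."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for img_name in sorted(os.listdir(img_dir)):
        stem, _ext = os.path.splitext(img_name)  # 'a.jpg' -> 'a'
        txt_path = os.path.join(out_dir, stem + ".txt")
        with open(txt_path, "w") as f:
            f.write(label)
        written.append(txt_path)
    return written


with tempfile.TemporaryDirectory() as root:
    img_dir = os.path.join(root, "bees_images")
    os.makedirs(img_dir)
    # Create two empty placeholder "images"
    for name in ("img_a.jpg", "img_b.png"):
        open(os.path.join(img_dir, name), "w").close()

    label = "bees_images".split("_")[0]  # 'bees'
    paths = write_label_files(img_dir, os.path.join(root, "bees_labels"), label)
    created = [os.path.basename(p) for p in paths]
    with open(paths[0]) as f:
        first_label = f.read()

print(created)      # ['img_a.txt', 'img_b.txt']
print(first_label)  # bees
```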
2. Using TensorBoard
2.1 add_scalar(): displaying scalar data in TensorBoard
python
def add_scalar(
    self,
    tag,
    scalar_value,
    global_step=None,
    walltime=None,
    new_style=False,
    double_precision=False,
):
    """Add scalar data to summary.

    Args:
        tag (str): Data identifier
        scalar_value (float or string/blobname): Value to save
        global_step (int): Global step value to record
        walltime (float): Optional override default walltime (time.time())
          with seconds after epoch of event
        new_style (boolean): Whether to use new style (tensor field) or old
          style (simple_value field). New style could lead to faster data loading.
    """
python
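from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("logs")  # event files are written to ./logs
# writer.add_image()
for i in range(100):
    # tag="y=x", scalar_value=i (y-axis), global_step=i (x-axis)
    writer.add_scalar("y=x", i, i)
writer.close()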
Run this in a terminal:
shell
tensorboard --logdir=logs
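If several TensorBoard instances all use the default port 6006, it becomes hard to tell them apart. In that case, specify a different port:
shell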
tensorboard --logdir=logs --port=6007
If the tag is not changed between runs (here the tag is still "y=2x" although the values are now 3*i):
python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("logs")
# writer.add_image()
for i in range(100):
    writer.add_scalar("y=2x", 3 * i, i)  # tag unchanged, values are now 3*i
writer.close()
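This shows that the new writer's chart also contains what the previous writer recorded: both runs wrote to the same log directory under the same tag, so TensorBoard draws them together.
Fix: ① Delete the TensorBoard log files and stop the process. This is not the preferred option, because it destroys the history of your training runs.
② Have each new training job write to a new subdirectory of the top-level log directory. TensorBoard then treats each job as a separate "run" and builds a comparison view, so you can see how training differs between small variations of the model.
Note:
"Subdirectory" here means creating a new
SummaryWriter("new_folder")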
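One common way to get a fresh subdirectory for every run is to put a timestamp in the folder name. This is a minimal sketch, assuming a top-level directory called logs. The helper new_run_dir and its naming scheme are illustrative choices, not something TensorBoard requires.

```python
import os
from datetime import datetime


def new_run_dir(base_dir="logs", name=None, now=None):
    """Return a sub-directory path under base_dir for one training run."""
    if name is None:
        now = now or datetime.now()
        name = now.strftime("run_%Y%m%d_%H%M%S")
    return os.path.join(base_dir, name)


# A fixed timestamp is passed here only to make the printed result predictable
run_dir = new_run_dir(now=datetime(2024, 1, 2, 3, 4, 5))
print(run_dir)  # logs/run_20240102_030405 on Linux/macOS

# Each run then gets its own writer, e.g.:
#   writer = SummaryWriter(new_run_dir())
# while TensorBoard is still started on the parent directory:
#   tensorboard --logdir=logs
```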
2.2 add_image()
python
def add_image(
    self, tag, img_tensor, global_step=None, walltime=None, dataformats="CHW"
):
    """Add image data to summary.

    Note that this requires the ``pillow`` package.

    Args:
        tag (str): Data identifier
        img_tensor (torch.Tensor, numpy.ndarray, or string/blobname): Image data
        global_step (int): Global step value to record
        walltime (float): Optional override default walltime (time.time())
          seconds after epoch of event
        dataformats (str): Image data format specification of the form
          CHW, HWC, HW, WH, etc.
    """
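Because img_tensor (torch.Tensor, numpy.ndarray, or string/blobname) has to be one of the types listed above, a PIL image cannot be passed in directly. Either:
- read the image with OpenCV, which returns the image as NumPy data, or
- convert the PIL image with numpy.array()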
python
from torch.utils.tensorboard import SummaryWriter
import numpy as np
from PIL import Image
writer=SummaryWriter("logs")
image_path="data/train/bees_image/16838648_415acd9e3f.jpg"
img = Image.open(image_path)
img_array = np.array(img)
print(type(img_array))
print(img_array.shape)
writer.add_image("test",img_array,2,dataformats='HWC')
writer.close()
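Note: pass dataformats='HWC'. For an RGB image, numpy.array() on a PIL image gives an array of shape (height, width, channels), whereas add_image() assumes "CHW" by default.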
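The difference between the two layouts can be checked with NumPy alone. This small sketch uses a dummy array instead of a real image (the 4x6 size is arbitrary). An array obtained from np.array() on an RGB PIL image is laid out like hwc below, and transposing it gives the channels-first layout that add_image() expects by default.

```python
import numpy as np

# Dummy RGB "image": height 4, width 6, 3 channels
hwc = np.zeros((4, 6, 3), dtype=np.uint8)
print(hwc.shape)  # (4, 6, 3)

# Move the channel axis to the front to get channels-first (CHW)
chw = hwc.transpose(2, 0, 1)
print(chw.shape)  # (3, 4, 6)

# Either of these calls would then describe the same picture:
#   writer.add_image("test", hwc, 2, dataformats="HWC")
#   writer.add_image("test", chw, 2)  # default dataformats="CHW"
```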