Azure - 机器学习:使用自动化机器学习训练计算机视觉模型的数据架构

目录

了解如何设置Azure中 JSONL 文件格式,以便在训练和推理期间在计算机视觉任务的自动化 ML 实验中使用数据。
关注TechLead,分享AI全维度知识。作者拥有10+年互联网服务架构、AI产品研发经验、团队管理经验,同济本复旦硕,复旦机器人智能实验室成员,阿里云认证的资深架构师,项目管理专业人士,上亿营收AI产品研发负责人。

一、用于训练的数据架构

Azure 机器学习的图像 AutoML 要求以 JSONL(JSON 行)格式准备输入图像数据。 本部分介绍多类图像分类、多标签图像分类、对象检测和实例分段的输入数据格式或架构。 我们还将提供最终训练或验证 JSON 行文件的示例。

图像分类(二进制/多类)

每个 JSON 行中的输入数据格式/架构:

{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":"class_name",
}
密钥 说明 示例
image_url Azure 机器学习数据存储中的图像位置
Required, String "AmlDatastore://data_directory/Image_01.jpg"
image_details 图像详细信息
Optional, Dictionary "image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format 图像类型(支持 Pillow 库中所有可用的图像格式)
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif","bmp", "tif", "tiff"} "jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width 图像的宽度
Optional, String or Positive Integer "400px" or 400
height 图像的高度
Optional, String or Positive Integer "200px" or 200
label 图像的类/标签
Required, String "cat"

多类图像分类的 JSONL 文件示例:

{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details":{"format": "jpg", "width": "400px", "height": "258px"}, "label": "can"}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "397px", "height": "296px"}, "label": "milk_bottle"}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "1024px", "height": "768px"}, "label": "water_bottle"}

多标签图像分类

下面是每个 JSON 行中用于图像分类的输入数据格式/架构示例。

{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":[
      "class_name_1",
      "class_name_2",
      "class_name_3",
      "...",
      "class_name_n"
        
   ]
}
密钥 说明 示例
image_url Azure 机器学习数据存储中的图像位置
Required, String "AmlDatastore://data_directory/Image_01.jpg"
image_details 图像详细信息
Optional, Dictionary "image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format 图像类型(支持 Pillow 库中所有可用的图像格式)
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"} "jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width 图像的宽度
Optional, String or Positive Integer "400px" or 400
height 图像的高度
Optional, String or Positive Integer "200px" or 200
label 图像中的类/标签列表
Required, List of Strings ["cat","dog"]

多标签图像分类的 JSONL 文件示例:

{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details":{"format": "jpg", "width": "400px", "height": "258px"}, "label": ["can"]}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "397px", "height": "296px"}, "label": ["can","milk_bottle"]}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "1024px", "height": "768px"}, "label": ["carton","milk_bottle","water_bottle"]}

对象检测

下面是用于对象检测的示例 JSONL 文件。

{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":[
      {
         "label":"class_name_1",
         "topX":"xmin/width",
         "topY":"ymin/height",
         "bottomX":"xmax/width",
         "bottomY":"ymax/height",
         "isCrowd":"isCrowd"
      },
      {
         "label":"class_name_2",
         "topX":"xmin/width",
         "topY":"ymin/height",
         "bottomX":"xmax/width",
         "bottomY":"ymax/height",
         "isCrowd":"isCrowd"
      },
      "..."
   ]
}

其中:

  • xmin = 边界框左上角的 x 坐标
  • ymin = 边界框左上角的 y 坐标
  • xmax = 边界框右下角的 x 坐标
  • ymax = 边界框右下角的 y 坐标
密钥 说明 示例
image_url Azure 机器学习数据存储中的图像位置
Required, String "AmlDatastore://data_directory/Image_01.jpg"
image_details 图像详细信息
Optional, Dictionary "image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format 图像类型(支持 Pillow 库中提供的所有图像格式。但对于 YOLO,仅支持 opencv 允许的图像格式)
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff"} "jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width 图像的宽度
Optional, String or Positive Integer "499px" or 499
height 图像的高度
Optional, String or Positive Integer "665px" or 665
label(外部键) 边界框列表,其中每个框都是其左上方和右下方坐标的 label, topX, topY, bottomX, bottomY, isCrowd 字典
Required, List of dictionaries [{"label": "cat", "topX": 0.260, "topY": 0.406, "bottomX": 0.735, "bottomY": 0.701, "isCrowd": 0}]
label(内部键) 边界框中对象的类/标签
Required, String "cat"
topX 边界框左上角的 x 坐标与图像宽度的比率
Required, Float in the range [0,1] 0.260
topY 边界框左上角的 y 坐标与图像高度的比率
Required, Float in the range [0,1] 0.406
bottomX 边界框右下角的 x 坐标与图像宽度的比率
Required, Float in the range [0,1] 0.735
bottomY 边界框右下角的 y 坐标与图像高度的比率
Required, Float in the range [0,1] 0.701
isCrowd 指示边界框是否围绕对象群。 如果设置了此特殊标志,我们在计算指标时将跳过此特定边界框。
Optional, Bool 0

用于对象检测的 JSONL 文件示例:

{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "can", "topX": 0.260, "topY": 0.406, "bottomX": 0.735, "bottomY": 0.701, "isCrowd": 0}]}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "carton", "topX": 0.172, "topY": 0.153, "bottomX": 0.432, "bottomY": 0.659, "isCrowd": 0}, {"label": "milk_bottle", "topX": 0.300, "topY": 0.566, "bottomX": 0.891, "bottomY": 0.735, "isCrowd": 0}]}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "carton", "topX": 0.0180, "topY": 0.297, "bottomX": 0.380, "bottomY": 0.836, "isCrowd": 0}, {"label": "milk_bottle", "topX": 0.454, "topY": 0.348, "bottomX": 0.613, "bottomY": 0.683, "isCrowd": 0}, {"label": "water_bottle", "topX": 0.667, "topY": 0.279, "bottomX": 0.841, "bottomY": 0.615, "isCrowd": 0}]}

实例分段

对于实例分段,自动化 ML 仅支持多边形作为输入和输出,不支持掩码。

下面是实例分段的示例 JSONL 文件。

{
   "image_url":"AmlDatastore://data_directory/../Image_name.image_format",
   "image_details":{
      "format":"image_format",
      "width":"image_width",
      "height":"image_height"
   },
   "label":[
      {
         "label":"class_name",
         "isCrowd":"isCrowd",
         "polygon":[["x1", "y1", "x2", "y2", "x3", "y3", "...", "xn", "yn"]]
      }
   ]
}
密钥 说明 示例
image_url Azure 机器学习数据存储中的图像位置
Required, String "AmlDatastore://data_directory/Image_01.jpg"
image_details 图像详细信息
Optional, Dictionary "image_details":{"format": "jpg", "width": "400px", "height": "258px"}
format 映像类型
Optional, String from {"jpg", "jpeg", "png", "jpe", "jfif", "bmp", "tif", "tiff" } "jpg" or "jpeg" or "png" or "jpe" or "jfif" or "bmp" or "tif" or "tiff"
width 图像的宽度
Optional, String or Positive Integer "499px" or 499
height 图像的高度
Optional, String or Positive Integer "665px" or 665
label(外部键) 掩码列表,其中每个掩码都是 label, isCrowd, polygon coordinates 的字典
Required, List of dictionaries [{"label": "can", "isCrowd": 0, "polygon": [[0.577, 0.689,
0.562, 0.681,
0.559, 0.686]]}]
label(内部键) 掩码中对象的类/标签
Required, String "cat"
isCrowd 指示掩码是否围绕对象群
Optional, Bool 0
polygon 对象的多边形坐标
Required, List of list for multiple segments of the same instance. Float values in the range [0,1] [[0.577, 0.689, 0.567, 0.689, 0.559, 0.686]]

实例分段的 JSONL 文件示例:

{"image_url": "AmlDatastore://image_data/Image_01.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "can", "isCrowd": 0, "polygon": [[0.577, 0.689, 0.567, 0.689, 0.559, 0.686, 0.380, 0.593, 0.304, 0.555, 0.294, 0.545, 0.290, 0.534, 0.274, 0.512, 0.2705, 0.496, 0.270, 0.478, 0.284, 0.453, 0.308, 0.432, 0.326, 0.423, 0.356, 0.415, 0.418, 0.417, 0.635, 0.493, 0.683, 0.507, 0.701, 0.518, 0.709, 0.528, 0.713, 0.545, 0.719, 0.554, 0.719, 0.579, 0.713, 0.597, 0.697, 0.621, 0.695, 0.629, 0.631, 0.678, 0.619, 0.683, 0.595, 0.683, 0.577, 0.689]]}]}
{"image_url": "AmlDatastore://image_data/Image_02.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "carton", "isCrowd": 0, "polygon": [[0.240, 0.65, 0.234, 0.654, 0.230, 0.647, 0.210, 0.512, 0.202, 0.403, 0.182, 0.267, 0.184, 0.243, 0.180, 0.166, 0.186, 0.159, 0.198, 0.156, 0.396, 0.162, 0.408, 0.169, 0.406, 0.217, 0.414, 0.249, 0.422, 0.262, 0.422, 0.569, 0.342, 0.569, 0.334, 0.572, 0.320, 0.585, 0.308, 0.624, 0.306, 0.648, 0.240, 0.657]]}, {"label": "milk_bottle",  "isCrowd": 0, "polygon": [[0.675, 0.732, 0.635, 0.731, 0.621, 0.725, 0.573, 0.717, 0.516, 0.717, 0.505, 0.720, 0.462, 0.722, 0.438, 0.719, 0.396, 0.719, 0.358, 0.714, 0.334, 0.714, 0.322, 0.711, 0.312, 0.701, 0.306, 0.687, 0.304, 0.663, 0.308, 0.630, 0.320, 0.596, 0.32, 0.588, 0.326, 0.579]]}]}
.
.
.
{"image_url": "AmlDatastore://image_data/Image_n.jpg", "image_details": {"format": "jpg", "width": "499px", "height": "666px"}, "label": [{"label": "water_bottle", "isCrowd": 0, "polygon": [[0.334, 0.626, 0.304, 0.621, 0.254, 0.603, 0.164, 0.605, 0.158, 0.602, 0.146, 0.602, 0.142, 0.608, 0.094, 0.612, 0.084, 0.599, 0.080, 0.585, 0.080, 0.539, 0.082, 0.536, 0.092, 0.533, 0.126, 0.530, 0.132, 0.533, 0.144, 0.533, 0.162, 0.525, 0.172, 0.525, 0.186, 0.521, 0.196, 0.521 ]]}, {"label": "milk_bottle", "isCrowd": 0, "polygon": [[0.392, 0.773, 0.380, 0.732, 0.379, 0.767, 0.367, 0.755, 0.362, 0.735, 0.362, 0.714, 0.352, 0.644, 0.352, 0.611, 0.362, 0.597, 0.40, 0.593, 0.444,  0.494, 0.588, 0.515, 0.585, 0.621, 0.588, 0.671, 0.582, 0.713, 0.572, 0.753 ]]}]}

二、用于推理的数据格式

在本部分中,我们将记录在使用部署的模型时进行预测所需的输入数据格式。 可以接受内容类型为 application/octet-stream 的任何上述图像格式。

输入格式

下面是使用特定于任务的模型终结点对任何任务生成预测所需的输入格式。 部署模型后,我们可以使用以下代码段来获取所有任务的预测。

# input image for inference
sample_image = './test_image.jpg'
# load image data
data = open(sample_image, 'rb').read()
# set the content type
headers = {'Content-Type': 'application/octet-stream'}
# if authentication is enabled, set the authorization header
headers['Authorization'] = f'Bearer {key}'
# make the request and display the response
response = requests.post(scoring_uri, data, headers=headers)

输出格式

根据任务类型,对模型终结点进行的预测遵循不同的结构。 本部分将探讨多类、多标签图像分类、对象检测和实例分段任务的输出数据格式。

图像分类

图像分类的终结点返回数据集中的所有标签及其在输入图像中的概率分数,格式如下:

{
   "filename":"/tmp/tmppjr4et28",
   "probs":[
      2.098e-06,
      4.783e-08,
      0.999,
      8.637e-06
   ],
   "labels":[
      "can",
      "carton",
      "milk_bottle",
      "water_bottle"
   ]
}
多标签图像分类

对于多标签图像分类,模型终结点返回标签及其概率。

{
   "filename":"/tmp/tmpsdzxlmlm",
   "probs":[
      0.997,
      0.960,
      0.982,
      0.025
   ],
   "labels":[
      "can",
      "carton",
      "milk_bottle",
      "water_bottle"
   ]
}
对象检测

对象检测模型返回多个框,其中包含缩放后的左上角和右下角坐标,以及框标签和置信度分数。

{
   "filename":"/tmp/tmpdkg2wkdy",
   "boxes":[
      {
         "box":{
            "topX":0.224,
            "topY":0.285,
            "bottomX":0.399,
            "bottomY":0.620
         },
         "label":"milk_bottle",
         "score":0.937
      },
      {
         "box":{
            "topX":0.664,
            "topY":0.484,
            "bottomX":0.959,
            "bottomY":0.812
         },
         "label":"can",
         "score":0.891
      },
      {
         "box":{
            "topX":0.423,
            "topY":0.253,
            "bottomX":0.632,
            "bottomY":0.725
         },
         "label":"water_bottle",
         "score":0.876
      }
   ]
}
实例分段

在实例分段中,输出包含多个框,其中包含缩放后的左上角和右下角坐标、标签、置信度和多边形(非掩码)。 此处,多边形值与我们在"架构"部分中讨论的格式相同。

{
   "filename":"/tmp/tmpi8604s0h",
   "boxes":[
      {
         "box":{
            "topX":0.679,
            "topY":0.491,
            "bottomX":0.926,
            "bottomY":0.810
         },
         "label":"can",
         "score":0.992,
         "polygon":[
            [
               0.82, 0.811, 0.771, 0.810, 0.758, 0.805, 0.741, 0.797, 0.735, 0.791, 0.718, 0.785, 0.715, 0.778, 0.706, 0.775, 0.696, 0.758, 0.695, 0.717, 0.698, 0.567, 0.705, 0.552, 0.706, 0.540, 0.725, 0.520, 0.735, 0.505, 0.745, 0.502, 0.755, 0.493
            ]
         ]
      },
      {
         "box":{
            "topX":0.220,
            "topY":0.298,
            "bottomX":0.397,
            "bottomY":0.601
         },
         "label":"milk_bottle",
         "score":0.989,
         "polygon":[
            [
               0.365, 0.602, 0.273, 0.602, 0.26, 0.595, 0.263, 0.588, 0.251, 0.546, 0.248, 0.501, 0.25, 0.485, 0.246, 0.478, 0.245, 0.463, 0.233, 0.442, 0.231, 0.43, 0.226, 0.423, 0.226, 0.408, 0.234, 0.385, 0.241, 0.371, 0.238, 0.345, 0.234, 0.335, 0.233, 0.325, 0.24, 0.305, 0.586, 0.38, 0.592, 0.375, 0.598, 0.365
            ]
         ]
      },
      {
         "box":{
            "topX":0.433,
            "topY":0.280,
            "bottomX":0.621,
            "bottomY":0.679
         },
         "label":"water_bottle",
         "score":0.988,
         "polygon":[
            [
               0.576, 0.680, 0.501, 0.680, 0.475, 0.675, 0.460, 0.625, 0.445, 0.630, 0.443, 0.572, 0.440, 0.560, 0.435, 0.515, 0.431, 0.501, 0.431, 0.433, 0.433, 0.426, 0.445, 0.417, 0.456, 0.407, 0.465, 0.381, 0.468, 0.327, 0.471, 0.318
            ]
         ]
      }
   ]
}

关注TechLead,分享AI全维度知识。作者拥有10+年互联网服务架构、AI产品研发经验、团队管理经验,同济本复旦硕,复旦机器人智能实验室成员,阿里云认证的资深架构师,项目管理专业人士,上亿营收AI产品研发负责人。

相关推荐
阿正的梦工坊16 分钟前
Pytorch详解 train() 和 eval() 模式会影响Layer Norm吗?(中英双语)
人工智能·pytorch·python
qq_2739002317 分钟前
pytorch 张量的unfold方法介绍
人工智能·pytorch·python
lyw_YTU_Sussex20 分钟前
深度学习day6|用pytorch实现VGG-16模型人脸识别
人工智能·pytorch·深度学习
四口鲸鱼爱吃盐21 分钟前
Pytorch | 利用MIG针对CIFAR10上的ResNet分类器进行对抗攻击
人工智能·pytorch·python·深度学习·计算机视觉
WongKyunban24 分钟前
机器学习、深度学习、神经网络之间的关系
深度学习·神经网络·机器学习
我爱学Python!1 小时前
大模型笔记!以LLAMA为例,快速入门LLM的推理过程
人工智能·深度学习·ai·自然语言处理·大模型·transformer·llama
少喝冰美式1 小时前
Llama + Dify,在你的电脑搭建一套AI工作流
人工智能·深度学习·ai·大模型·llm·llama·dify
KeyPan1 小时前
【视觉惯性SLAM:十一、ORB-SLAM2:跟踪线程】
人工智能·数码相机·算法·机器学习·计算机视觉
FreedomLeo12 小时前
Python机器学习笔记(十六、数据表示与特征工程-分类变量)
python·机器学习·数据表示与特征工程·分类变量·连续特征·分类特征
学术会议2 小时前
【火热征稿中-稳定检索】2025年计算机视觉、图像与数据管理国际会议 (CVIDM 2025)
大数据·人工智能·安全·计算机视觉