Pre-annotation in Label Studio with a machine learning backend

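Label Studio supports collaborative annotation: annotators do not need to install any extra labeling tool, and the source images do not have to be copied to each annotator's machine, which is convenient. This walkthrough uses label-studio 1.20.0 and label-studio-sdk 1.0.18.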

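1 First, generate a legacy token. Click Organization, click API Token Settings on the right, enable Legacy Tokens, and save.

Then open Account & Settings from the top-right corner of your user page; the legacy token is shown at the bottom.

The legacy token is a prerequisite for using a machine learning backend. Note that it is available without a paid plan.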

Next, add the environment variables. Open the file with vim ~/.bashrc and add:

export LABEL_STUDIO_URL=***

export LABEL_STUDIO_API_KEY=<legacy token>

The API key here is the legacy token.

source ~/.bashrc  # reload the environment variables
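
Before starting a backend, it can help to confirm that both variables are visible to Python. The following is a minimal sketch, not part of Label Studio itself: the helper name `label_studio_auth` is hypothetical, and it only builds the `Authorization: Token <key>` header that legacy tokens are sent with.

```python
import os


def label_studio_auth(env=os.environ):
    """Return (base_url, headers) built from the two environment variables.

    `label_studio_auth` is a hypothetical helper for illustration only.
    """
    url = env.get("LABEL_STUDIO_URL", "").rstrip("/")
    key = env.get("LABEL_STUDIO_API_KEY", "")
    if not url or not key:
        raise RuntimeError("Set LABEL_STUDIO_URL and LABEL_STUDIO_API_KEY first")
    # Legacy tokens use the "Token" scheme in the Authorization header
    return url, {"Authorization": f"Token {key}"}
```

Calling `label_studio_auth()` in a fresh shell raises an error if `source ~/.bashrc` has not been run yet, which makes a missing variable easy to spot.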

2 Clone the machine learning backend project


git clone https://github.com/HumanSignal/label-studio-ml-backend.git
cd label-studio-ml-backend/label_studio_ml/examples/{MODEL_NAME}
docker-compose up

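MODEL_NAME is the name of the model you want to use. There are many to choose from, for example segment_anything_2_image, segment_anything_2_video, timeseries_segmenter, and yolo. Since the task here is object detection, yolo is selected.

Note that this starts the default yolo model. To use a yolo model you trained yourself, continue with the next step.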

3 Pre-annotate with a custom-trained model

(1) Create a virtual environment (conda is used here) and install the backend package into it


git clone https://github.com/HumanSignal/label-studio-ml-backend.git
cd label-studio-ml-backend/
pip install -e .

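This installs the required environment. The last package in requirements.txt is label-studio-sdk @ git+https://github.com/HumanSignal/label-studio-sdk.git. If this step is slow or fails, try label-studio-sdk @ git+https://kkgithub.com/HumanSignal/label-studio-sdk.git instead, which tends to succeed more often. After installation, label-studio-ml is at version 2.0.1.dev0.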

(2) Create the machine learning backend


label-studio-ml create my_ml_backend

The name can be changed as needed; the default name is kept here. After pressing Enter, the output is:

=========================

Welcome to Label Studio ML!

==========================

You don't specify script path: by default, "/***/label-studio-ml-backend-master/label_studio_ml/default_configs/model.py" is used

Congratulations! ML Backend has been successfully initialized in ./my_ml_backend

Here are the next steps:

Try it out by running:

label-studio-ml start ./my_ml_backend

You should be able to connect to it in Label Studio project Settings > Machine Learning > Add Model and provide with the following URL: http://localhost:9090

Go to ./my_ml_backend/model.py and modify it as you wish:

predict() - define your prediction logic here
fit() - define your training logic here (optional)

Deploy your model with docker:

cd ./my_ml_backend

docker-compose up

Have fun! :)

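As the output shows, the machine learning backend listens on http://localhost:9090 by default; connect the Label Studio project to this model address. You also need to adapt the predict() function in ./my_ml_backend/model.py. First, here is what the default predict() looks like: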


class NewModel(LabelStudioMLBase):
    """Custom ML Backend model
    """
    
    def setup(self):
        """Configure any parameters of your model here
        """
        self.set("model_version", "0.0.1")

    def predict(self, tasks: List[Dict], context: Optional[Dict] = None, **kwargs) -> ModelResponse:
        """ Write your inference logic here
            :param tasks: [Label Studio tasks in JSON format](https://labelstud.io/guide/task_format.html)
            :param context: [Label Studio context in JSON format](https://labelstud.io/guide/ml_create#Implement-prediction-logic)
            :return model_response
                ModelResponse(predictions=predictions) with
                predictions: [Predictions array in JSON format](https://labelstud.io/guide/export.html#Label-Studio-JSON-format-of-annotated-tasks)
        """
        print(f'''\
        Run prediction on {tasks}
        Received context: {context}
        Project ID: {self.project_id}
        Label config: {self.label_config}
        Parsed JSON Label config: {self.parsed_label_config}
        Extra params: {self.extra_params}''')

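This defines a class, NewModel, that inherits from LabelStudioMLBase. setup() prepares the model, and predict() runs inference with it. Note that the output must be converted to the JSON format that Label Studio supports. An example of that format: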

{
  "id": 1,
  "created_at": "2021-03-09T21:52:49.513742Z",
  "updated_at": "2021-03-09T22:16:08.746926Z",
  "project": 83,
  "data": {
    "image": "https://example.com/opensource/label-studio/1.jpg"
  },
  "annotations": [
    {
      "id": "1001",
      "result": [
        {
          "from_name": "tag",
          "id": "Dx_aB91ISN",
          "source": "$image",
          "to_name": "img",
          "type": "rectanglelabels",
          "value": {
            "height": 10.458911419423693,
            "rectanglelabels": [
              "Moonwalker"
            ],
            "rotation": 0,
            "width": 12.4,
            "x": 50.8,
            "y": 5.869797225186766
          }
        }
      ],
      "was_cancelled": false,
      "ground_truth": false,
      "created_at": "2021-03-09T22:16:08.728353Z",
      "updated_at": "2021-03-09T22:16:08.728378Z",
      "lead_time": 4.288,
      "result_count": 0,
      "task": 1,
      "completed_by": 10
    }
  ],
  "predictions": [
    {
      "created_ago": "3 hours",
      "model_version": "model 1",
      "result": [
        {
          "from_name": "tag",
          "id": "t5sp3TyXPo",
          "source": "$image",
          "to_name": "img",
          "type": "rectanglelabels",
          "value": {
            "height": 11.612284069097889,
            "rectanglelabels": [
              "Moonwalker"
            ],
            "rotation": 0,
            "width": 39.6,
            "x": 13.2,
            "y": 34.702495201535505
          }
        }
      ]
    },
    {
      "created_ago": "4 hours",
      "model_version": "model 2",
      "result": [
        {
          "from_name": "tag",
          "id": "t5sp3TyXPo",
          "source": "$image",
          "to_name": "img",
          "type": "rectanglelabels",
          "value": {
            "height": 33.61228406909789,
            "rectanglelabels": [
              "Moonwalker"
            ],
            "rotation": 0,
            "width": 39.6,
            "x": 13.2,
            "y": 54.702495201535505
          }
        }
      ]
    }
  ]
}

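A task carries three kinds of information: basic information, annotations, and predictions. The focus here is on predictions: the result field of each prediction holds the labeled regions as a list. We mainly rely on this task information, and modify setup and predict so that predict returns JSON in the required format.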
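
To make the shape of one prediction concrete, here is a minimal sketch that assembles a prediction dictionary from a list of regions. The helper name `make_prediction` and its key check are assumptions for illustration, not part of the Label Studio SDK; the region values are taken from the JSON example above.

```python
from typing import Any, Dict, List

# Keys that every region in the example above carries
REQUIRED_REGION_KEYS = {"from_name", "to_name", "type", "value"}


def make_prediction(regions: List[Dict[str, Any]], score: float,
                    model_version: str) -> Dict[str, Any]:
    """Wrap regions into one prediction dict (hypothetical helper)."""
    for region in regions:
        missing = REQUIRED_REGION_KEYS - region.keys()
        if missing:
            raise ValueError(f"region is missing keys: {sorted(missing)}")
    return {"model_version": model_version, "score": score, "result": regions}


region = {
    "from_name": "tag",
    "to_name": "img",
    "type": "rectanglelabels",
    "value": {
        "rectanglelabels": ["Moonwalker"],
        "x": 13.2, "y": 34.7, "width": 39.6, "height": 11.6,
        "rotation": 0,
    },
}
prediction = make_prediction([region], score=0.9, model_version="0.0.1")
```

Each task gets one such dictionary, and the `result` list may hold any number of regions, including none.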

The modified NewModel is as follows:


from typing import List, Dict, Optional
from label_studio_ml.model import LabelStudioMLBase
from label_studio_ml.response import ModelResponse

from label_studio_ml.utils import get_single_tag_keys
from ultralytics import YOLO

class NewModel(LabelStudioMLBase):
    """Custom ML Backend model
    """
    
    def setup(self):
        """Configure any parameters of your model here
        """
        self.set("model_version", "0.0.1")
        self.detector = YOLO("/***/best.pt")  # trained YOLO11n weights

        # Read the tag names and the label list from the project's labeling config
        self.from_name, self.to_name, self.value, self.labels_in_config = get_single_tag_keys(
            self.parsed_label_config, 'RectangleLabels', 'Image')

        self.labels_in_config = set(self.labels_in_config)  # labels configured in the project


    def predict(self, tasks: List[Dict], context: Optional[Dict] = None, **kwargs) -> ModelResponse:
        """ Write your inference logic here
            :param tasks: [Label Studio tasks in JSON format](https://labelstud.io/guide/task_format.html)
            :param context: [Label Studio context in JSON format](https://labelstud.io/guide/ml_create#Implement-prediction-logic)
            :return model_response
                ModelResponse(predictions=predictions) with
                predictions: [Predictions array in JSON format](https://labelstud.io/guide/export.html#Label-Studio-JSON-format-of-annotated-tasks)
        """
        print(f'''\
        Run prediction on {tasks}
        Received context: {context}
        Project ID: {self.project_id}
        Label config: {self.label_config}
        Parsed JSON Label config: {self.parsed_label_config}
        Extra params: {self.extra_params}''')
        predictions = []  # one prediction dict per task
        for task in tasks:
            # Resolve the task's image URL to a local file path
            image_path = self.get_local_path(task['data']['image'], task_id=task['id'])

            regions = []  # Label Studio regions for this task
            confs = []    # confidences of the regions that were kept
            results = self.detector.predict(image_path, conf=0.5)
            for result in results:  # YOLO detection results for the image
                img_height, img_width = result.orig_shape  # e.g. (1080, 1920)
                class_ids = result.boxes.cls.cpu().numpy().tolist()
                box_confs = result.boxes.conf.cpu().numpy().tolist()
                boxes = result.boxes.xyxyn.cpu().numpy().tolist()  # normalized x1, y1, x2, y2
                for class_id, conf, bbox in zip(class_ids, box_confs, boxes):
                    label = result.names[int(class_id)]  # class id -> label name
                    if label not in self.labels_in_config:  # not a label of this project, skip
                        continue

                    confs.append(conf)
                    regions.append({
                        'from_name': self.from_name,
                        'to_name': self.to_name,
                        "original_width": img_width,
                        "original_height": img_height,
                        "image_rotation": 0,
                        'type': 'rectanglelabels',
                        "value": {
                            'rectanglelabels': [label],
                            # Convert normalized coordinates to percentages (floats)
                            'x': round(bbox[0] * 100, 2),
                            'y': round(bbox[1] * 100, 2),
                            'width': round((bbox[2] - bbox[0]) * 100, 2),
                            'height': round((bbox[3] - bbox[1]) * 100, 2),
                            "rotation": 0
                        }
                    })

            predictions.append({
                "model_version": self.get("model_version"),
                "result": regions,
                # Average confidence of the kept boxes; 0 when nothing was detected
                "score": sum(confs) / max(len(confs), 1)
            })

        return ModelResponse(predictions=predictions)
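
The coordinate conversion inside predict() is the step that is easiest to get wrong, so it is shown in isolation below. This is a minimal sketch: the function name `xyxyn_to_percent` is hypothetical, and it assumes the input is a normalized (x1, y1, x2, y2) box, which is what `result.boxes.xyxyn` provides in the code above.

```python
def xyxyn_to_percent(bbox):
    """Convert a normalized (x1, y1, x2, y2) box into Label Studio's
    percentage-based x, y, width, height (hypothetical helper)."""
    x1, y1, x2, y2 = bbox
    return {
        "x": round(x1 * 100, 2),             # left edge, percent of image width
        "y": round(y1 * 100, 2),             # top edge, percent of image height
        "width": round((x2 - x1) * 100, 2),
        "height": round((y2 - y1) * 100, 2),
    }


print(xyxyn_to_percent([0.25, 0.5, 0.75, 1.0]))
# {'x': 25.0, 'y': 50.0, 'width': 50.0, 'height': 50.0}
```

Because the values are percentages of the image size rather than pixels, the same region stays valid when the image is displayed at a different resolution.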