Label Studio installation and usage

Official docs: https://labelstud.io/guide/quick_start

My use case is an image annotation task with 20 classes.

Standard workflow

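1. Create an environment: conda create -n label_studio python=3.10 -y, then activate it with conda activate label_studio

2. Install: pip install label-studio

3. Start: label-studio start

To run it in the background instead (the nohup directory must exist first, otherwise the shell redirect fails): mkdir -p nohup && nohup label-studio start --host 0.0.0.0 > nohup/start.nohup 2>&1 &
Service log: ./nohup/start.nohup

4. Open Label Studio in the browser at http://localhost:8080

5. Create an account at http://localhost:8080/user/signup, then create a project and import the data.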
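For the 20-class image task, the project created in step 5 needs a labeling config whose Image tag reads $image, the same key the import script later in this post writes into each task. Label Studio's image classification template pairs an Image tag with a Choices block. The sketch below generates that XML with Python's standard library so the 20 classes do not have to be typed by hand. The class names are hypothetical placeholders, build_classification_config is a hypothetical helper, and single-label classification (choice="single") is an assumption.

```python
import xml.etree.ElementTree as ET


def build_classification_config(class_names, image_field="image", choices_name="choice"):
    """Build a Label Studio labeling config (XML string) for single-label image classification."""
    if len(set(class_names)) != len(class_names):
        raise ValueError("class names must be unique")
    view = ET.Element("View")
    # value="$image" makes Label Studio read the image URL from task["data"]["image"]
    ET.SubElement(view, "Image", name=image_field, value=f"${image_field}")
    # One Choices block attached to the image; choice="single" is an assumption (one label per image)
    choices = ET.SubElement(view, "Choices", name=choices_name, toName=image_field, choice="single")
    for label in class_names:
        ET.SubElement(choices, "Choice", value=label)
    return ET.tostring(view, encoding="unicode")


# Hypothetical placeholder class names: replace them with your 20 real categories
CLASSES = [f"class_{i:02d}" for i in range(1, 21)]
print(build_classification_config(CLASSES))
```

The printed XML can then be pasted into the project's labeling interface settings (code view).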

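Importing data

I want Label Studio to read images from a directory of my choosing through a symbolic link. That way, when another task comes along, I only need to create a new folder for its images under this directory and it works right away.

Step 1:

Activate Label Studio's conda environment and run: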


python -c "import label_studio.core as core; import os; print(os.path.join(os.path.dirname(core.__file__), 'static_build', 'images'))"

# Prints something like /home/xxx/miniconda3/envs/ls/lib/python3.9/site-packages/label_studio/core/static_build/images; in the next step the symlink folder is created inside this images directory
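If importing label_studio.core is slow or fails in your environment, the same directory can be located without executing the package, using importlib.util.find_spec from the standard library. This is a sketch that assumes the core/static_build/images layout printed above; find_static_images_dir is a hypothetical helper name.

```python
import importlib.util
import os


def find_static_images_dir(package="label_studio"):
    """Return <package dir>/core/static_build/images, or None if the package is not installed."""
    # For a top-level name, find_spec locates the package without importing (executing) it
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None  # not installed, or a plain module rather than a package
    package_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(package_dir, "core", "static_build", "images")


print(find_static_images_dir())
```

Run it with the Label Studio environment activated; it prints None if label_studio is not installed there.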

Step 2:

Create a symlink to one shared storage directory.

For example, keep all images under /opt/syp/label_studio/images:


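# 1. Make sure the top-level image directory exists
mkdir -p /opt/syp/label_studio/images

# 2. Link the top-level directory into Label Studio's images directory under the name "LS_data"
# Usage: ln -s <storage directory> <Step 1 result>/LS_data
# Note: replace the second (long) path below with the actual path printed in Step 1!
ln -s /opt/syp/label_studio/images /home/supervisor/anaconda3/envs/label_studio/lib/python3.10/site-packages/label_studio/core/static_build/images/LS_data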


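(To remove an old symlink when cleaning up, run: rm /home/supervisor/anaconda3/envs/label_studio/lib/python3.10/site-packages/label_studio/core/static_build/images/LS_data . Without a trailing slash, rm deletes only the link itself, not the images it points to.)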

With this in place, the physical folders for new data tasks still go under /opt/syp/label_studio/images/.
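The mkdir -p, ln -s and rm steps above can also be done from Python, which is convenient when the link has to be recreated, for example in a new conda environment whose site-packages path differs. This is a minimal sketch; link_storage is a hypothetical helper, and the usage paths in the comment are the ones from this post.

```python
import os


def link_storage(storage_dir, images_dir, link_name="LS_data"):
    """Create images_dir/link_name -> storage_dir (like `ln -s`), replacing an old symlink if present."""
    storage_dir = os.path.abspath(storage_dir)   # absolute target, so the link works from any location
    os.makedirs(storage_dir, exist_ok=True)      # like `mkdir -p`
    link_path = os.path.join(images_dir, link_name)
    if os.path.islink(link_path):
        os.unlink(link_path)                     # like `rm LS_data`: removes only the link, never the images
    elif os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink; refusing to touch it")
    os.symlink(storage_dir, link_path, target_is_directory=True)
    return link_path


# Usage with the paths from this post (images_dir is the Step 1 result):
# link_storage("/opt/syp/label_studio/images",
#              "/home/supervisor/anaconda3/envs/label_studio/lib/python3.10/site-packages/label_studio/core/static_build/images")
```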

Here is the code I use to generate the JSON file for importing into Label Studio:


import os
import json
import urllib.parse

# 1. The task folder to import, relative to the storage root
TASK_REL_PATH = "test1/证件照"

# 2. Physical directory holding this task's images
BASE_IMG_DIR = f"/opt/syp/label_studio/images/{TASK_REL_PATH}"

# 3. Base URL from which Label Studio loads the images (non-ASCII folder names are percent-encoded)
encoded_task_path = urllib.parse.quote(TASK_REL_PATH, safe='/')
BASE_URL = f"http://10.25.20.247:8080/static/images/LS_data/{encoded_task_path}"

def generate_import_json():
    task_list = []
    # Accepted image file extensions
    valid_extensions = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}

    if not os.path.exists(BASE_IMG_DIR):
        print(f"Folder not found: {BASE_IMG_DIR}; check that the path is correct!")
        return

    print(f"Scanning folder: {BASE_IMG_DIR} ...")

    # os.walk recurses into every subfolder
    for root, _, files in os.walk(BASE_IMG_DIR):
        for file in files:
            # Extract the file extension and lowercase it
            ext = os.path.splitext(file)[1].lower()
            if ext in valid_extensions:
                # 1. Full physical path of the file
                full_path = os.path.join(root, file)

                # 2. Path relative to the task directory (this is what gets appended to the URL)
                rel_path = os.path.relpath(full_path, BASE_IMG_DIR)
                rel_path = rel_path.replace("\\", "/")  # normalize Windows backslashes

                # 3. Percent-encode the path so Chinese or other non-ASCII file names are URL-safe
                encoded_rel_path = urllib.parse.quote(rel_path, safe='/')

                # 4. Build the full URL the browser can fetch
                image_url = f"{BASE_URL}/{encoded_rel_path}"

                # 5. Wrap it in the task format Label Studio expects
                task = {
                    "data": {
                        # The key "image" must match value="$image" in the labeling config: <Image name="image" value="$image"/>
                        "image": image_url
                    }
                }
                task_list.append(task)

    # Write the result to a JSON file
    safe_filename = TASK_REL_PATH.replace("/", "_")
    output_filename = f"import_{safe_filename}.json"

    with open(output_filename, 'w', encoding='utf-8') as f:
        # indent=2 keeps the JSON readable for manual checking
        json.dump(task_list, f, ensure_ascii=False, indent=2)

    print(f"\nScanned {len(task_list)} images successfully!")
    print(f"Import file written: {output_filename}")

if __name__ == "__main__":
    generate_import_json()

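**What to change in the code:** set TASK_REL_PATH to the task folder relative to /opt/syp/label_studio/images. For example, for test100 write TASK_REL_PATH = "test100"; for the girl_image folder inside test511 write TASK_REL_PATH = "test511/girl_image". Also replace 10.25.20.247 in BASE_URL with the address of your own Label Studio server.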
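Before importing, it can help to check that every generated URL maps back to a real file; a wrong TASK_REL_PATH or IP otherwise shows up only as broken images in the labeling UI. This sketch reverses the URL construction above (strip the /static/images/LS_data/ prefix, undo the percent-encoding, join onto the storage root). The defaults are the paths from this post; url_to_local_path and find_missing_images are hypothetical helpers.

```python
import json
import os
import urllib.parse


def url_to_local_path(image_url, storage_root="/opt/syp/label_studio/images",
                      url_prefix="/static/images/LS_data/"):
    """Map a generated image URL back to the physical file path under storage_root."""
    url_path = urllib.parse.urlparse(image_url).path
    if not url_path.startswith(url_prefix):
        raise ValueError(f"URL path does not start with {url_prefix}: {url_path}")
    rel_path = urllib.parse.unquote(url_path[len(url_prefix):])  # undo quote(..., safe='/')
    return os.path.join(storage_root, *rel_path.split("/"))


def find_missing_images(json_file, storage_root="/opt/syp/label_studio/images"):
    """Return the image URLs in an import JSON whose files do not exist on disk."""
    with open(json_file, encoding="utf-8") as f:
        tasks = json.load(f)
    return [task["data"]["image"] for task in tasks
            if not os.path.isfile(url_to_local_path(task["data"]["image"], storage_root))]
```

For example, find_missing_images("import_test1_证件照.json") should return an empty list when the folder and file names line up.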

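Once this is set up, every new data task just goes into its own folder under /opt/syp/label_studio/images, which Label Studio sees as LS_data/<folder>, for example LS_data/test1, LS_data/test2.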
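Instead of uploading the generated JSON through the web UI, it can also be posted to Label Studio's project import endpoint (POST /api/projects/<id>/import). This sketch only builds the request with the standard library. It assumes a legacy API token sent as "Authorization: Token ..." (newer Label Studio releases may use personal access tokens with a different auth scheme), and that the project id is the number in the project's URL (http://localhost:8080/projects/<id>/). build_import_request is a hypothetical helper.

```python
import json
import urllib.request


def build_import_request(json_file, project_id, token, host="http://localhost:8080"):
    """Build (but do not send) a POST of an import JSON to a Label Studio project."""
    with open(json_file, encoding="utf-8") as f:
        tasks = json.load(f)
    body = json.dumps(tasks, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/api/projects/{project_id}/import",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {token}",  # assumption: legacy token auth
            "Content-Type": "application/json",
        },
    )


# Sending it (needs a running server, a real project id and a valid token):
# with urllib.request.urlopen(build_import_request("import_test1_证件照.json", 1, "YOUR_TOKEN")) as resp:
#     print(resp.status, resp.read().decode("utf-8"))
```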