Annotating Text with Label Studio
Preface
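Label Studio is a powerful open-source annotation platform that can label many types of data, including video, images, audio, and text.
This article explains how an annotator uses Label Studio to label text data.
Its source code is available in the project's open-source repository.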
Basic usage of Label Studio
1. Create a project
2. Add local storage
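The path entered here is the previously configured LOCAL_FILES_DOCUMENT_ROOT path with the subfolder Resume_Labeling appended (this folder must be created in advance).
After filling it in, click the Check Connection button to verify that the path is configured correctly.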
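Before clicking Check Connection, the path can be checked locally. The sketch below is only a rough pre-check, not a reproduction of what Label Studio verifies internally. The function name `is_valid_storage_path` is hypothetical, and the rule it applies (the storage folder exists and sits inside the document root) is an assumption drawn from the description above.

```python
import os
from pathlib import Path


def is_valid_storage_path(storage_path: str, document_root: str) -> bool:
    """Return True if storage_path is an existing folder inside document_root."""
    root = Path(document_root).resolve()
    target = Path(storage_path).resolve()
    # The folder must already exist and must be a descendant of the root.
    return target.is_dir() and root in target.parents


# Example usage (assumes the document root is exported under this variable name):
# root = os.environ["LOCAL_FILES_DOCUMENT_ROOT"]
# print(is_valid_storage_path(os.path.join(root, "Resume_Labeling"), root))
```

If the function returns False, the most likely causes are a subfolder that has not been created yet or a path outside the document root.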
3. Select a labeling template
Navigate to Projects => Resume_Labeling => Settings => Labeling Interface => Browse Templates
and select the custom template we just added.
4. Add data
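Put the files to be annotated, together with an Import.json file, into the Resume_Labeling folder, then import the Import.json file through the web interface.
Folder structure of the data to import
Click the Import button and select the Import.json file.
Contents of Import.json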
json
[{
"data": {
"labeler":"task3@qq.com",
"reviewver":"reviewver1@qq.com",
"resume_id":"fsdgsd",
"rules":"rules",
"source_resume": "/mydata/local-files/?d=Resume_Labeling/Round1/Import/LabelStudio/source_resume.html",
"resume": "/mydata/local-files/?d=Resume_Labeling/Round1/Import/LabelStudio/resume.html"
}
}
]
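When there are many resumes, writing Import.json by hand becomes tedious. The following is a minimal sketch of generating it with a script. The helper names `build_task` and `write_import_file` are hypothetical, the URL prefix is copied from the sample above and may differ in other deployments, and the data keys (including the spelling `reviewver`) simply mirror the sample.

```python
import json
from pathlib import Path

# Assumption: prefix taken from the sample Import.json; adjust to your deployment.
URL_PREFIX = "/mydata/local-files/?d="


def build_task(document_root, resume_dir, labeler, reviewer, resume_id, rules="rules"):
    """Build one task whose keys mirror the sample Import.json."""
    # Path of the resume folder relative to the document root, with "/" separators.
    rel = Path(resume_dir).resolve().relative_to(Path(document_root).resolve()).as_posix()
    return {
        "data": {
            "labeler": labeler,
            "reviewver": reviewer,  # key spelled exactly as in the sample
            "resume_id": resume_id,
            "rules": rules,
            "source_resume": f"{URL_PREFIX}{rel}/source_resume.html",
            "resume": f"{URL_PREFIX}{rel}/resume.html",
        }
    }


def write_import_file(tasks, out_path):
    """Write the task list as UTF-8 JSON."""
    text = json.dumps(tasks, ensure_ascii=False, indent=2)
    Path(out_path).write_text(text, encoding="utf-8")
```

Calling `build_task` once per resume folder and passing the resulting list to `write_import_file` yields a file with the same shape as the sample.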
resume.html and source_resume.html
html
<html>
<head>
<style>
.page[theme="beryl"] * {
user-select: text;
color: #333344;
font-size: 16px;
line-height: 1.6;
overflow-wrap: break-word;
}
.page[theme="beryl"] a {
text-decoration-color: #008117;
}
.page[theme="beryl"] {
width: 794px;
background: #ffffff;
padding: 72px;
margin-bottom: 32px;
border-radius: 4px;
box-shadow: 0px 4px 8px #d0d0d0;
position: relative;
}
/* Header and footer styles */
.page[theme="beryl"]>header,
footer {
position: absolute;
}
.page[theme="beryl"]>header {
top: 28px;
right: 72px;
}
.page[theme="beryl"]>footer {
bottom: 28px;
left: 397px;
transform: translate(-50%, 0);
}
.page[theme="beryl"]>footer>div::before {
content: "- ";
}
.page[theme="beryl"]>footer>div::after {
content: " -";
}
/* Basic information styles */
.page[theme="beryl"]>.head {
width: 100%;
display: inline;
grid-template-columns: auto 1fr auto;
grid-column-gap: 16px;
margin-bottom: 32px;
}
.page[theme="beryl"]>.head>div>.name {
font-size: 36px;
font-weight: bold;
margin-bottom: 32px;
text-align: center;
}
.page[theme="beryl"]>.head>.information {
display: grid;
grid-template-columns: auto auto auto;
/* grid-column-gap: 20px; */
}
.page[theme="beryl"]>.head>.information>.label {
display: flex;
/* justify-content: flex-end; */
margin-bottom: 8px;
}
.page[theme="beryl"]>.head>.information>.label>.title {
font-size: 16px;
font-weight: bold;
color: #008117;
}
.page[theme="beryl"]>.head>.information>.label>.msg {
font-size: 16px;
margin-left: 8px;
font-weight: bold;
}
.page[theme="beryl"]>.head>.information>.label>.icon {
width: 18px;
height: 22px;
object-fit: contain;
margin-right: 8px;
}
.page[theme="beryl"]>.head>.information>.label>.tag {
margin-right: 8px;
font-weight: bold;
}
.page[theme="beryl"]>.head>.portrait {
/* 48mm x 33mm */
height: 182px;
width: 125px;
object-fit: contain;
}
.page[theme="beryl"]>.head>.portrait:not([src]) {
width: 0;
opacity: 0;
}
/* Detailed information styles */
.page[theme="beryl"]>.main {
margin-bottom: 36px;
}
.page[theme="beryl"]>.main>.mainhead {
display: flex;
margin-bottom: 8px;
background: #497089;
padding-left: 16px;
border-radius: 8px;
}
.page[theme="beryl"]>.main>.mainhead>.icon {
display: none;
}
.page[theme="beryl"]>.main>.mainhead>.t1 {
font-size: 24px;
font-weight: bold;
color: #ffffff;
}
.page[theme="beryl"]>.main>.subhead {
display: inline;
/* grid-template-columns: 1fr auto; */
}
.page[theme="beryl"]>.main>.subhead>.information {
display: grid;
grid-template-columns: auto;
margin-bottom:20px;
}
.page[theme="beryl"]>.main>.subhead>.information>.label {
display: flex;
}
.page[theme="beryl"]>.main>.subhead>.information>.label>.title {
font-size: 16px;
font-weight: bold;
color: #008117;
}
.page[theme="beryl"]>.main>.subhead>.information>.label>.value {
font-size: 16px;
margin-left: 8px;
font-weight: bold;
}
.page[theme="beryl"]>.main>.subhead>.t2 {
font-size: 16px;
font-weight: bold;
color: #008117;
}
.page[theme="beryl"]>.main>.subhead>.time {
font-weight: bold;
color: #008117;
}
.page[theme="beryl"]>.main>.subhead>.note {
font-weight: bold;
color: #555555;
}
.page[theme="beryl"]>.main>ol,
ul {
padding-left: 20px;
}
.page[theme="beryl"]>.main>.contents {
margin-bottom: 8px;
}
.page[theme="beryl"]>.main>.contents>div {
font-size: 16px;
}
/* Print styles */
@media print {
.page[theme="beryl"] * {
color: #000000;
}
.page[theme="beryl"] {
border-radius: 0;
box-shadow: none;
}
}
* {
padding: 0;
margin: 0;
user-select: none;
box-sizing: border-box;
color: #333344;
print-color-adjust: exact;
-webkit-print-color-adjust: exact;
}
body {
background: #f0f0f0;
}
.themes {
position: fixed;
top: 16px;
left: 16px;
}
.themes>div {
margin: 16px;
font-size: 22px;
height: 48px;
line-height: 48px;
text-align: center;
padding: 0 16px;
border-radius: 24px;
}
.themes,
.language,
.toolbar {
display: none;
}
.language {
position: fixed;
right: 16px;
top: 16px;
}
.toolbar {
position: fixed;
right: 16px;
bottom: 16px;
}
.language>div,
.toolbar>div {
width: 48px;
height: 48px;
border-radius: 50%;
margin: 16px;
display: flex;
align-items: center;
justify-content: center;
}
.language>div>img {
width: 40px;
height: 40px;
object-fit: contain;
}
.toolbar>div>img {
width: 32px;
height: 32px;
object-fit: contain;
}
.themes>.themes-title {
font-size: 28px;
color: #666666;
}
.language>div>img,
.toolbar>div>img {
filter: brightness(2)
}
.themes>.theme,
.language>div,
.toolbar>div {
background: #f0f0f0;
box-shadow: 2px 2px 4px #dadada, -2px -2px 4px #ffffff;
color: #666666;
}
.themes>.theme:active,
.language>div:active,
.toolbar>div:active {
filter: brightness(1.03);
}
.themes>.theme[selected="true"],
.language>div[selected="true"] {
box-shadow: inset 2px 2px 4px #dadada, inset -2px -2px 4px #ffffff;
}
.resume {
/* 210mm x 297mm */
width: 794px;
/* height: 1123px; */
margin: 32px auto;
}
.source {
width: 100%;
text-align: center;
margin-bottom: 32px;
}
@media print {
@page {
margin: 0;
}
.no-print {
display: none !important
}
.resume {
margin: 0 auto;
}
}
</style>
<meta charset="UTF-8">
</head>
<body>
<div id="resume" class="resume">
<div class="page" theme="beryl" style="height: 900px;">
<section class="head" name="basic_information">
<div>
<div class="name">个人简历</div>
</div>
<div class="information">
<div class="label">
<h2 class="title">姓名</h2>
<div class="msg" name="name">xxx</div>
</div>
<div class="label">
<h2 class="title">性别</h2>
<div class="msg" name="gender">男</div>
</div>
<div class="label">
<h2 class="title">年龄</h2>
<div class="msg" name="age">31</div>
</div>
<div class="label">
<h2 class="title">邮箱</h2>
<div class="msg" name="email">dddddd@qq.com</div>
</div>
<div class="label">
<h2 class="title">电话</h2>
<div class="msg" name="phone">1111111</div>
</div>
<div class="label">
<h2 class="title">住址</h2>
<div class="msg" name="loc"></div>
</div>
<div class="label">
<h2 class="title">工作年限</h2>
<div class="msg" name="work_year">2</div>
</div>
</div>
</section>
<section class="main pri-subdir" name="edu_exp">
<div class="mainhead">
<h1 class="t1">教育经历</h1>
</div>
<div class="subhead" name="edu_exp">
<div class="information" name="edu_exp_1">
<div class="label">
<h2 class="title">学校</h2>
<div class="value" name="school">美国麻省大学波士顿分校</div>
</div>
<div class="label">
<h2 class="title">开始时间</h2>
<div class="value" name="start_time">2015.10</div>
</div>
<div class="label">
<h2 class="title">结束时间</h2>
<div class="value" name="end_time">2019.12</div>
</div>
</div>
</div>
<div class="subhead" name="edu_exp">
<div class="information" name="edu_exp_2">
<div class="label">
<h2 class="title">学校</h2>
<div class="value" name="school">第二个学校</div>
</div>
<div class="label">
<h2 class="title">开始时间</h2>
<div class="value" name="start_time">2015.10</div>
</div>
<div class="label">
<h2 class="title">结束时间</h2>
<div class="value" name="end_time">2019.12</div>
</div>
</div>
</div>
</section>
<section class="main no-subdir" name="english_ability">
<div class="mainhead">
<h1 class="t1">英语能力</h1>
</div>
<div class="contents">
<div>读写能力良好</div>
<div>听说能力良好</div>
</div>
</section>
<section class="main no-subdir" name="certs">
<div class="mainhead">
<h1 class="t1">证书</h1>
</div>
<div class="contents">
<div>证书1</div>
<div>证书2</div>
</div>
</section>
<footer>
<div>1</div>
</footer>
</div>
<div class="page" theme="beryl" style="height: 900px;">
<section class="main no-subdir" name="skills">
<div class="mainhead">
<h1 class="t1">专业技能</h1>
</div>
<div class="contents">
<div>Wireshark- HTTP , DNS, TCP/IP, capture Ethernet data</div>
<div>VM WorkStation</div>
</div>
</section>
<section class="main no-subdir" name="my_desc">
<div class="mainhead">
<h1 class="t1">自我评价</h1>
</div>
<div class="contents">
<div>自我评价内容</div>
</div>
</section>
<section class="main pri-subdir" name="job_exp">
<div class="mainhead">
<h1 class="t1">工作经历</h1>
</div>
<div class="subhead" name="job_exp">
<div class="information" name="job_exp_1">
<div class="label">
<h2 class="title">公司</h2>
<div class="value" name="company">美团</div>
</div>
<div class="label">
<h2 class="title">开始时间</h2>
<div class="value" name="start_time">2023.10</div>
</div>
<div class="label">
<h2 class="title">结束时间</h2>
<div class="value" name="end_time">至今</div>
</div>
<div class="label">
<h2 class="title">岗位</h2>
<div class="value" name="position">AI岗位</div>
</div>
<div class="contents"></div>
</div>
</div>
</section>
<footer>
<div>2</div>
</footer>
</div>
<div class="page" theme="beryl" style="height: 900px;">
<section class="main pri-subdir" name="job_exp">
<div class="subhead">
<div class="information" name="job_exp_1">
<div class="contents">
<div>内容</div>
</div>
</div>
</div>
<div class="subhead" name="job_exp">
<div class="information" name="job_exp_2">
<div class="label">
<h2 class="title">公司</h2>
<div class="value" name="company">公司1</div>
</div>
<div class="label">
<h2 class="title">开始时间</h2>
<div class="value" name="start_time">2021.08</div>
</div>
<div class="label">
<h2 class="title">结束时间</h2>
<div class="value" name="end_time">2023.10</div>
</div>
<div class="label">
<h2 class="title">岗位</h2>
<div class="value" name="position">标注审核员</div>
</div>
<div class="contents">
<div>负责对已标注视频数据内容审核工作</div>
</div>
</div>
</div>
</section>
<section class="main pri-subdir" name="proj_exp">
<div class="mainhead">
<h1 class="t1">项目经历</h1>
</div>
<div class="subhead" name="proj_exp">
<div class="information" name="proj_exp_1">
<div class="label">
<h2 class="title">项目名称</h2>
<div class="value" name="proj_name">AI模型数据标注基础研发平台</div>
</div>
<div class="label">
<h2 class="title">开始时间</h2>
<div class="value" name="start_time">2023.10</div>
</div>
<div class="label">
<h2 class="title">结束时间</h2>
<div class="value" name="end_time">至今</div>
</div>
<div class="label">
<h2 class="title">项目职责</h2>
</div>
<div class="contents">
<div>熟练掌握AI模型训练及评测相关标注任务</div>
</div>
<div class="label">
<h2 class="title">项目内容</h2>
</div>
<div class="contents"></div>
</div>
</div>
</section>
<footer>
<div>3</div>
</footer>
</div>
<div class="page" theme="beryl" style="height: 900px;">
<section class="main pri-subdir" name="proj_exp">
<div class="subhead">
<div class="information" name="proj_exp_1">
<div class="contents">
<div>熟练掌握AI模型训练及评测相关标注任务</div>
</div>
</div>
</div>
</section>
<footer>
<div>4</div>
</footer>
</div>
</div>
</body>
</html>
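Before importing, it can help to confirm that every file referenced in Import.json actually exists under the document root. The sketch below assumes that the part after `?d=` is a path relative to LOCAL_FILES_DOCUMENT_ROOT, as the sample paths suggest. The function name `missing_files` is hypothetical.

```python
import json
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def missing_files(import_json_path, document_root):
    """Return the referenced local-files paths that do not exist on disk."""
    tasks = json.loads(Path(import_json_path).read_text(encoding="utf-8"))
    missing = []
    for task in tasks:
        for value in task.get("data", {}).values():
            # Only string values that look like local-files URLs are checked.
            if not isinstance(value, str) or "local-files" not in value:
                continue
            # The relative file path is carried in the `d` query parameter.
            rel = parse_qs(urlparse(value).query).get("d", [""])[0]
            if not rel or not (Path(document_root) / rel).is_file():
                missing.append(rel or value)
    return missing
```

An empty list means every referenced file was found. Any returned entry points to a file that should be copied into place before the import.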
5. Annotate
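The interface has been slightly customized: the original resume is displayed on the left for reference and comparison.
1. To annotate, first select a label, such as Name (you can also use a hotkey; each label's hotkey is shown at its top-right corner, e.g. the hotkey for Name is 4).
2. Then select the text in the labeling area; this completes the annotation.
3. If the text needs correcting, select the annotated text; a text box appears at the top of the interface, where you enter the corrected text.
4. When finished, click Submit (or Update).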
6. Add relations
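A resume often contains more than one education entry (the same applies to project and work experience). To tell them apart, the fields that belong to the same entry need to be grouped, which can be done by adding relations.
1. Select the start point of the relation: a field within one education entry, for example the end time.
2. In the region's info panel, click the relation button (or press Alt+R).
3. Select the end point of the relation: the school field within the same education entry, for example the school name.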
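Once annotations are exported, the relations can be used to rebuild these groups. The sketch below assumes the exported `result` list contains region entries with an `id` and relation entries with `type` set to `relation` plus `from_id` and `to_id`. Check this against your own export before relying on it. The function name `group_regions_by_relation` and the sample ids are hypothetical.

```python
from collections import defaultdict


def group_regions_by_relation(results):
    """Group region ids that are connected through relation entries."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for item in results:
        if item.get("type") == "relation":
            # Merge the groups of the two linked regions.
            parent[find(item["from_id"])] = find(item["to_id"])
        elif "id" in item:
            find(item["id"])  # register regions that have no relation

    groups = defaultdict(list)
    for region_id in list(parent):
        groups[find(region_id)].append(region_id)
    return sorted(sorted(group) for group in groups.values())
```

Because every field of an entry is linked to that entry's school field, each resulting group corresponds to one education, work, or project entry. Regions without any relation end up in groups of their own.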
Summary
This article gave a brief introduction to using Label Studio from an annotator's perspective.