Semantic Similarity Loss Functions for Different Input Types (SentenceTransformerLoss)

Table of contents

  • Losses for different input types
  • Input type: [(anchor, positive/negative, label 1/0), ...]: label 1 means small distance, label 0 means large distance
  • Input type: [(sentence1, label1), (sentence2, label2), ...]: same label means small distance
  • Input type: [(sentence1, sentence2, score), ...]: fit the score of each sentence pair (between 0 and 1)
  • Input type: [(sentence1, sentence2, label), ...]: multi-class classification of sentence pairs
  • Input type: [(anchor, positive, negative), ...]: triplet input
  • Input type: [(anchor, positive), ...]: positive pairs only
  • Input type: [sentence1, sentence2, ...]: unlabeled input

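Losses for different input types

Choose the loss according to the format of your task's training data; see the Sentence Transformers loss documentation for details.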


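Input type: [(anchor, positive/negative, label 1/0), ...]: label 1 means small distance, label 0 means large distance

ContrastiveLoss (contrastive loss)

For a pair of samples A and B:

  • Positive pair (label 1): the distance between them should be as small as possible.
  • Negative pair (label 0): the distance between them should be as large as possible. Only negative pairs closer than the margin are penalized; once the distance exceeds the margin there is no further penalty.

distance_metric defaults to cosine distance and margin defaults to 0.5. With d the distance between the two embeddings and y the label, the per-pair loss is loss = 0.5 * (y * d^2 + (1 - y) * max(margin - d, 0)^2).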

```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor) -> Tensor:
    reps = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
    assert len(reps) == 2
    rep_anchor, rep_other = reps
    distances = self.distance_metric(rep_anchor, rep_other)
    losses = 0.5 * (
        labels.float() * distances.pow(2) + (1 - labels).float() * F.relu(self.margin - distances).pow(2)
    )
    return losses.mean() if self.size_average else losses.sum()
```
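To make the margin behaviour concrete, here is a minimal pure-Python sketch of the same per-pair formula. The function name `contrastive_loss` and the sample distances are made up for illustration; only the formula and the default margin of 0.5 come from the text above.

```python
def contrastive_loss(distance: float, label: int, margin: float = 0.5) -> float:
    """Per-pair contrastive loss, mirroring the formula used in forward()."""
    positive_term = label * distance ** 2
    negative_term = (1 - label) * max(margin - distance, 0.0) ** 2
    return 0.5 * (positive_term + negative_term)


# Positive pair: any remaining distance is penalized.
loss_pos = contrastive_loss(distance=0.4, label=1)
# Negative pair inside the margin: penalized for being too close.
loss_neg_close = contrastive_loss(distance=0.2, label=0)
# Negative pair beyond the margin: no penalty at all.
loss_neg_far = contrastive_loss(distance=0.8, label=0)

print(loss_pos, loss_neg_close, loss_neg_far)
```

The third call is the case the text describes: once a negative pair is farther apart than the margin, it stops contributing to the loss.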

OnlineContrastiveLoss

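Largely the same as ContrastiveLoss, except that the loss is computed only on the hard pairs within each batch. It usually works better than plain ContrastiveLoss.

Loss: select the negative pairs whose distance is smaller than the largest positive-pair distance, and the positive pairs whose distance is larger than the smallest negative-pair distance. Easy pairs are ignored: negatives already farther apart than every positive, positives already closer than every negative, and selected negatives whose distance already exceeds the margin contribute nothing.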

```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor, size_average=False) -> Tensor:
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]

    distance_matrix = self.distance_metric(embeddings[0], embeddings[1])
    negs = distance_matrix[labels == 0]
    poss = distance_matrix[labels == 1]

    # select hard positive and hard negative pairs
    negative_pairs = negs[negs < (poss.max() if len(poss) > 1 else negs.mean())]
    positive_pairs = poss[poss > (negs.min() if len(negs) > 1 else poss.mean())]

    positive_loss = positive_pairs.pow(2).sum()
    negative_loss = F.relu(self.margin - negative_pairs).pow(2).sum()
    loss = positive_loss + negative_loss
    return loss
```
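The hard-pair selection is easier to follow on a handful of numbers. The sketch below re-implements the same selection rule on plain Python lists; the function name and the four toy distances are illustrative assumptions, and it assumes the batch contains at least one positive and one negative pair.

```python
def online_contrastive_loss(distances, labels, margin=0.5):
    """Select hard pairs the same way as forward(), then sum their losses."""
    positives = [d for d, y in zip(distances, labels) if y == 1]
    negatives = [d for d, y in zip(distances, labels) if y == 0]

    # Thresholds, including the fallbacks used when a class has a single pair
    neg_threshold = max(positives) if len(positives) > 1 else sum(negatives) / len(negatives)
    pos_threshold = min(negatives) if len(negatives) > 1 else sum(positives) / len(positives)

    hard_negatives = [d for d in negatives if d < neg_threshold]
    hard_positives = [d for d in positives if d > pos_threshold]

    positive_loss = sum(d ** 2 for d in hard_positives)
    negative_loss = sum(max(margin - d, 0.0) ** 2 for d in hard_negatives)
    return hard_positives, hard_negatives, positive_loss + negative_loss


distances = [0.1, 0.6, 0.3, 0.9]  # toy pair distances
labels = [1, 1, 0, 0]
hard_pos, hard_neg, loss = online_contrastive_loss(distances, labels)
print(hard_pos, hard_neg, loss)
```

Here only the positive pair at distance 0.6 and the negative pair at distance 0.3 survive the selection; the other two pairs are already on the right side of each other and are skipped.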

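Input type: [(sentence1, label1), (sentence2, label2), ...]: same label means small distance

BatchAllTripletLoss

What the loss measures:

  • Sentences in the batch with the same label belong to the same class and should be close together.
  • Sentences in the batch with different labels belong to different classes and should be far apart.
  • For any anchor, its distance to samples with the same label should be smaller than its distance to samples with a different label.

For example, given four samples [(a, label1), (b, label1), (c, label2), (d, label2)], pairwise_dist is [[aa, ab, ac, ad], ..., [da, db, dc, dd]]. With a as the anchor, ab is a positive-pair distance and ac is a negative-pair distance, so one term of the loss is ab - ac + margin.

The larger the positive-pair distance and the smaller the negative-pair distance, the larger the loss. Triplets in which the negative is farther away than the positive by more than the margin, i.e. ab - ac + margin < 0, are ignored: they are already easy to separate and contribute nothing to the loss.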

```python
def batch_all_triplet_loss(self, labels: Tensor, embeddings: Tensor) -> Tensor:
    # Get the pairwise distance matrix
    pairwise_dist = self.distance_metric(embeddings)
    anchor_positive_dist = pairwise_dist.unsqueeze(2)
    anchor_negative_dist = pairwise_dist.unsqueeze(1)

    # Compute a 3D tensor of size (batch_size, batch_size, batch_size)
    # triplet_loss[i, j, k] will contain the triplet loss of anchor=i, positive=j, negative=k
    # Uses broadcasting where the 1st argument has shape (batch_size, batch_size, 1)
    # and the 2nd (batch_size, 1, batch_size)
    triplet_loss = anchor_positive_dist - anchor_negative_dist + self.triplet_margin

    # Put to zero the invalid triplets
    # (where label(a) != label(p) or label(n) == label(a) or a == p)
    mask = BatchHardTripletLoss.get_triplet_mask(labels)
    triplet_loss = mask.float() * triplet_loss

    # Remove negative losses (i.e. the easy triplets)
    triplet_loss[triplet_loss < 0] = 0

    # Count number of positive triplets (where triplet_loss > 0)
    valid_triplets = triplet_loss[triplet_loss > 1e-16]
    num_positive_triplets = valid_triplets.size(0)
    # num_valid_triplets = mask.sum()
    # fraction_positive_triplets = num_positive_triplets / (num_valid_triplets.float() + 1e-16)

    # Get final mean triplet loss over the positive valid triplets
    triplet_loss = triplet_loss.sum() / (num_positive_triplets + 1e-16)

    return triplet_loss
```
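The four-sample example (a, b, c, d) can be worked through by brute force. The sketch below enumerates every valid triplet with plain loops instead of broadcasting; the 1-D points, the absolute-difference distance and the margin of 1.5 are illustrative assumptions, not library defaults.

```python
from itertools import permutations


def batch_all_triplet_loss(points, labels, margin):
    """Mean hinge loss over the valid triplets that are still active (loss > 0)."""

    def dist(i, j):
        # 1-D toy distance standing in for distance_metric
        return abs(points[i] - points[j])

    num_valid = 0
    active_losses = []
    for a, p, n in permutations(range(len(points)), 3):
        # Valid triplet: the positive shares the anchor's label, the negative does not
        if labels[a] != labels[p] or labels[a] == labels[n]:
            continue
        num_valid += 1
        loss = dist(a, p) - dist(a, n) + margin
        if loss > 0:  # easy triplets (loss <= 0) are dropped
            active_losses.append(loss)

    mean_loss = sum(active_losses) / (len(active_losses) + 1e-16)
    return mean_loss, len(active_losses), num_valid


points = [0.0, 1.0, 3.0, 4.0]  # samples a, b, c, d
labels = [1, 1, 2, 2]
mean_loss, num_active, num_valid = batch_all_triplet_loss(points, labels, margin=1.5)
print(mean_loss, num_active, num_valid)
```

Of the eight valid triplets in this toy batch, only (b, a, c) and (c, d, b) violate the margin, and the final loss is the mean over those two.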

BatchHardSoftMarginTripletLoss

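For every anchor in the batch, even its largest distance to a sample with the same label should be smaller than its smallest distance to a sample with a different label. In other words, the farthest same-class sample must still be closer than the nearest sample from another class.

It uses a soft margin: loss = log(1 + exp(d(a, p) - d(a, n))). The loss is smooth, so there is no margin to tune and no hard cutoff. Its gradient with respect to d(a, p) - d(a, n) is sigmoid(d(a, p) - d(a, n)), which is still sizeable while the two distances are close and fades as the positive moves closer than the negative. When the positive distance is much smaller than the negative distance, the loss tends to 0.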

```python
def batch_hard_triplet_soft_margin_loss(self, labels: Tensor, embeddings: Tensor) -> Tensor:
    # Get the pairwise distance matrix
    pairwise_dist = self.distance_metric(embeddings)

    # For each anchor, get the hardest positive
    # First, we need to get a mask for every valid positive (they should have same label)
    mask_anchor_positive = BatchHardTripletLoss.get_anchor_positive_triplet_mask(labels).float()

    # We put to 0 any element where (a, p) is not valid (valid if a != p and label(a) == label(p))
    anchor_positive_dist = mask_anchor_positive * pairwise_dist

    # shape (batch_size, 1)
    hardest_positive_dist, _ = anchor_positive_dist.max(1, keepdim=True)

    # For each anchor, get the hardest negative
    # First, we need to get a mask for every valid negative (they should have different labels)
    mask_anchor_negative = BatchHardTripletLoss.get_anchor_negative_triplet_mask(labels).float()

    # We add the maximum value in each row to the invalid negatives (label(a) == label(n))
    max_anchor_negative_dist, _ = pairwise_dist.max(1, keepdim=True)
    anchor_negative_dist = pairwise_dist + max_anchor_negative_dist * (1.0 - mask_anchor_negative)

    # shape (batch_size, 1), because keepdim=True
    hardest_negative_dist, _ = anchor_negative_dist.min(1, keepdim=True)

    # Combine biggest d(a, p) and smallest d(a, n) into final triplet loss with soft margin
    # tl = hardest_positive_dist - hardest_negative_dist + margin
    # tl[tl < 0] = 0
    tl = torch.log1p(torch.exp(hardest_positive_dist - hardest_negative_dist))
    triplet_loss = tl.mean()

    return triplet_loss
```

BatchHardTripletLoss

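Unlike BatchHardSoftMarginTripletLoss, the margin is set by hand: loss = d(a, p) - d(a, n) + margin, followed by loss[loss < 0] = 0, so anchors whose hardest negative is already farther away than their hardest positive by more than the margin are ignored.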
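The two batch-hard variants differ only in the last step, which a small side-by-side sketch makes visible. Everything here is illustrative: the 1-D points, the absolute-difference distance and the margin of 1.5 are assumptions, and the sketch assumes every anchor has at least one positive and one negative in the batch.

```python
import math


def batch_hard_losses(points, labels, margin):
    """Return (soft-margin loss, hard-margin loss), each averaged over all anchors."""
    soft_terms, hard_terms = [], []
    indices = range(len(points))
    for a in indices:
        pos = [abs(points[a] - points[j]) for j in indices if j != a and labels[j] == labels[a]]
        neg = [abs(points[a] - points[j]) for j in indices if labels[j] != labels[a]]
        gap = max(pos) - min(neg)  # hardest positive minus hardest negative
        soft_terms.append(math.log1p(math.exp(gap)))  # BatchHardSoftMarginTripletLoss
        hard_terms.append(max(gap + margin, 0.0))     # BatchHardTripletLoss
    return sum(soft_terms) / len(points), sum(hard_terms) / len(points)


points = [0.0, 1.0, 3.0, 4.0]
labels = [1, 1, 2, 2]
soft_loss, hard_loss = batch_hard_losses(points, labels, margin=1.5)
print(soft_loss, hard_loss)
```

In this batch the two outer anchors already satisfy the margin and add exactly 0 to the hard-margin loss, while the soft-margin loss keeps a small positive contribution from every anchor.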


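Input type: [(sentence1, sentence2, score), ...]: fit the score of each sentence pair (between 0 and 1)

CosineSimilarityLoss (similarity regression)

Computes the cosine similarity of each sentence pair and applies an MSE loss against the labeled score. cos_score_transformation defaults to the identity (no transformation), and loss_fct defaults to MSE loss.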

```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor) -> Tensor:
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
    output = self.cos_score_transformation(torch.cosine_similarity(embeddings[0], embeddings[1]))
    return self.loss_fct(output, labels.float().view(-1))
```
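As a quick numeric check of "cosine similarity, then MSE", here is a pure-Python sketch with hand-picked 2-D vectors. The helper names and the toy vectors and gold scores are assumptions for illustration; the identity transformation and the mean-squared error follow the defaults described above.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_similarity_loss(pairs, gold_scores):
    """MSE between predicted cosine similarities and gold similarity scores."""
    predictions = [cosine_similarity(u, v) for u, v in pairs]
    squared_errors = [(p - g) ** 2 for p, g in zip(predictions, gold_scores)]
    return sum(squared_errors) / len(squared_errors)


pairs = [
    ([1.0, 0.0], [1.0, 0.0]),  # identical direction, cosine similarity 1
    ([1.0, 0.0], [0.0, 1.0]),  # orthogonal, cosine similarity 0
]
gold_scores = [0.9, 0.2]
loss = cosine_similarity_loss(pairs, gold_scores)
print(loss)
```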

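CoSENTLoss (similarity regression and ranking tasks)

Cosine Sentence Loss. For the underlying idea, see the 科学空间 (Scientific Spaces) post "CoSENT(一):比Sentence-BERT更有效的句向量方案" (CoSENT (1): a sentence-embedding scheme more effective than Sentence-BERT).

Loss: for sentence pairs (i, j) and (k, l), if label[i,j] < label[k,l], the model's predicted similarities are expected to satisfy scores[i,j] < scores[k,l]. The loss is defined as loss = log(1 + exp(s[i,j] - s[k,l]) + exp...), with one exp term for every such ordered combination of two sentence pairs (in the code below, the scores are first multiplied by scale). In other words, the similarity of (i, j) is expected to be lower than that of (k, l).

Similarity measure: the cosine similarity score, where 1 means similar and 0 means dissimilar. Note that this is a similarity score, not a distance. After training, the cosine similarity between two embeddings expresses their semantic similarity, which makes the loss suitable for sentence-similarity regression and ranking tasks.

For example, take 3 sentence pairs in a batch, numbered 1, 2 and 3, with gold labels (0.1, 0.7, 0.9). The combinations (1, 2), (1, 3) and (2, 3) enter the loss. If the predicted scores are (0.3, 0.4, 0.2), the score differences are (-0.1, 0.1, 0.2). A positive difference means the expected ordering is violated, so the loss is larger.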

```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor) -> Tensor:
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]

    scores = self.similarity_fct(embeddings[0], embeddings[1])
    scores = scores * self.scale
    scores = scores[:, None] - scores[None, :]

    # label matrix indicating which pairs are relevant
    labels = labels[:, None] < labels[None, :]
    labels = labels.float()

    # mask out irrelevant pairs so they are negligible after exp()
    scores = scores - (1 - labels) * 1e12

    # append a zero as e^0 = 1
    scores = torch.cat((torch.zeros(1).to(scores.device), scores.view(-1)), dim=0)
    loss = torch.logsumexp(scores, dim=0)

    return loss
```
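The three-pair example from the text can be reproduced with a few lines of plain Python. The function name `cosent_loss` is made up, and scale is set to 1.0 here only so that the terms match the score differences quoted in the example; in the forward pass it is whatever self.scale holds.

```python
import math


def cosent_loss(scores, labels, scale):
    """log(1 + sum of exp(scale * (s_i - s_j)) over all i, j with label_i < label_j)."""
    terms = [0.0]  # the leading zero supplies the "1 +" inside the log
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] < labels[j]:
                terms.append(scale * (scores[i] - scores[j]))
    top = max(terms)  # numerically stable log-sum-exp
    return top + math.log(sum(math.exp(t - top) for t in terms))


labels = [0.1, 0.7, 0.9]
predicted = [0.3, 0.4, 0.2]           # ordering partly wrong
well_ranked = [0.1, 0.7, 0.9]         # ordering fully consistent with the labels

loss_predicted = cosent_loss(predicted, labels, scale=1.0)
loss_well_ranked = cosent_loss(well_ranked, labels, scale=1.0)
print(loss_predicted, loss_well_ranked)
```

The scores whose ordering agrees with the labels yield the smaller loss, which is the ranking behaviour the loss is designed to reward.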

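Input type: [(sentence1, sentence2, label), ...]: multi-class classification of sentence pairs

SoftmaxLoss

A Siamese network that performs multi-class classification on sentence pairs.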

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("microsoft/mpnet-base")
train_dataset = Dataset.from_dict({
    "sentence1": [
        "A person on a horse jumps over a broken down airplane.",
        "A person on a horse jumps over a broken down airplane.",
        "A person on a horse jumps over a broken down airplane.",
        "Children smiling and waving at camera",
    ],
    "sentence2": [
        "A person is training his horse for a competition.",
        "A person is at a diner, ordering an omelette.",
        "A person is outdoors, on a horse.",
        "There are children present.",
    ],
    "label": [1, 2, 0, 0],
})
loss = losses.SoftmaxLoss(model, model.get_sentence_embedding_dimension(), num_labels=3)
```

The forward pass of the loss:

```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor) -> Tensor | tuple[Tensor, Tensor]:
    reps = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
    rep_a, rep_b = reps

    vectors_concat = []
    if self.concatenation_sent_rep:
        vectors_concat.append(rep_a)
        vectors_concat.append(rep_b)

    if self.concatenation_sent_difference:
        vectors_concat.append(torch.abs(rep_a - rep_b))

    if self.concatenation_sent_multiplication:
        vectors_concat.append(rep_a * rep_b)

    features = torch.cat(vectors_concat, 1)

    output = self.classifier(features)

    if labels is not None:
        loss = self.loss_fct(output, labels.view(-1))
        return loss
    else:
        return reps, output
```
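What the classifier actually sees is easiest to grasp on two tiny vectors. The sketch below mirrors the three concatenation switches from the forward pass with plain lists; the function name, the flag names and their default values are assumptions of this sketch, not a statement about the library's defaults.

```python
def build_pair_features(rep_a, rep_b, use_reps=True, use_difference=True, use_multiplication=False):
    """Concatenate [u, v], |u - v| and u * v, depending on the switches."""
    features = []
    if use_reps:
        features += list(rep_a) + list(rep_b)
    if use_difference:
        features += [abs(a - b) for a, b in zip(rep_a, rep_b)]
    if use_multiplication:
        features += [a * b for a, b in zip(rep_a, rep_b)]
    return features


u = [1.0, -2.0]
v = [0.5, 1.0]
features = build_pair_features(u, v)
all_features = build_pair_features(u, v, use_multiplication=True)
print(len(features), len(all_features))
```

With embeddings of dimension d, the input to the classifier grows by d for the difference term, by d for the product term, and by 2d for the two raw embeddings, which is why the classifier has to be sized according to the enabled switches.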

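Input type: [(anchor, positive, negative), ...]: triplet input

TripletLoss

The anchor-negative distance should exceed the anchor-positive distance by more than the margin; that is, triplets with dis(anchor, neg) - dis(anchor, pos) < margin are penalized. By default distance_metric is Euclidean distance and margin is 5.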

```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor) -> Tensor:
    reps = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]

    rep_anchor, rep_pos, rep_neg = reps
    distance_pos = self.distance_metric(rep_anchor, rep_pos)
    distance_neg = self.distance_metric(rep_anchor, rep_neg)

    losses = F.relu(distance_pos - distance_neg + self.triplet_margin)
    return losses.mean()
```
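A worked example with Euclidean distance and the default margin of 5 shows when a triplet is penalized. The helper names and the 2-D points are illustrative assumptions.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # distance 1 from the anchor
near_negative = [0.0, 4.0]   # distance 4: only 3 farther than the positive
far_negative = [0.0, 10.0]   # distance 10: 9 farther than the positive

loss_near = triplet_loss(anchor, positive, near_negative)
loss_far = triplet_loss(anchor, positive, far_negative)
print(loss_near, loss_far)
```

The near negative is farther than the positive, yet still inside the margin, so it is penalized; the far negative clears the margin and contributes nothing.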

MultipleNegativesRankingLoss / InfoNCELoss

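Each anchor comes with one positive and multiple negatives. The similarities between the anchor and its positive and negatives are computed and treated as a softmax multi-class classification problem, which raises the anchor-positive similarity and lowers the anchor-negative similarities.

This is equivalent to the InfoNCE loss, in which the scores are temperature-scaled before the softmax. In MultipleNegativesRankingLoss this is the scale parameter, which multiplies the scores and so plays the role of an inverse temperature; with scale=1 the loss is plain cross-entropy on the unscaled similarity scores.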

```python
def forward(self, sentence_features: Iterable[dict[str, Tensor]], labels: Tensor) -> Tensor:
    # Compute the embeddings and distribute them to anchor and candidates (positive and optionally negatives)
    embeddings = [self.model(sentence_feature)["sentence_embedding"] for sentence_feature in sentence_features]
    anchors = embeddings[0]  # (batch_size, embedding_dim)
    candidates = torch.cat(embeddings[1:])  # (batch_size * (1 + num_negatives), embedding_dim)

    # For every anchor, we compute the similarity to all other candidates (positives and negatives),
    # also from other anchors. This gives us a lot of in-batch negatives.
    scores = self.similarity_fct(anchors, candidates) * self.scale
    # (batch_size, batch_size * (1 + num_negatives))

    # anchor[i] should be most similar to candidates[i], as that is the paired positive,
    # so the label for anchor[i] is i
    range_labels = torch.arange(0, scores.size(0), device=scores.device)

    return self.cross_entropy_loss(scores, range_labels)
```
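The effect of scale is easy to see on a two-pair toy batch. The sketch below computes the same "softmax over all candidates, target index i" cross-entropy with plain Python; the function names, the 2-D embeddings and the use of cosine similarity as the similarity function are assumptions for illustration.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_ranking_loss(anchors, candidates, scale):
    """Mean cross-entropy where candidates[i] is the correct class for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        scores = [scale * cosine_similarity(anchor, c) for c in candidates]
        top = max(scores)  # numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[1.0, 0.0], [0.0, 1.0]]  # positives[i] belongs to anchors[i]

loss_scale_1 = in_batch_ranking_loss(anchors, positives, scale=1.0)
loss_scale_20 = in_batch_ranking_loss(anchors, positives, scale=20.0)
print(loss_scale_1, loss_scale_20)
```

Even though every anchor is perfectly aligned with its positive, the unscaled loss stays well above zero because cosine similarities are bounded by 1; multiplying by a larger scale sharpens the softmax and lets the loss approach zero.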

CachedMultipleNegativesRankingLoss

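An optimized version of MultipleNegativesRankingLoss: the batch is split into several mini-batches and the gradients are cached, which avoids OOM when training with large batches.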


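Input type: [(anchor, positive), ...]: positive pairs only

MultipleNegativesRankingLoss can be used here: the positives of the other pairs in the batch serve as negatives for each anchor, and a softmax classification is performed. The more samples in the batch, the harder the classification, and the better the expected result.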
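One rough way to see why a larger batch makes the classification harder: if a model cannot yet tell candidates apart and gives every candidate the same score, the cross-entropy equals log(batch_size). The sketch below only illustrates that starting point; the function name is hypothetical and it says nothing about final model quality.

```python
import math


def chance_level_loss(batch_size: int) -> float:
    """Cross-entropy for one anchor when all in-batch candidates get the same score."""
    scores = [0.0] * batch_size
    log_sum_exp = math.log(sum(math.exp(s) for s in scores))
    return log_sum_exp - scores[0]  # index 0 stands in for the true positive


for batch_size in (8, 64, 512):
    print(batch_size, round(chance_level_loss(batch_size), 3))
```

Each anchor has to pick its positive out of batch_size candidates, so the loss a model must beat grows logarithmically with the batch size.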


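Input type: [sentence1, sentence2, ...]: unlabeled input

ContrastiveTensionLossInBatchNegatives can be used here. Each sentence is encoded twice, and the objective pulls the two embeddings of the same sentence together while pushing the embeddings of different sentences apart. In the Sentence Transformers implementation the two encodings come from two copies of the model that start from the same weights, rather than from dropout noise within a single model.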
