Contents
- [1. Problem 1](#1-problem-1)
- [2. Problem 2](#2-problem-2)
- [3. Problem 3](#3-problem-3)
⏰ Date: 2024/08/19
🔄 Input/Output: ACM format
⏳ Duration: 2h
The exam consists of multiple-choice questions, self-assessment questions, and programming problems.
The multiple-choice and self-assessment parts are not covered here. As usual, 4399's programming problems are oddly off-topic: this is supposedly a written test for an NLP position, yet it tests OpenCV knowledge. Btw, after comparing notes with other candidates online, it seems 4399 reuses the same written test across different sessions???
1. Problem 1
Problem 1 is a LeetCode original: 441. Arranging Coins. See LeetCode for the full problem statement and editorial.
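For completeness, here is a minimal closed-form sketch (my addition; the authoritative statement and editorial live on LeetCode): with `n` coins, the answer is the largest `k` such that k(k+1)/2 ≤ n.

```python
import math

def arrange_coins(n: int) -> int:
    # Largest k with k*(k+1)/2 <= n, solved via the quadratic formula;
    # math.isqrt keeps the computation in exact integer arithmetic.
    return (math.isqrt(8 * n + 1) - 1) // 2
```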
2. Problem 2
Problem Description
Using the OpenCV library, write a program that tracks, in real time, the coordinates of the four corners of a phone green screen held in a person's hand in a video file.
Requirements
- Use color segmentation to detect the green-screen region. (8 points)
- Use an appropriate method (e.g., contour detection) to find the four corners of the green screen. (10 points)
- Mark these four corners in the video frames. (8 points)
"Phone green screen" means the phone's display shows a solid green image, intended to be replaced with other content in post-production. Green range: `lower_green = np.array([35, 100, 100])`, `upper_green = np.array([85, 255, 255])`.
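As a quick sanity check of this range (my addition, not part of the problem statement): OpenCV stores hue in [0, 179], so pure green sits at H = 60, comfortably inside [35, 85].

```python
import cv2
import numpy as np

# Pure green in BGR converted to HSV; prints [[[ 60 255 255]]]
green_bgr = np.uint8([[[0, 255, 0]]])
print(cv2.cvtColor(green_bgr, cv2.COLOR_BGR2HSV))
```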
Test Case
Input: green_screen_track.mp4
Output: the video's frames as images, with the four corners marked
Solution
```python
import cv2
import numpy as np

lower_green = np.array([35, 100, 100])
upper_green = np.array([85, 255, 255])

def get_largest_contour(contours):
    """Return the contour with the largest area."""
    return max(contours, key=cv2.contourArea)

def get_four_vertices(contour):
    """Approximate the contour as a quadrilateral; return its 4 vertices, or None."""
    epsilon = 0.02 * cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, epsilon, True)
    if len(approx) == 4:
        return approx.reshape(4, 2)
    return None

def main(video_path):
    cap = cv2.VideoCapture(video_path)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # Segment the green region in HSV space
        hsv_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv_frame, lower_green, upper_green)
        contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            largest_contour = get_largest_contour(contours)
            vertices = get_four_vertices(largest_contour)
            if vertices is not None:
                # Mark the four corners and outline the quadrilateral
                for (x, y) in vertices:
                    cv2.circle(frame, (int(x), int(y)), 5, (0, 0, 255), -1)
                cv2.polylines(frame, [vertices], isClosed=True, color=(0, 255, 0), thickness=2)
        cv2.imshow('Green Screen Tracking', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    video_path = 'green_screen_track.mp4'
    main(video_path)
```
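The test case asks for frame images as output, while the listing above displays a window instead. A minimal variant that writes each annotated frame to disk (the output directory and filename pattern are my assumptions; it reuses `get_largest_contour`, `get_four_vertices`, and the HSV bounds defined above) could look like:

```python
import os
import cv2

def save_annotated_frames(video_path, out_dir='output_frames'):
    """Write every frame, with corner markers when found, as a numbered image."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        mask = cv2.inRange(hsv, lower_green, upper_green)
        contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
        if contours:
            vertices = get_four_vertices(get_largest_contour(contours))
            if vertices is not None:
                for (x, y) in vertices:
                    cv2.circle(frame, (int(x), int(y)), 5, (0, 0, 255), -1)
                cv2.polylines(frame, [vertices], isClosed=True, color=(0, 255, 0), thickness=2)
        # Write every frame so the output forms a complete sequence
        cv2.imwrite(os.path.join(out_dir, f'frame_{idx:05d}.png'), frame)
        idx += 1
    cap.release()
```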
3. Problem 3
You can use Chinese to answer the questions.
Problem Description
You need to use the Swin Transformer model to train a binary classifier to identify whether an image contains a green screen. Green screens are commonly used in video production and photography for background replacement in post-production. Your task is to write a program that uses the Swin Transformer model to train and evaluate the performance of this classifier.
Input Data
- Training Dataset: A set of images, including images with and without green screens.
- Labels: Labels for each image, where 0 indicates no green screen and 1 indicates the presence of a green screen.
Output Requirements
- Trained Model: Train a binary classifier using the Swin Transformer model.
- Model Evaluation: Evaluate the model's accuracy, precision, recall, and F1-score on a validation or test set.
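For reference (my addition, not part of the original problem), the requested metrics in terms of the binary confusion-matrix counts, with "green screen present" (label 1) as the positive class:

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int):
    # Standard definitions over true/false positives and negatives
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```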
Programming Requirements
- Data Preprocessing: Including image loading, normalization, and label processing.
- Model Definition: Using the Swin Transformer model.
- Training Process: Including loss function, optimizer, and training loop.
- Evaluation Process: Evaluate the model's performance on the validation or test set.
- Results Presentation: Output evaluation metrics and visualize some prediction results.
Here is a sample code framework to help you get started:
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms, datasets
from swin_transformer_pytorch import SwinTransformer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from PIL import Image

# Dataset class definition
class GreenScreenDataset(Dataset):
    def __init__(self, image_paths, labels, transform=None):
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert('RGB')
        label = self.labels[idx]
        if self.transform:
            image = self.transform(image)
        return image, label

# Data preprocessing, please define transform
# TODO

# Load datasets
train_dataset = GreenScreenDataset(train_image_paths, train_labels, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_dataset = GreenScreenDataset(val_image_paths, val_labels, transform=transform)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

# Define the SwinTransformer model
# TODO

# Loss function and optimizer
criterion = nn.CrossEntropyLoss()
# TODO

# Training process
def train(model, train_loader, criterion, optimizer, num_epochs=10):
    model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        for images, labels in train_loader:
            # TODO: forward pass, compute loss, backpropagation, optimizer step
            running_loss += loss.item()
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')

# Evaluation process
def evaluate(model, val_loader):
    model.eval()
    all_preds = []
    all_labels = []
    with torch.no_grad():
        for images, labels in val_loader:
            outputs = model(images)
            _, preds = torch.max(outputs, 1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
    accuracy = accuracy_score(all_labels, all_preds)
    # TODO: Calculate precision, recall, and F1-score
    print(f'Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}, F1-score: {f1:.4f}')

# Train the model
train(model, train_loader, criterion, optimizer, num_epochs=10)

# Evaluate the model
evaluate(model, val_loader)
```
Solution
The problem asks for a binary classifier, built on the Swin Transformer, that decides whether an image contains a green screen. The solution covers data preprocessing, model definition, training, and evaluation.

In preprocessing, images are resized and normalized to match the Swin Transformer's expected input, and the dataset class must handle the binary labels correctly (0 = no green screen, 1 = green screen). For the model, a practical approach is to fine-tune a pretrained Swin Transformer with its output layer replaced by a two-node fully connected head, one node per class; note that the listing below instead instantiates the architecture from scratch via `swin_transformer_pytorch`, without pretrained weights. Training uses a standard loop with cross-entropy loss, an AdamW optimizer, and a step learning-rate scheduler; AdamW's weight decay provides some regularization against overfitting. For evaluation, precision, recall, and F1-score are reported alongside accuracy to give a fuller picture of binary-classification performance, and a few validation samples are visualized to compare predictions against ground-truth labels.
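Since the write-up mentions fine-tuning a pretrained model while the listing below builds the architecture from scratch, here is a minimal alternative sketch (my addition, assuming the `timm` library is installed; the checkpoint name is one common Swin-T variant) that loads pretrained weights and swaps in a two-way head:

```python
# Sketch: fine-tune a pretrained Swin Transformer via timm.
# Assumption: timm is installed and the named checkpoint is available.
import timm

model = timm.create_model(
    'swin_tiny_patch4_window7_224',  # one common Swin-T checkpoint
    pretrained=True,
    num_classes=2  # timm replaces the classification head with a 2-way layer
)
```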
```python
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from swin_transformer_pytorch import SwinTransformer
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Dataset class definition
class GreenScreenDataset(Dataset):
    def __init__(self, image_paths, labels, transform=None):
        self.image_paths = image_paths
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = Image.open(self.image_paths[idx]).convert('RGB')
        label = self.labels[idx]
        if self.transform:
            image = self.transform(image)
        return image, torch.tensor(label, dtype=torch.long)

# Data preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

# Assumes train_image_paths/train_labels and val_image_paths/val_labels
# have been prepared (see the helper sketch after this listing)
train_dataset = GreenScreenDataset(train_image_paths, train_labels, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_dataset = GreenScreenDataset(val_image_paths, val_labels, transform=transform)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)

# Model: Swin-T-style architecture instantiated from scratch
# (swin_transformer_pytorch takes `heads`, not `num_heads`)
model = SwinTransformer(
    hidden_dim=96,
    layers=(2, 2, 6, 2),
    heads=(3, 6, 12, 24),
    num_classes=2,
    window_size=7
)
model = model.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

# Training
def train(model, train_loader, criterion, optimizer, scheduler, num_epochs=10):
    model.train()
    for epoch in range(num_epochs):
        running_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = model(images)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        scheduler.step()
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss/len(train_loader):.4f}')

# Evaluation
def evaluate(model, val_loader):
    model.eval()
    all_preds = []
    all_labels = []
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, preds = torch.max(outputs, 1)
            all_preds.extend(preds.cpu().numpy())
            all_labels.extend(labels.cpu().numpy())
    accuracy = accuracy_score(all_labels, all_preds)
    precision = precision_score(all_labels, all_preds)
    recall = recall_score(all_labels, all_preds)
    f1 = f1_score(all_labels, all_preds)
    print(f'Accuracy: {accuracy:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}, F1-score: {f1:.4f}')
    return all_preds, all_labels

# Visualization
def visualize_predictions(val_loader, model):
    model.eval()
    images, labels = next(iter(val_loader))
    images, labels = images.to(device), labels.to(device)
    with torch.no_grad():
        outputs = model(images)
    _, preds = torch.max(outputs, 1)
    images = images.cpu().numpy()
    preds = preds.cpu().numpy()
    labels = labels.cpu().numpy()
    # Visualize the first 6 samples
    plt.figure(figsize=(12, 8))
    for i in range(6):
        plt.subplot(2, 3, i + 1)
        image = np.transpose(images[i], (1, 2, 0))
        image = image * np.array([0.229, 0.224, 0.225]) + np.array([0.485, 0.456, 0.406])  # un-normalize
        image = np.clip(image, 0, 1)
        plt.imshow(image)
        plt.title(f'Pred: {preds[i]}, Actual: {labels[i]}')
        plt.axis('off')
    plt.show()

train(model, train_loader, criterion, optimizer, scheduler, num_epochs=10)
all_preds, all_labels = evaluate(model, val_loader)
visualize_predictions(val_loader, model)
```
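The listing assumes `train_image_paths`, `train_labels`, `val_image_paths`, and `val_labels` already exist. A hypothetical helper to build them (the `data/green` / `data/no_green` directory layout is my assumption, not part of the problem):

```python
# Hypothetical data preparation: folder names below are assumptions.
from glob import glob
from sklearn.model_selection import train_test_split

green_paths = glob('data/green/*.jpg')        # images with a green screen    -> label 1
no_green_paths = glob('data/no_green/*.jpg')  # images without a green screen -> label 0
image_paths = green_paths + no_green_paths
labels = [1] * len(green_paths) + [0] * len(no_green_paths)

# Stratified 80/20 split keeps the class balance in both sets
train_image_paths, val_image_paths, train_labels, val_labels = train_test_split(
    image_paths, labels, test_size=0.2, stratify=labels, random_state=42)
```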