Setting up Apocrita on macOS: SSH access and getting GPU permission

Notes on setting up Apocrita at Queen Mary University of London from macOS: configuring SSH access and obtaining permission to use the GPUs.

Your ITS Research account has been created, your credentials are printed below:

Username: qp1111

Password: ***********

✅ Step 1: Generate an SSH key on your Mac (skip this if you already have one). Open Terminal (Applications → Utilities → Terminal) and run:

ssh-keygen -t ed25519 -C "your_email@example.com"

You will see this prompt:

Enter file in which to save the key (/Users/your-username/.ssh/id_ed25519):

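Just press Enter to accept the default path.

Next you are asked to set a passphrase (a password that protects the key).

👉 You can set one, or press Enter to skip it.

If you set one, you will have to type it every time you log in.

When this finishes you will have two files:

~/.ssh/id_ed25519        private key (never give it to anyone)
~/.ssh/id_ed25519.pub    public key (this is what you upload to Apocrita)

To view the public key:
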
cat ~/.ssh/id_ed25519.pub
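
Before pasting the key into a web form, it can help to confirm that the line you copied is intact. The sketch below is not part of the original workflow: `check_ed25519_pubkey` is a made-up name, and it only checks the standard OpenSSH public-key layout (type name, base64 blob, optional comment), not whether the server will accept the key.

```python
import base64
import struct
from pathlib import Path


def check_ed25519_pubkey(line: str) -> bool:
    """Return True if `line` looks like a well-formed OpenSSH ed25519 public key."""
    parts = line.strip().split(None, 2)  # key type, base64 blob, optional comment
    if len(parts) < 2 or parts[0] != "ssh-ed25519":
        return False
    try:
        blob = base64.b64decode(parts[1], validate=True)
    except ValueError:
        return False
    # The blob holds two length-prefixed fields: the type name, then the 32-byte key.
    if len(blob) < 4:
        return False
    (name_len,) = struct.unpack(">I", blob[:4])
    name = blob[4:4 + name_len]
    rest = blob[4 + name_len:]
    if name != b"ssh-ed25519" or len(rest) != 4 + 32:
        return False
    (key_len,) = struct.unpack(">I", rest[:4])
    return key_len == 32


if __name__ == "__main__":
    pub_path = Path.home() / ".ssh" / "id_ed25519.pub"
    if pub_path.exists():
        print("key looks valid:", check_ed25519_pubkey(pub_path.read_text()))
    else:
        print("no key found at", pub_path)
```

A truncated copy (for example, one that lost its last few characters) fails the length check, which is the most likely mistake when copying from a terminal.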

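✅ Step 2: Upload the public key to the Apocrita website

Log in to the Apocrita account management page (you should have been sent a link, e.g. My HPC Account).

Look for something like:

"SSH Keys" → Add SSH Public Key

and paste in the full line printed by the cat command in Step 1.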

✅ Step 3: First login (password + private key)

In the Mac terminal, log in over SSH directly (replace USERNAME with your Apocrita username; note that this is not your university email name):

ssh -i ~/.ssh/id_ed25519 USERNAME@login.hpc.universityname.ac.uk

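This step asks for the key's passphrase (if you set one) and your Apocrita account password.

This is Apocrita's security policy: both the private key and the password are required.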

✅ Step 4: Change your password

passwd

Enter your old password, then the new one.

✅ Step 5 (optional, recommended): Make logging in more convenient

Use ssh-agent so that you do not have to type the key passphrase every time:

ssh-add ~/.ssh/id_ed25519

Then log in directly, entering the new password you set in Step 4 when prompted:

ssh USERNAME@login.hpc.universityname.ac.uk
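
Typing the full address each time gets tedious. One common approach, which the original steps do not cover, is a host alias in `~/.ssh/config`. The sketch below only renders such an entry as text so you can review it before adding it by hand. The alias `apocrita` and the function name are hypothetical, and the host name is the same placeholder used in the commands above.

```python
def render_ssh_config(alias: str, host: str, user: str, key_path: str) -> str:
    """Build an ssh_config Host block as a string (nothing is written to disk)."""
    lines = [
        f"Host {alias}",
        f"    HostName {host}",
        f"    User {user}",
        f"    IdentityFile {key_path}",
        "    AddKeysToAgent yes",  # load the key into ssh-agent on first use
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ssh_config(
        alias="apocrita",  # hypothetical alias, pick any name you like
        host="login.hpc.universityname.ac.uk",
        user="USERNAME",
        key_path="~/.ssh/id_ed25519",
    ))
```

With an entry like this in `~/.ssh/config`, `ssh apocrita` would stand in for the longer command. The account password is still requested, since the alias changes nothing about the server's policy.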

✅ Next, submit a job to obtain GPU access. A tip for everyone: submit real code, not an empty test run, haha.

Your account has now been created.

Access to GPUs is subject to the vetting process below.

Please can you run jobs on the short queue (set job runtime to 1 hour) and provide the job number(s) so we can double check they are optimised to run on the GPU cards? To speed up the validation process, please add echo $SGE_HGR_gpu into your job script to print out the GPU devices allocated to your job.

This is a one-off process which we ask all GPU users to complete before we grant access because we have limited GPU nodes and we want to ensure all jobs submitted to those nodes run correctly.

Once we have inspected your job and are satisfied it will run on the GPU nodes correctly, we'll grant access to run longer jobs.
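
The admins ask for `echo $SGE_HGR_gpu` in the job script. If you would also like the same information in the Python log, a small helper along these lines could be called at the start of training. This is a sketch: `gpu_env_report` is a made-up name, the value is printed verbatim because the email does not describe its format, and `CUDA_VISIBLE_DEVICES` is included only on the assumption that it may be set as well.

```python
import os


def gpu_env_report(env=None) -> str:
    """Summarise GPU-related environment variables without interpreting them."""
    env = os.environ if env is None else env
    names = ["SGE_HGR_gpu", "CUDA_VISIBLE_DEVICES"]  # second name is an assumption
    return "\n".join(f"{name}={env.get(name, '<not set>')}" for name in names)


if __name__ == "__main__":
    print(gpu_env_report(), flush=True)
```

Run on your Mac or on a login node, both variables would normally show as `<not set>`; the output is only meaningful inside a GPU job.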

✅ Create a conda environment

module load miniforge/25.3.0
conda --version
conda create -n llm_gpu python=3.10 -y
conda activate llm_gpu

Install the packages:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers
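
After installing, a quick check that PyTorch imports, and which CUDA build it was compiled against, can save a failed job. This is a sketch and not part of the original steps; `torch_summary` is a hypothetical helper. On a node without a GPU, `cuda_available` is expected to be `False`, so that field only tells you something inside a GPU job.

```python
def torch_summary() -> dict:
    """Collect basic facts about the installed PyTorch, if any."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in torch_summary().items():
        print(f"{key}: {value}")
```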

Running inference would also mean downloading model weights, so let's run a small training job instead.

✅ Step 1: mini_gpt_train.py

vi mini_gpt_train.py

Paste in the following script:

import math
import time
import torch
import torch.nn as nn
import torch.nn.functional as F


# --------------------
# Configuration
# --------------------
VOCAB_SIZE = 4096
D_MODEL = 512
N_HEAD = 8
NUM_LAYERS = 8
DIM_FF = 2048
SEQ_LEN = 256
BATCH_SIZE = 32
NUM_STEPS = 20000
# NUM_STEPS = 3500 finished in only seven minutes
LOG_INTERVAL = 50


# --------------------
# Mini GPT-like Transformer Decoder
# --------------------
class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads

        self.qkv_proj = nn.Linear(d_model, 3 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, attn_mask=None):
        B, T, D = x.size()
        qkv = self.qkv_proj(x)
        qkv = qkv.view(B, T, 3, self.num_heads, self.head_dim)
        qkv = qkv.permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]

        attn_scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        if attn_mask is not None:
            attn_scores = attn_scores.masked_fill(attn_mask == 0, float("-inf"))

        attn_weights = torch.softmax(attn_scores, dim=-1)
        attn_output = torch.matmul(attn_weights, v)

        attn_output = (
            attn_output.transpose(1, 2).contiguous().view(B, T, D)
        )
        out = self.out_proj(attn_output)
        return out


class TransformerBlock(nn.Module):
    def __init__(self, d_model, num_heads, dim_ff):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = MultiHeadSelfAttention(d_model, num_heads)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_ff),
            nn.GELU(),
            nn.Linear(dim_ff, d_model),
        )

    def forward(self, x, attn_mask=None):
        x = x + self.attn(self.ln1(x), attn_mask=attn_mask)
        x = x + self.ff(self.ln2(x))
        return x


class MiniGPT(nn.Module):
    def __init__(self, vocab_size, d_model, n_head, num_layers, dim_ff, max_seq_len):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_seq_len, d_model)
        self.layers = nn.ModuleList(
            [TransformerBlock(d_model, n_head, dim_ff) for _ in range(num_layers)]
        )
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        self.max_seq_len = max_seq_len

    def forward(self, idx):
        B, T = idx.size()
        pos = torch.arange(0, T, device=idx.device).unsqueeze(0)

        x = self.token_emb(idx) + self.pos_emb(pos)

        mask = (
            torch.tril(torch.ones(T, T, device=idx.device))
            .unsqueeze(0)
            .unsqueeze(0)
        )

        for layer in self.layers:
            x = layer(x, attn_mask=mask)

        x = self.ln_f(x)
        logits = self.head(x)
        return logits


def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("Using device:", device, flush=True)

    model = MiniGPT(
        vocab_size=VOCAB_SIZE,
        d_model=D_MODEL,
        n_head=N_HEAD,
        num_layers=NUM_LAYERS,
        dim_ff=DIM_FF,
        max_seq_len=SEQ_LEN,
    ).to(device)

    print(
        "Model parameters:",
        sum(p.numel() for p in model.parameters()) / 1e6,
        "M",
        flush=True,
    )

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    criterion = nn.CrossEntropyLoss()

    start_time = time.time()

    for step in range(1, NUM_STEPS + 1):
        x = torch.randint(
            0, VOCAB_SIZE, (BATCH_SIZE, SEQ_LEN), device=device, dtype=torch.long
        )
        y = x[:, 1:].contiguous()
        x_input = x[:, :-1].contiguous()

        logits = model(x_input)
        logits = logits.view(-1, VOCAB_SIZE)
        y = y.view(-1)

        loss = criterion(logits, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if step % LOG_INTERVAL == 0:
            elapsed_min = (time.time() - start_time) / 60.0
            print(
                f"Step {step}/{NUM_STEPS}, loss={loss.item():.4f}, elapsed={elapsed_min:.2f} min",
                flush=True,
            )

    total_min = (time.time() - start_time) / 60.0
    print(f"Training finished. Total time: {total_min:.2f} minutes", flush=True)


if __name__ == "__main__":
    main()
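
Two numbers are handy when reading this job's output, and both follow from the configuration in the script. First, the parameter count printed at startup can be worked out by hand from the layer shapes. Second, because every batch is uniformly random tokens, there is nothing to learn, so the cross-entropy loss should stay close to ln(VOCAB_SIZE) rather than fall toward zero. The sketch below does that arithmetic without importing PyTorch; the function names are mine, not the script's.

```python
import math

VOCAB_SIZE, D_MODEL, NUM_LAYERS, DIM_FF, SEQ_LEN = 4096, 512, 8, 2048, 256


def linear_params(n_in: int, n_out: int, bias: bool = True) -> int:
    """Weights plus optional bias of an nn.Linear(n_in, n_out)."""
    return n_in * n_out + (n_out if bias else 0)


def count_params() -> int:
    """Parameter count of MiniGPT, derived from the layer shapes in the script."""
    layer_norm = 2 * D_MODEL  # scale and shift
    block = (
        layer_norm
        + linear_params(D_MODEL, 3 * D_MODEL)  # qkv_proj
        + linear_params(D_MODEL, D_MODEL)      # out_proj
        + layer_norm
        + linear_params(D_MODEL, DIM_FF)       # feed-forward, first layer
        + linear_params(DIM_FF, D_MODEL)       # feed-forward, second layer
    )
    embeddings = VOCAB_SIZE * D_MODEL + SEQ_LEN * D_MODEL  # token_emb + pos_emb
    head = linear_params(D_MODEL, VOCAB_SIZE, bias=False)
    return embeddings + NUM_LAYERS * block + layer_norm + head


if __name__ == "__main__":
    print(f"parameters: {count_params() / 1e6:.2f} M")
    print(f"loss for uniform guessing: {math.log(VOCAB_SIZE):.4f}")
```

This gives 29,545,472 parameters (about 29.55 M) and a reference loss of about 8.3178. A flat loss near that value in the log is therefore expected for this script and is not a sign that the GPU job is broken.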

✅ Step 2: The job script, mini_gpt_job.sh

vi mini_gpt_job.sh

#!/bin/bash
#$ -l h_rt=1:00:00        # walltime limit: 1 hour
#$ -cwd                   # run in current working directory
#$ -l gpu=1               # request 1 GPU
#$ -pe smp 8              # request 8 CPU cores (required per GPU)
#$ -l h_vmem=8G           # 8 GB RAM per core

echo "Job running on:"
hostname

echo "GPU allocated:"
echo $SGE_HGR_gpu

# Load Miniforge (conda) module
module load miniforge/25.3.0

# Properly initialize conda in a non-interactive shell
source "$(conda info --base)/etc/profile.d/conda.sh"

# Activate the LLM environment
conda activate llm_gpu

echo "Starting Mini-GPT training..."
python3 mini_gpt_train.py

echo "Job finished."
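
The `#$` lines are scheduler directives. Going by the comments in the script, the memory request is per core, so the job's total memory is the core count times `h_vmem`. A hedged sketch of that arithmetic follows. The parser is illustrative only, handles just the `-l name=value` and `-pe smp N` forms used in this script, and `parse_directives` and `total_memory_gb` are made-up helpers.

```python
import re


def parse_directives(script_text: str) -> dict:
    """Extract `-l key=value` resources and the smp slot count from `#$` lines."""
    resources = {}
    for line in script_text.splitlines():
        if not line.startswith("#$"):
            continue
        body = line[2:].split("#", 1)[0].strip()  # drop the trailing comment
        match = re.fullmatch(r"-l\s+(\w+)=(\S+)", body)
        if match:
            resources[match.group(1)] = match.group(2)
            continue
        match = re.fullmatch(r"-pe\s+smp\s+(\d+)", body)
        if match:
            resources["slots"] = int(match.group(1))
    return resources


def total_memory_gb(resources: dict) -> int:
    """Cores times per-core memory, assuming h_vmem is given in whole gigabytes."""
    per_core = int(resources["h_vmem"].rstrip("G"))
    return resources.get("slots", 1) * per_core


if __name__ == "__main__":
    demo = "#$ -l gpu=1\n#$ -pe smp 8   # cores\n#$ -l h_vmem=8G  # per core\n"
    res = parse_directives(demo)
    print(res, "->", total_memory_gb(res), "GB in total")
```

For the script above that comes to 8 × 8 GB = 64 GB for the one-GPU job.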

Save and exit the editor, then make the script executable:

chmod +x mini_gpt_job.sh

✅ Step 3: Submit the job

qsub mini_gpt_job.sh

It returns:

Your job 6653935 ("mini_gpt_job.sh") has been submitted
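
The job number in that message is what the admins need, and it also determines the names of the output files. If you submit from a script, a small parser like this sketch could pull it out. The regular expression is fitted to the single message shown above, and the function names are hypothetical.

```python
import re

SUBMIT_RE = re.compile(r'Your job (\d+) \("([^"]+)"\) has been submitted')


def parse_qsub_output(message: str):
    """Return (job_id, job_name) from a qsub confirmation line, or None."""
    match = SUBMIT_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def output_files(job_id: int, job_name: str):
    """Names of the stdout and stderr files for the job."""
    return f"{job_name}.o{job_id}", f"{job_name}.e{job_id}"


if __name__ == "__main__":
    parsed = parse_qsub_output('Your job 6653935 ("mini_gpt_job.sh") has been submitted')
    print(parsed, output_files(*parsed))
```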

Once the job has finished, just send this job ID to the administrators. To check on it while it is running:

qstat
qstat -j <jobid>

To view the results:

cat mini_gpt_job.sh.o6653935   # stdout (main output)
cat mini_gpt_job.sh.e6653935   # stderr (error output)
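
With 20000 steps logged every 50, a complete run writes 400 progress lines to stdout, so a short summary can be easier to read than the raw file. The sketch below parses the `Step .../..., loss=..., elapsed=... min` lines that `mini_gpt_train.py` prints. `summarise_log` is a hypothetical helper, the throughput figure is a simple average, and lines with a non-numeric loss such as `nan` are skipped.

```python
import re
import sys

STEP_RE = re.compile(r"Step (\d+)/(\d+), loss=([\d.]+), elapsed=([\d.]+) min")


def summarise_log(text: str) -> dict:
    """Summarise progress lines; returns an empty dict if none are found."""
    rows = [
        (int(m.group(1)), int(m.group(2)), float(m.group(3)), float(m.group(4)))
        for m in STEP_RE.finditer(text)
    ]
    if not rows:
        return {}
    last_step, total_steps, last_loss, elapsed = rows[-1]
    return {
        "last_step": last_step,
        "total_steps": total_steps,
        "first_loss": rows[0][2],
        "last_loss": last_loss,
        "steps_per_min": last_step / elapsed if elapsed > 0 else None,
        "finished": "Training finished." in text,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the stdout file of the job as the first argument.
    with open(sys.argv[1]) as handle:
        print(summarise_log(handle.read()))
```

If `finished` is `False` while `last_step` is below `total_steps`, the job most likely hit the one-hour limit before completing.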