Setting Up Apocrita on macOS: SSH Access and GPU Permission

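Notes on setting up Apocrita, the Queen Mary University of London HPC cluster, from macOS: configuring SSH access and getting permission to use the GPUs. After you apply, the account email looks like this: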

Your ITS Research account has been created, your credentials are printed below:

Username: qp1111

Password: ***********

✅ Step 1: Generate an SSH key on your Mac (skip this if you already have one). Open Terminal (Applications → Utilities → Terminal) and run:

ssh-keygen -t ed25519 -C "your_email@example.com"

You will see the prompt:

Enter file in which to save the key (/Users/your_username/.ssh/id_ed25519):

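Just press Enter to accept the default path.

You are then asked to set a passphrase (a password that protects the key).

👉 You can set one, or press Enter to skip it.

If you set one, you will have to type it every time you log in.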

When it finishes you will have two files:

~/.ssh/id_ed25519	private key (never give this to anyone)
~/.ssh/id_ed25519.pub	public key (this is what you upload to Apocrita)

Print the public key:

cat ~/.ssh/id_ed25519.pub

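✅ Step 2: Upload the public key to the Apocrita website

Log in to the Apocrita account management page (you should have been sent a link, for example My HPC Account).

Look for something like:

"SSH Keys" → Add SSH Public Key

and paste in the public key printed in Step 1.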

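✅ Step 3: First login (password + private key)

In the Mac terminal, log in over SSH directly. Replace USERNAME with your Apocrita username; note that this is not your university email name.

ssh -i ~/.ssh/id_ed25519 USERNAME@login.hpc.universityname.ac.uk

This step asks for:

the key passphrase (if you set one) and your Apocrita account password.

This is Apocrita's security policy: both the private key and the password are required.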

✅ Step 4: Change your password

passwd

Enter the old password, then the new one.

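✅ Step 5 (optional): Make logging in more convenient (recommended)

Use ssh-agent so that you do not have to type the key passphrase every time:

ssh-add ~/.ssh/id_ed25519

After that, log in directly and enter your new account password when prompted:

ssh USERNAME@login.hpc.universityname.ac.uk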

✅ Next, submit a job to obtain GPU access. A tip: it has to be real code, not an empty test run. The email from the administrators reads:

You account has now been created.

Access to GPUs is subject to the vetting process below.

Please can you run jobs on the short queue (set job runtime to 1 hour) and provide the job number(s) so we can double check they are optimised to run on the GPU cards? To speed up the validation process, please add echo $SGE_HGR_gpu into your job script to print out the GPU devices allocated to your job.

This is a one-off process which we ask all GPU users to complete before we grant access because we have limited GPU nodes and we want to ensure all jobs submitted to those nodes run correctly.

Once we have inspected your job and are satisfied it will run on the GPU nodes correctly, we'll grant access to run longer jobs.

✅ Create the conda environment

module load miniforge/25.3.0
conda --version
conda create -n llm_gpu python=3.10 -y
conda activate llm_gpu

Install the packages:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install transformers
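Before writing the job script, it can help to confirm that the new environment imports PyTorch at all. The snippet below is a minimal sketch; `describe_gpu` is a hypothetical helper name, not part of the original setup. If you run it on a login node that has no GPU, it should report that no CUDA device is visible, which is expected; inside a GPU job it should name the allocated card.

```python
import os


def describe_gpu():
    """Return a one-line summary of what PyTorch can see (hypothetical helper)."""
    try:
        import torch  # imported lazily so a missing install gives a clear message
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} is installed, but no CUDA device is visible"
    return f"PyTorch {torch.__version__}, CUDA device 0: {torch.cuda.get_device_name(0)}"


if __name__ == "__main__":
    print(describe_gpu())
    # SGE_HGR_gpu is the variable the administrators ask you to echo in the job script.
    # Assumption: it is only populated inside a scheduled GPU job.
    print("SGE_HGR_gpu =", os.environ.get("SGE_HGR_gpu", "<not set>"))
```

Save it under any name you like and run it with `python` after `conda activate llm_gpu`.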

Running inference would also mean downloading model weights, so let's run a small training job instead.

✅ Step 1: mini_gpt_train.py

vi mini_gpt_train.py

Paste in the following script:

import math
import time
import torch
import torch.nn as nn
import torch.nn.functional as F


# --------------------
# Configuration
# --------------------
VOCAB_SIZE = 4096
D_MODEL = 512
N_HEAD = 8
NUM_LAYERS = 8
DIM_FF = 2048
SEQ_LEN = 256
BATCH_SIZE = 32
NUM_STEPS = 20000
# NUM_STEPS = 3500 finished in only about seven minutes
LOG_INTERVAL = 50


# --------------------
# Mini GPT-like Transformer Decoder
# --------------------
class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads

        self.qkv_proj = nn.Linear(d_model, 3 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, attn_mask=None):
        B, T, D = x.size()
        qkv = self.qkv_proj(x)
        qkv = qkv.view(B, T, 3, self.num_heads, self.head_dim)
        qkv = qkv.permute(2, 0, 3, 1, 4)
        q, k, v = qkv[0], qkv[1], qkv[2]

        attn_scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        if attn_mask is not None:
            attn_scores = attn_scores.masked_fill(attn_mask == 0, float("-inf"))

        attn_weights = torch.softmax(attn_scores, dim=-1)
        attn_output = torch.matmul(attn_weights, v)

        attn_output = (
            attn_output.transpose(1, 2).contiguous().view(B, T, D)
        )
        out = self.out_proj(attn_output)
        return out


class TransformerBlock(nn.Module):
    def __init__(self, d_model, num_heads, dim_ff):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = MultiHeadSelfAttention(d_model, num_heads)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, dim_ff),
            nn.GELU(),
            nn.Linear(dim_ff, d_model),
        )

    def forward(self, x, attn_mask=None):
        x = x + self.attn(self.ln1(x), attn_mask=attn_mask)
        x = x + self.ff(self.ln2(x))
        return x


class MiniGPT(nn.Module):
    def __init__(self, vocab_size, d_model, n_head, num_layers, dim_ff, max_seq_len):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_seq_len, d_model)
        self.layers = nn.ModuleList(
            [TransformerBlock(d_model, n_head, dim_ff) for _ in range(num_layers)]
        )
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab_size, bias=False)
        self.max_seq_len = max_seq_len

    def forward(self, idx):
        B, T = idx.size()
        pos = torch.arange(0, T, device=idx.device).unsqueeze(0)

        x = self.token_emb(idx) + self.pos_emb(pos)

        mask = (
            torch.tril(torch.ones(T, T, device=idx.device))
            .unsqueeze(0)
            .unsqueeze(0)
        )

        for layer in self.layers:
            x = layer(x, attn_mask=mask)

        x = self.ln_f(x)
        logits = self.head(x)
        return logits


def main():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    print("Using device:", device, flush=True)

    model = MiniGPT(
        vocab_size=VOCAB_SIZE,
        d_model=D_MODEL,
        n_head=N_HEAD,
        num_layers=NUM_LAYERS,
        dim_ff=DIM_FF,
        max_seq_len=SEQ_LEN,
    ).to(device)

    print(
        "Model parameters:",
        sum(p.numel() for p in model.parameters()) / 1e6,
        "M",
        flush=True,
    )

    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    criterion = nn.CrossEntropyLoss()

    start_time = time.time()

    for step in range(1, NUM_STEPS + 1):
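        # Synthetic batch of uniformly random token IDs: this keeps the GPU busy,
        # but there is nothing to learn, so the loss stays near ln(VOCAB_SIZE).
        x = torch.randint(
            0, VOCAB_SIZE, (BATCH_SIZE, SEQ_LEN), device=device, dtype=torch.long
        )
        # Next-token prediction: the target is the input shifted left by one position.
        y = x[:, 1:].contiguous()
        x_input = x[:, :-1].contiguous()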

        logits = model(x_input)
        logits = logits.view(-1, VOCAB_SIZE)
        y = y.view(-1)

        loss = criterion(logits, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if step % LOG_INTERVAL == 0:
            elapsed_min = (time.time() - start_time) / 60.0
            print(
                f"Step {step}/{NUM_STEPS}, loss={loss.item():.4f}, elapsed={elapsed_min:.2f} min",
                flush=True,
            )

    total_min = (time.time() - start_time) / 60.0
    print(f"Training finished. Total time: {total_min:.2f} minutes", flush=True)


if __name__ == "__main__":
    main()
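As a sanity check on the "Model parameters" line the script prints, the count can be worked out by hand from the configuration constants. The sketch below uses only the standard library; `linear_params` and `count_params` are hypothetical helper names. It also prints ln(VOCAB_SIZE), the cross-entropy of a uniform guess, which is roughly where the loss should sit because the training batches are random tokens.

```python
import math

# Same values as the configuration block in mini_gpt_train.py
VOCAB_SIZE, D_MODEL, NUM_LAYERS, DIM_FF, SEQ_LEN = 4096, 512, 8, 2048, 256


def linear_params(n_in, n_out, bias=True):
    """Weights plus optional bias of one nn.Linear layer."""
    return n_in * n_out + (n_out if bias else 0)


def count_params():
    layer_norm = 2 * D_MODEL  # scale and shift vectors
    attention = linear_params(D_MODEL, 3 * D_MODEL) + linear_params(D_MODEL, D_MODEL)
    feed_forward = linear_params(D_MODEL, DIM_FF) + linear_params(DIM_FF, D_MODEL)
    block = 2 * layer_norm + attention + feed_forward
    embeddings = VOCAB_SIZE * D_MODEL + SEQ_LEN * D_MODEL  # token + position tables
    head = linear_params(D_MODEL, VOCAB_SIZE, bias=False)
    return embeddings + NUM_LAYERS * block + layer_norm + head  # final LayerNorm included


print(f"{count_params() / 1e6:.2f} M parameters")           # 29.55 M parameters
print(f"uniform-guess loss: {math.log(VOCAB_SIZE):.4f}")    # uniform-guess loss: 8.3178
```

If the job log shows a very different parameter count, the configuration in the submitted file probably differs from the one listed here.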

✅ Step 2: The job script, mini_gpt_job.sh

vi mini_gpt_job.sh

Paste in the following script:

#!/bin/bash
#$ -l h_rt=1:00:00        # walltime limit: 1 hour
#$ -cwd                   # run in current working directory
#$ -l gpu=1               # request 1 GPU
#$ -pe smp 8              # request 8 CPU cores (required per GPU)
#$ -l h_vmem=8G           # 8 GB RAM per core

echo "Job running on:"
hostname

echo "GPU allocated:"
echo $SGE_HGR_gpu

# Load Miniforge (conda) module
module load miniforge/25.3.0

# Properly initialize conda in a non-interactive shell
source "$(conda info --base)/etc/profile.d/conda.sh"

# Activate the LLM environment
conda activate llm_gpu

echo "Starting Mini-GPT training..."
python3 mini_gpt_train.py

echo "Job finished."

Save the file, exit the editor, and make the script executable:

chmod +x mini_gpt_job.sh
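The job script requests one hour of walltime, so it is worth checking that NUM_STEPS fits inside it. The training script's own comment notes that 3500 steps took about seven minutes; the rough estimate below scales that linearly. It is only a sketch: `estimated_minutes` is a hypothetical helper, and it assumes the job lands on a comparable GPU and that time per step stays constant.

```python
WALLTIME_MINUTES = 60  # matches "#$ -l h_rt=1:00:00" in the job script


def estimated_minutes(num_steps, ref_steps=3500, ref_minutes=7.0):
    """Scale the observed reference run linearly to a new step count."""
    steps_per_minute = ref_steps / ref_minutes
    return num_steps / steps_per_minute


estimate = estimated_minutes(20000)
print(f"Estimated runtime: {estimate:.0f} min of {WALLTIME_MINUTES} min allowed")
if estimate > WALLTIME_MINUTES:
    print("Reduce NUM_STEPS, or the scheduler will stop the job at the walltime limit.")
```

Under those assumptions, 20000 steps comes to about 40 minutes, which leaves some headroom within the one-hour limit.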

✅ Step 3: Submit

qsub mini_gpt_job.sh

This returns:

Your job 6653935 ("mini_gpt_job.sh") has been submitted
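If you script your submissions, the job number can be pulled out of that message instead of copied by hand. The sketch below is an illustration only: `log_files` is a hypothetical helper, and it assumes the message has exactly the format shown above and that stdout and stderr go to the scheduler's default `<job name>.o<job id>` and `<job name>.e<job id>` files in the working directory.

```python
import re

# Matches the confirmation line printed by qsub, as shown above.
QSUB_MESSAGE = re.compile(r'Your job (\d+) \("([^"]+)"\) has been submitted')


def log_files(qsub_output):
    """Return (job_id, stdout_file, stderr_file) for a qsub confirmation message."""
    match = QSUB_MESSAGE.search(qsub_output)
    if match is None:
        raise ValueError("unrecognised qsub output: " + qsub_output.strip())
    job_id, job_name = match.groups()
    return job_id, f"{job_name}.o{job_id}", f"{job_name}.e{job_id}"


message = 'Your job 6653935 ("mini_gpt_job.sh") has been submitted'
print(log_files(message))
# ('6653935', 'mini_gpt_job.sh.o6653935', 'mini_gpt_job.sh.e6653935')
```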

Once the job has finished, send this job ID to the administrators. To check on it while it is running:

qstat
qstat -j <jobid>

View the results:

cat mini_gpt_job.sh.o6653935   # stdout (main output)
cat mini_gpt_job.sh.e6653935   # stderr (error output)
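For a long log, a short script can summarise the progress lines instead of reading them all. The sketch below assumes the log contains lines in the format printed by mini_gpt_train.py ("Step N/TOTAL, loss=..., elapsed=... min"); `summarise` is a hypothetical helper, and the file name is passed on the command line.

```python
import re
import sys

# Matches the progress lines printed by mini_gpt_train.py.
STEP_LINE = re.compile(r"Step (\d+)/(\d+), loss=([\d.]+), elapsed=([\d.]+) min")


def summarise(log_text):
    """Return the last logged step, its loss, and the average steps per minute."""
    records = [
        (int(m.group(1)), float(m.group(3)), float(m.group(4)))
        for m in STEP_LINE.finditer(log_text)
    ]
    if not records:
        return None  # no progress lines found (job may have failed early)
    last_step, last_loss, elapsed_min = records[-1]
    rate = last_step / elapsed_min if elapsed_min > 0 else float("nan")
    return {"last_step": last_step, "last_loss": last_loss, "steps_per_min": rate}


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            print(summarise(handle.read()))
    else:
        print("usage: python summarise_log.py mini_gpt_job.sh.o<jobid>")
```

A result of `None` means no step lines were found, in which case the stderr file is the first place to look.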