LLM-Intro to Large Language Models

LLM

Some LLMs do not release their model or weights to users (closed models).

What is an LLM?

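Llama 2 70B model

  • 2 files

    • parameters file
      • the parameters (weights) of the neural network
      • each parameter is stored as a 2-byte floating-point number, so 70 billion parameters take about 140 GB
    • code that runs the parameters (inference)
      • can be written in C, Python, etc.
      • in C, about 500 lines of code with no dependencies are enough
      • a self-contained package (no network connection needed)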
  • how to get the parameters?

    • lossily compress a large chunk of text (about 10 TB) using about 6,000 GPUs for 12 days (cost about $2 million) into a roughly 140 GB file, like a zip file of the text (the weights hold a gestalt of the text, not an exact copy)
  • what the neural network does is try to predict the next word in a sequence. The parameters are dispersed throughout the network; the neurons are connected to each other and fire in a certain way

  • prediction is closely related to compression: a model that predicts the next word well can be used to compress the text

  • an LLM produces text in the correct form and fills it in with its knowledge; it does not reproduce a verbatim copy of the text it was trained on

  • how does it work?
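
The next-word prediction idea, and its link to compression, can be illustrated with a toy model. The sketch below is only an illustration under strong assumptions: it uses a bigram word counter rather than a neural network, and the names `train_bigram`, `next_word_probs`, and `bits_to_encode` are hypothetical helpers, not part of any LLM library. It shows that a continuation the model considers likely costs fewer bits to encode.

```python
import math
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, how often each other word follows it."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Return the estimated P(next word | prev) as a dict."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}


def bits_to_encode(counts, text):
    """Ideal code length in bits: the sum of -log2 P(next | prev).

    Assumes every adjacent word pair in `text` was seen during training.
    """
    words = text.split()
    return sum(
        -math.log2(next_word_probs(counts, prev)[nxt])
        for prev, nxt in zip(words, words[1:])
    )


corpus = "the cat sat on the mat the cat ate"  # toy training text
model = train_bigram(corpus)

print(next_word_probs(model, "the"))      # "cat" is more likely than "mat"
print(bits_to_encode(model, "the cat"))   # likely continuation: fewer bits
print(bits_to_encode(model, "the mat"))   # less likely continuation: more bits
```

A real LLM replaces the counting table with a neural network whose parameters are learned from far more text, but the objective has the same shape: assign high probability to the word that actually comes next.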


training stages

  • pre-training

    • expensive
    • produces a base model: a document generator
    • it's about knowledge
    • trained on internet documents
  • fine-tuning

    • cheaper
    • produces an assistant model
    • it's about alignment
    • trained on Q&A documents
    • train on high-quality conversations (questions and answers); write labeling instructions that specify how the assistant should behave
    • focus on quality, not quantity
  • stage 3 (optional)

    • use comparison labels
    • reinforcement learning from human feedback (RLHF)
  • labeling is a human-machine collaboration
  • rankings of LLMs
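
To make the three stages more concrete, here is a rough sketch of what one training record might look like in each stage. The class names, field names, and the prompt template are all hypothetical, chosen only for illustration; real training pipelines use their own formats.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PretrainingDocument:
    """Stage 1: raw internet text; the model learns to continue it."""
    text: str


@dataclass
class Conversation:
    """Stage 2: a high-quality question and answer written by a labeler."""
    question: str
    answer: str


@dataclass
class Comparison:
    """Stage 3: several candidate answers, with the one a labeler preferred."""
    question: str
    candidates: List[str]
    best_index: int


def format_conversation(conv):
    """Render a conversation as one training string (hypothetical template)."""
    return f"<user>{conv.question}</user>\n<assistant>{conv.answer}</assistant>"


def preferred_response(comp):
    """Return the candidate that the comparison label marks as best."""
    return comp.candidates[comp.best_index]


doc = PretrainingDocument(text="The mitochondria is the powerhouse of the cell.")
conv = Conversation(question="What is 2 + 2?", answer="2 + 2 equals 4.")
comp = Comparison(
    question="Write a haiku about paperclips.",
    candidates=["draft A", "draft B", "draft C"],
    best_index=1,
)

print(format_conversation(conv))
print(preferred_response(comp))
```

The comparison record reflects why stage 3 can be cheaper for labelers: picking the best of several candidate answers is often easier than writing a good answer from scratch.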

LLM scaling laws:

  • a larger N (number of parameters) and a larger D (amount of training text) give a better model
  • tool use and multimodality. Some LLMs, such as GPT, can now use different tools to help answer questions: a browser, a calculator, a Python interpreter.
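
The scaling-law bullet above can be sketched numerically. The snippet assumes a power-law form, loss = E + A / N^alpha + B / D^beta, which is one commonly used way to write such a law; the notes themselves do not give a formula, and every constant below is a placeholder chosen for illustration, not a fitted value.

```python
def predicted_loss(n_params, n_tokens,
                   e=1.7, a=400.0, b=400.0, alpha=0.3, beta=0.3):
    """Illustrative scaling law: loss falls smoothly as N and D grow.

    All default constants are hypothetical placeholders, not measured values.
    `e` plays the role of a floor that the loss approaches but never reaches.
    """
    return e + a / n_params ** alpha + b / n_tokens ** beta


small = predicted_loss(n_params=7e9, n_tokens=1e12)
more_params = predicted_loss(n_params=70e9, n_tokens=1e12)
more_data = predicted_loss(n_params=7e9, n_tokens=2e12)

print(f"7B params, 1T tokens:  {small:.3f}")
print(f"70B params, 1T tokens: {more_params:.3f}")
print(f"7B params, 2T tokens:  {more_data:.3f}")
```

Under this assumed form, increasing either N or D always lowers the predicted loss, which is the sense in which "more D and N" gives a better model.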

  • future directions of LLM development

give LLMs System 2 ability

  • LLMs currently have only System 1 (fast, instinctive) thinking
  • convert time into accuracy: let the model think longer to produce a better answer

self-improvement

  • in a narrow domain, self-improvement is possible

customization

experts in certain domains

future of LLMs
