LLM-Intro to Large Language Models

LLM

some LLMs do not release their model architecture and weights to users

what is an LLM?

Llama 2 70B model

  • 2 files

    • parameters file
      • the parameters (weights) of the neural network
      • each parameter is a 2-byte float (float16), so 70B parameters take about 140 GB
    • code that runs the parameters (inference)
      • C, Python, etc.
      • in C, about 500 lines of code with no dependencies are enough to run it
      • self-contained package (no network connection needed)
  • how to get parameters?

    • lossily compress a large chunk of text (about 10 TB) using about 6,000 GPUs for 12 days (cost about $2M) into a roughly 140 GB "zip file" (the gestalt of the text, stored as the weights/parameters)
  • what the neural network does is try to predict the next word in a sequence. The parameters are dispersed throughout the network; the neurons are connected to each other and fire in a certain way

  • prediction has strong relationship with compression

  • an LLM generates text in the correct form and fills it in with its knowledge. It does not reproduce a verbatim copy of the text it was trained on.

  • how does it work?
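
To make "predict the next word" concrete, here is a minimal sketch. It is not how Llama works internally: a real LLM uses a neural network with billions of parameters, while this toy uses a table of word-pair counts as its "parameters". The corpus and the function names (`train_bigram`, `predict_next`, `generate`) are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def generate(counts, start, length=5):
    """Feed each prediction back in as the next input, one word at a time."""
    out = [start]
    for _ in range(length):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return " ".join(out)

# Toy training text (hypothetical); the counts play the role of the parameters.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(predict_next(model, "the"))        # "cat" follows "the" most often
print(generate(model, "sat", length=3))
```

The count table is far smaller than the text it was built from and cannot reproduce it exactly, which is a small-scale version of the "lossy compression" idea in the notes.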


training stage

  • pre-training

    • expensive
    • produces the base model, which is a document generator
    • it's about knowledge
    • internet documents
  • fine tuning

    • cheaper
    • produces the assistant model
    • it's about alignment
    • Q&A documents
    • train on high-quality conversations (questions and answers); write labeling instructions that specify how the assistant should behave
    • focus on quality, not quantity
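  • stage 3 (optional)

    • uses comparison labels (choosing the better of several candidate answers is often easier than writing one)
    • reinforcement learning from human feedback (RLHF)
  • labeling is increasingly a human-machine collaboration
  • ranking of LLMs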
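
As a rough illustration of how comparison labels can be turned into a training signal, the sketch below scores a pair of answers with a pairwise loss, -log(sigmoid(r_chosen - r_rejected)). The notes do not say which loss is used, so treat this form as an assumption; the reward numbers are invented, and in practice they would come from a learned reward model.

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise loss -log(sigmoid(margin)), written as log(1 + exp(-margin)).

    The loss is small when the answer the labeler preferred gets the higher reward.
    """
    margin = reward_chosen - reward_rejected
    return math.log1p(math.exp(-margin))

# Hypothetical comparison labels: a labeler picked `chosen` over `rejected`,
# and a (made-up) reward model assigned each answer a score.
comparisons = [
    {"chosen": 2.0, "rejected": 0.5},   # model agrees with the labeler
    {"chosen": 0.1, "rejected": 1.4},   # model disagrees, so the loss is larger
]

for pair in comparisons:
    loss = preference_loss(pair["chosen"], pair["rejected"])
    print(f"margin={pair['chosen'] - pair['rejected']:+.1f}  loss={loss:.3f}")
```

Minimizing this loss pushes the reward of preferred answers above the reward of rejected ones; the resulting reward signal is what the reinforcement learning step would then optimize.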

LLM scaling laws:

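  • performance improves predictably with N (number of parameters) and D (amount of training text): more of both gives a better model
  • tool use and multimodality: some LLMs such as GPT can now use tools to help answer questions, e.g. a browser, a calculator, or a Python interpreter.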

  • future directions of LLM development
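
Returning to the scaling-law point in the list above, the sketch below shows what "a predictable function of N and D" can look like. The functional form (an irreducible term plus power-law terms in N and D) is one commonly used parametrization and is an assumption here; the constants are illustrative placeholders, not fitted values.

```python
def predicted_loss(n_params, n_tokens, E=1.7, A=400.0, B=400.0, alpha=0.34, beta=0.28):
    """Toy scaling law: loss = E + A / N**alpha + B / D**beta.

    N = number of parameters, D = number of training tokens.
    All constants are placeholders chosen for illustration only.
    """
    return E + A / n_params**alpha + B / n_tokens**beta

# Hypothetical model sizes and data sizes, just to show the trend.
for n, d in [(7e9, 1e12), (70e9, 1e12), (70e9, 2e12)]:
    print(f"N={n:.0e}  D={d:.0e}  predicted loss={predicted_loss(n, d):.3f}")
```

Under this form, increasing either N or D lowers the predicted loss, and the loss never drops below the constant term E.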

give LLMs system 2 ability


  • LLMs currently have only system 1 (fast, instinctive) thinking
  • the goal is to convert time into accuracy: let the model think longer to get a better answer
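
One simple way to picture "convert time into accuracy" is to sample several quick answers and take a majority vote. The notes do not describe how system 2 would actually be built, so this is only an illustration of the trade-off; `noisy_answer` is a hypothetical stand-in for a single fast model sample, and the 60% per-sample accuracy is invented.

```python
import random
from collections import Counter

def noisy_answer(rng, correct="42", p_correct=0.6):
    """Hypothetical stand-in for one fast 'system 1' sample from a model."""
    if rng.random() < p_correct:
        return correct
    return rng.choice(["41", "43", "44"])

def answer_with_budget(rng, n_samples):
    """Spend more time (more samples), then return the most common answer."""
    votes = Counter(noisy_answer(rng) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

def accuracy(n_samples, trials=2000, seed=0):
    """Fraction of trials in which the voted answer is the correct one."""
    rng = random.Random(seed)
    hits = sum(answer_with_budget(rng, n_samples) == "42" for _ in range(trials))
    return hits / trials

for n in (1, 5, 15):
    print(f"{n:2d} samples per question -> accuracy {accuracy(n):.2f}")
```

Accuracy rises as the number of samples per question grows, at the cost of proportionally more compute.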

self-improvement

  • self-improvement is possible in a narrow domain

customization

expert LLMs for specific domains

future of LLM
