LLM-Intro to Large Language Models

LLM

some LLM's model and weight are not opened to user

what is?

Llama 270b model

  • 2 files

    • parameters file
      • parameter or weight of neural network
      • parameter -- 2bytes, float number
    • code run parameters(inference)
      • c or python, etc
      • for c, 500 lines code without dependency to run
      • self contained package(no network need)
  • how to get parameters?

    • lossy compress large chunk of text (10TB) with 6000 GPU for 12 days (cost 200$) to 140G zip file(gestalt of the text, weights and parameters)
  • what neural do is trying to predict the next word in a sequence. parameters are dispersed throughout the neural network and neurons are connected to each other, fire in a certain way

  • prediction has strong relationship with compression

  • LLM create a correct form of text and fill it with its knowedge. not create a copy of text that was be trained.

  • how does it work?


training stage

  • pre-training

    • expensive
    • base model. get a document generator model
    • it's about knowledge
    • internet documents
  • fine tuning

    • cheaper
    • assistant model. get a assistant model
    • it's about alighment
    • Q&A document
    • training with high quality conversation(question and answer).write labeling instructions to specify how assistant should behave
    • focus on quality not amount
  • stage 3(optional)

    • use comparison label
    • reenforcement learning from human feedback
  • labeling is a human-machine collaboration
  • rank of LLM

LLM scaling laws:

  • more D and N will get better model
  • multimodality. now some LLM like GPT can use different tools to help it with answering questions. browser, calculator, python interpreter.

  • future directions of development in LLM

give LLM system 2 ablility


  • LLM now only have system one(instinctive)
  • convert time to accuracy

self-improvement

  • in narrow domain it is possible to self-improve

customization

experts in certain domain

future of LLM

相关推荐
gogoMark3 小时前
口播视频怎么剪!利用AI提高口播视频剪辑效率并增强”网感”
人工智能·音视频
2201_754918413 小时前
OpenCV 特征检测全面解析与实战应用
人工智能·opencv·计算机视觉
love530love5 小时前
Windows避坑部署CosyVoice多语言大语言模型
人工智能·windows·python·语言模型·自然语言处理·pycharm
985小水博一枚呀5 小时前
【AI大模型学习路线】第二阶段之RAG基础与架构——第七章(【项目实战】基于RAG的PDF文档助手)技术方案与架构设计?
人工智能·学习·语言模型·架构·大模型
白熊1886 小时前
【图像生成大模型】Wan2.1:下一代开源大规模视频生成模型
人工智能·计算机视觉·开源·文生图·音视频
weixin_514548896 小时前
一种开源的高斯泼溅实现库——gsplat: An Open-Source Library for Gaussian Splatting
人工智能·计算机视觉·3d
四口鲸鱼爱吃盐6 小时前
BMVC2023 | 多样化高层特征以提升对抗迁移性
人工智能·深度学习·cnn·vit·对抗攻击·迁移攻击
Echo``7 小时前
3:OpenCV—视频播放
图像处理·人工智能·opencv·算法·机器学习·视觉检测·音视频
Douglassssssss7 小时前
【深度学习】使用块的网络(VGG)
网络·人工智能·深度学习
okok__TXF7 小时前
SpringBoot3+AI
java·人工智能·spring