【Spark】What is the difference between Input and Shuffle Read

TaiKuLaHa 2023-11-01 19:59

When tuning Spark, a reasonable target is to keep each task's Input + Shuffle Read at roughly 300-500 MB.
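As a rough illustration of that rule of thumb, the sketch below turns a stage's total Input + Shuffle Read (as reported in the Spark UI) into a task count. The helper names `suggest_partitions` and `per_task_mb` and the 400 MB midpoint are assumptions made for this example, not part of Spark, and the calculation assumes data is spread evenly across tasks (no skew).

```python
import math

MB = 1024 * 1024


def suggest_partitions(stage_bytes: int, target_mb_per_task: int = 400) -> int:
    """Hypothetical helper: task count that keeps each task near the target size."""
    if stage_bytes <= 0:
        return 1
    return max(1, math.ceil(stage_bytes / (target_mb_per_task * MB)))


def per_task_mb(stage_bytes: int, num_tasks: int) -> float:
    """Average MB handled per task, assuming an even split across tasks."""
    return stage_bytes / num_tasks / MB


# Example: a stage whose Input + Shuffle Read totals 200 GB in the Spark UI.
total = 200 * 1024 * MB
n = suggest_partitions(total)
print(n, round(per_task_mb(total, n), 1))  # prints: 512 400.0
```

For a shuffle stage in Spark SQL, a number like this would typically be a candidate value for `spark.sql.shuffle.partitions`; for a stage that reads files, the split size is governed instead by settings such as `spark.sql.files.maxPartitionBytes`. Treat the result as a starting point and check the per-task sizes in the UI afterwards.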

The Spark UI is documented here: https://spark.apache.org/docs/3.0.1/web-ui.html

The relevant paragraph reads:

Input: Bytes read from storage in this stage
Output: Bytes written in storage in this stage
Shuffle read: Total shuffle bytes and records read, includes both data read locally and data read from remote executors
Shuffle write: Bytes and records written to disk in order to be read by a shuffle in a future stage
