Docker--Spark

What is Apache Spark™?

Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

Online Documentation

You can find the latest Spark documentation, including a programming guide, on the project web page⁠. This README file only contains basic setup instructions.

Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

复制代码
docker run -it spark /opt/spark/bin/spark-shell

Try the following command, which should return 1,000,000,000:

复制代码
scala> spark.range(1000 * 1000 * 1000).count()
Interactive Python Shell

The easiest way to start using PySpark is through the Python shell:

复制代码
docker run -it spark:python3 /opt/spark/bin/pyspark

And run the following command, which should also return 1,000,000,000:

复制代码
>>> spark.range(1000 * 1000 * 1000).count()
Interactive R Shell

The easiest way to start using R on Spark is through the R shell:

复制代码
docker run -it spark:r /opt/spark/bin/sparkR
Running Spark on Kubernetes

https://spark.apache.org/docs/latest/running-on-kubernetes.html⁠

Configuration and environment variables

See more in https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable⁠

License

Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are trademarks of The Apache Software Foundation.

Licensed under the Apache License, Version 2.0⁠.

As with all Docker images, these likely also contain other software which may be under other licenses (such as Bash, etc from the base distribution, along with any direct or indirect dependencies of the primary software being contained).

Some additional license information which was able to be auto-detected might be found in the repo-info repository's spark/ directory⁠.

As for any pre-built image usage, it is the image user's responsibility to ensure that any use of this image complies with any relevant licenses for all software contained within.

相关推荐
mizuhokaga20 小时前
Linux内网集群基于Docker 安装 Chat2DB
linux·运维·docker
西柚00120 小时前
Ubuntu22.04.5 + Docker + MySQL 5.7
mysql·docker·容器
Sean‘21 小时前
AKS 集群离线部署 kube-state-metrics 文档
运维·docker·容器
万象.1 天前
docker存储卷分类与实操
docker·容器
F1FJJ1 天前
只是想查个数据,不想装 phpMyAdmin
数据库·网络协议·容器·开源软件
.柒宇.1 天前
基于 RHEL 9.7 搭建 Kubernetes v1.34 集群实战:Docker 运行时 (cri-dockerd) 与国内源配置详解
docker·云原生·容器·kubernetes·kubelet
F1FJJ1 天前
Shield CLI:MySQL 插件 vs phpMyAdmin:轻量 Web 数据库管理工具对比
前端·网络·数据库·网络协议·mysql·容器
钟智强1 天前
从2.7GB到481MB:我的Docker Compose优化实战,以及为什么不能全信AI
后端·docker
YMWM_1 天前
docker在jetson thor的应用
运维·docker·容器·jetson thor
木子欢儿1 天前
使用 Docker 快速搭建 MinIO 文件存储服务
运维·docker·容器