Docker--Spark

What is Apache Spark™?

Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

Online Documentation

You can find the latest Spark documentation, including a programming guide, on the project web page. This README file only contains basic setup instructions.

Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

docker run -it spark /opt/spark/bin/spark-shell

Try the following command, which should return 1,000,000,000:

scala> spark.range(1000 * 1000 * 1000).count()
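
The expected result follows from how spark.range(n) is defined: it produces a single id column holding the integers 0 through n - 1, so counting the rows gives n. The sketch below is only an analogy in plain Python, not Spark or Scala code. It relies on the built-in range, which uses the same half-open interval, and the names n and row_count are introduced here purely for illustration.

```python
# Plain-Python analogy (no Spark involved): range(n) covers 0 .. n-1,
# the same half-open interval that spark.range(n) uses for its id column.
n = 1000 * 1000 * 1000

# len() on a range object is computed arithmetically, so this does not
# materialize a billion integers.
row_count = len(range(n))

print(f"{row_count:,}")  # prints 1,000,000,000
```

In the Spark shell the same number comes from running an actual Spark job, so the call doubles as a quick check that the container works.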

Interactive Python Shell

The easiest way to start using PySpark is through the Python shell:

docker run -it spark:python3 /opt/spark/bin/pyspark

And run the following command, which should also return 1,000,000,000:

>>> spark.range(1000 * 1000 * 1000).count()

Interactive R Shell

The easiest way to start using R on Spark is through the R shell:

docker run -it spark:r /opt/spark/bin/sparkR

Running Spark on Kubernetes

See the Running on Kubernetes guide: https://spark.apache.org/docs/latest/running-on-kubernetes.html

Configuration and environment variables

For details, see https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable

License

Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are trademarks of The Apache Software Foundation.

Licensed under the Apache License, Version 2.0.

As with all Docker images, these likely also contain other software which may be under other licenses (such as Bash, etc. from the base distribution, along with any direct or indirect dependencies of the primary software being contained).

Some additional license information that could be auto-detected may be found in the repo-info repository's spark/ directory.

As for any pre-built image usage, it is the image user's responsibility to ensure that any use of this image complies with any relevant licenses for all software contained within.
