Docker--Spark

What is Apache Spark™?

Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

Online Documentation

You can find the latest Spark documentation, including a programming guide, on the project web page. This README file only contains basic setup instructions.

Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

docker run -it spark /opt/spark/bin/spark-shell

Try the following command, which should return 1,000,000,000:

scala> spark.range(1000 * 1000 * 1000).count()

Interactive Python Shell

The easiest way to start using PySpark is through the Python shell:

docker run -it spark:python3 /opt/spark/bin/pyspark

And run the following command, which should also return 1,000,000,000:

>>> spark.range(1000 * 1000 * 1000).count()
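
For intuition about why both shells report 1,000,000,000: spark.range(n) produces one row per integer from 0 to n - 1, so counting the rows gives n. The plain-Python sketch below mirrors that arithmetic with the built-in range. It does not use Spark, and the name expected_count is introduced here only for illustration.

```python
# Plain-Python analogue of the Spark command above (no Spark required).
# `expected_count` is a hypothetical name used only for this illustration.
n = 1000 * 1000 * 1000

# Like spark.range(n), Python's range(n) covers the integers 0 .. n - 1,
# so its length equals n. len() on a range is computed without iterating.
expected_count = len(range(n))

print(f"{expected_count:,}")  # prints 1,000,000,000
```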

Interactive R Shell

The easiest way to start using R on Spark is through the R shell:

docker run -it spark:r /opt/spark/bin/sparkR

Running Spark on Kubernetes

See the Running Spark on Kubernetes guide: https://spark.apache.org/docs/latest/running-on-kubernetes.html

Configuration and environment variables

For details, see https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable

License

Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are trademarks of The Apache Software Foundation.

Licensed under the Apache License, Version 2.0.

As with all Docker images, these likely also contain other software which may be under other licenses (such as Bash, etc. from the base distribution, along with any direct or indirect dependencies of the primary software being contained).

Some additional license information that could be auto-detected may be found in the repo-info repository's spark/ directory.

As for any pre-built image usage, it is the image user's responsibility to ensure that any use of this image complies with any relevant licenses for all software contained within.
