Docker--Spark

What is Apache Spark™?

Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

Online Documentation

You can find the latest Spark documentation, including a programming guide, on the project web page (https://spark.apache.org/documentation.html). This README file only contains basic setup instructions.

Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

docker run -it spark /opt/spark/bin/spark-shell

Try the following command, which should return 1,000,000,000:

scala> spark.range(1000 * 1000 * 1000).count()

Interactive Python Shell

The easiest way to start using PySpark is through the Python shell:

docker run -it spark:python3 /opt/spark/bin/pyspark

And run the following command, which should also return 1,000,000,000:

>>> spark.range(1000 * 1000 * 1000).count()
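The expected count follows from how the range is defined: spark.range(n) yields a single id column holding the integers 0 through n - 1, so count() returns n. The sketch below is only a plain-Python analogue of that arithmetic, not Spark code, and needs no Spark installation; the variable names are illustrative.

```python
# Plain-Python analogue of spark.range(n).count(); this does not use Spark.
n = 1000 * 1000 * 1000

# Like spark.range(n), range(n) covers 0 .. n - 1, so it holds exactly n values.
ids = range(n)
row_count = len(ids)  # computed without materializing the values

print(f"{row_count:,}")  # prints 1,000,000,000
```

In the PySpark shell the same number comes back from the command above, but there the count is computed by the Spark engine rather than by Python.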

Interactive R Shell

The easiest way to start using R on Spark is through the R shell:

docker run -it spark:r /opt/spark/bin/sparkR
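The three shells differ only in the image tag and the entry point. As a rough summary, the following Python sketch rebuilds the commands shown in this README from a small lookup table. The names SHELLS and docker_command are hypothetical helpers introduced here for illustration; they are not part of Spark or Docker.

```python
import shlex

# Image tag and entry point for each shell, copied from the commands in this README.
# SHELLS and docker_command are illustrative names, not part of any Spark tooling.
SHELLS = {
    "scala": ("spark", "/opt/spark/bin/spark-shell"),
    "python": ("spark:python3", "/opt/spark/bin/pyspark"),
    "r": ("spark:r", "/opt/spark/bin/sparkR"),
}


def docker_command(shell: str) -> list[str]:
    """Return the argument list that starts the given interactive shell."""
    image, entry_point = SHELLS[shell]
    return ["docker", "run", "-it", image, entry_point]


for name in SHELLS:
    # shlex.join renders the argument list as a single shell command line.
    print(f"{name:>6}: {shlex.join(docker_command(name))}")
```

The argument list can also be passed to subprocess.run to launch the shell from a script, assuming Docker is installed and the image is available locally or can be pulled.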

Running Spark on Kubernetes

See the Running Spark on Kubernetes guide: https://spark.apache.org/docs/latest/running-on-kubernetes.html

Configuration and environment variables

For the supported environment variables, see https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable

License

Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are trademarks of The Apache Software Foundation.

Licensed under the Apache License, Version 2.0.

As with all Docker images, these likely also contain other software which may be under other licenses (such as Bash, etc. from the base distribution, along with any direct or indirect dependencies of the primary software being contained).

Additional license information that could be auto-detected may be found in the repo-info repository's spark/ directory.

As for any pre-built image usage, it is the image user's responsibility to ensure that any use of this image complies with any relevant licenses for all software contained within.
