Docker: Apache Spark

What is Apache Spark™?

Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

Online Documentation

You can find the latest Spark documentation, including a programming guide, on the project web page. This README file only contains basic setup instructions.

Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

docker run -it spark /opt/spark/bin/spark-shell

Try the following command, which should return 1,000,000,000:

scala> spark.range(1000 * 1000 * 1000).count()

Interactive Python Shell

The easiest way to start using PySpark is through the Python shell:

docker run -it spark:python3 /opt/spark/bin/pyspark

Then run the following command, which should also return 1,000,000,000:

>>> spark.range(1000 * 1000 * 1000).count()
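
The shell can also be driven non-interactively. A minimal sketch, assuming the same spark:python3 image tag as above and piping a small script into pyspark over stdin (the DataFrame and column names here are illustrative):

```shell
# Pipe a short PySpark script into the container's pyspark REPL.
# -i keeps stdin open; no -t, since we are not attaching a terminal.
docker run -i spark:python3 /opt/spark/bin/pyspark <<'EOF'
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "label"])
print(df.filter(df.id > 1).count())  # the count printed should be 1
EOF
```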

Interactive R Shell

The easiest way to start using R on Spark is through the R shell:

docker run -it spark:r /opt/spark/bin/sparkR
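
Beyond the interactive shells, batch jobs can be submitted with spark-submit. A hedged sketch using the SparkPi example bundled with Spark (the examples jar filename below is illustrative; the actual name in the image is versioned, e.g. spark-examples_<scala-version>-<spark-version>.jar):

```shell
# Run the bundled SparkPi example in local mode inside the container.
# NOTE: adjust the jar path to the versioned filename present in the image
# (check with: docker run spark ls /opt/spark/examples/jars).
docker run spark /opt/spark/bin/spark-submit \
  --master 'local[*]' \
  --class org.apache.spark.examples.SparkPi \
  /opt/spark/examples/jars/spark-examples.jar 100
```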

Running Spark on Kubernetes

For deploying Spark on Kubernetes, see https://spark.apache.org/docs/latest/running-on-kubernetes.html.
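
A sketch of the submission pattern from that guide, with this image as the executor container image (values in <...> are placeholders; authentication, service accounts, and the exact examples jar name are covered in the linked documentation):

```shell
# Submit SparkPi to a Kubernetes cluster in cluster deploy mode.
# <k8s-apiserver-host>, <k8s-apiserver-port>, and the jar name are placeholders.
/opt/spark/bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=spark \
  local:///opt/spark/examples/jars/<spark-examples-jar>
```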

Configuration and environment variables

See https://github.com/apache/spark-docker/blob/master/OVERVIEW.md#environment-variable for the environment variables supported by this image.
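
As an illustration of the mechanism (not of any specific variable this image documents), environment variables are passed to the container with docker's -e flag; SPARK_DRIVER_MEMORY is a standard Spark environment variable, but consult the OVERVIEW.md link above for what this image actually reads:

```shell
# Hypothetical example: set the driver memory for an interactive shell.
docker run -it -e SPARK_DRIVER_MEMORY=2g spark /opt/spark/bin/spark-shell
```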

License

Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are trademarks of The Apache Software Foundation.

Licensed under the Apache License, Version 2.0.

As with all Docker images, these likely also contain other software which may be under other licenses (such as Bash, etc. from the base distribution, along with any direct or indirect dependencies of the primary software being contained).

Some additional license information which was able to be auto-detected might be found in the repo-info repository's spark/ directory.

As for any pre-built image usage, it is the image user's responsibility to ensure that any use of this image complies with any relevant licenses for all software contained within.
