Introducing Apache Spark and PySpark

DB架构 2024-08-02 11:49

1. Apache Spark Components

Spark SQL and DataFrames + Datasets

A module for working with structured data.

MLlib

A scalable machine learning library.

Structured Streaming

A stream-processing engine built on Spark SQL that makes it easy to build scalable, fault-tolerant streaming applications.

GraphX (legacy)

GraphX is Apache Spark's library for graphs and graph-parallel computation. For graph analytics, however, GraphFrames is recommended instead: GraphX is no longer under very active development and has no Python bindings. GraphFrames is an open source, general-purpose graph processing library that is similar to GraphX but is built on DataFrame-based APIs.

2. Spark Versus PySpark Versus SparkSQL

3. AWS EMR, Azure Databricks, GCP Dataproc

4. PySpark Addresses Challenges of Data Science
