
Commit

Add spark doc
Yikun committed Sep 16, 2022
1 parent 8e63e65 commit a16cd1a
Showing 8 changed files with 57 additions and 0 deletions.
1 change: 1 addition & 0 deletions spark/README-short.txt
@@ -0,0 +1 @@
Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.
49 changes: 49 additions & 0 deletions spark/content.md
@@ -0,0 +1,49 @@
# What is Apache Spark™?

Apache Spark™ is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, pandas API on Spark for pandas workloads, MLlib for machine learning, GraphX for graph processing, and Structured Streaming for stream processing.

%%LOGO%%

## Online Documentation

You can find the latest Spark documentation, including a programming guide, on the [project web page](https://spark.apache.org/documentation.html). This README file only contains basic setup instructions.

## Interactive Scala Shell

The easiest way to start using Spark is through the Scala shell:

```
docker run -it spark /opt/spark/bin/spark-shell
```

Try the following command, which should return 1,000,000,000:

```
scala> spark.range(1000 * 1000 * 1000).count()
```

## Interactive Python Shell

The easiest way to start using PySpark is through the Python shell:

```
docker run -it spark:python3 /opt/spark/bin/pyspark
```

Then run the following command, which should also return 1,000,000,000:

```
>>> spark.range(1000 * 1000 * 1000).count()
```

## Interactive R Shell

The easiest way to start using R on Spark is through the R shell:

```
docker run -it apache/spark-r /opt/spark/bin/sparkR
```
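
Then, mirroring the Scala and Python examples above, you can run a quick sanity check (a minimal sketch using R's built-in `faithful` data frame, which has 272 rows), which should return 272:

```
> count(createDataFrame(faithful))
```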

## Running Spark on Kubernetes

See the [Running Spark on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html) guide for full details on using these images with a Kubernetes cluster.
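
As a rough sketch only (the API server address, executor count, and examples jar version are placeholders to adjust for your cluster and Spark release; here using the in-image path to `spark-submit`), submitting the bundled SparkPi example with one of these images could look like:

```
/opt/spark/bin/spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=spark \
  local:///opt/spark/examples/jars/spark-examples_2.12-<spark-version>.jar
```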

1 change: 1 addition & 0 deletions spark/get-help.md
@@ -0,0 +1 @@
[Apache Spark™ community](https://spark.apache.org/community.html)
1 change: 1 addition & 0 deletions spark/github-repo
@@ -0,0 +1 @@
https://github.com/apache/spark-docker
1 change: 1 addition & 0 deletions spark/issues.md
@@ -0,0 +1 @@
https://issues.apache.org/jira/browse/SPARK
3 changes: 3 additions & 0 deletions spark/license.md
@@ -0,0 +1,3 @@
Apache Spark, Spark, Apache, the Apache feather logo, and the Apache Spark project logo are trademarks of The Apache Software Foundation.

Licensed under the [Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0).
Binary file added spark/logo.png
1 change: 1 addition & 0 deletions spark/maintainer.md
@@ -0,0 +1 @@
[Apache Spark](https://spark.apache.org/committers.html)
