SPARK

Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. More information can be found here.

Available containers

Name	sha256	mpi	labels	description
Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif	73106abd844581fab4c0282b676abaa5373ba4e730e18be67f8f998addc5c066	none	none	Spark v3.2.1, Hadoop v3.2, Java 11
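
Before running the image, it can be worth checking that the file matches the checksum listed above. The following is a minimal sketch, assuming the sha256 column is the SHA-256 digest of the .sif file itself and that sha256sum (GNU coreutils) is available. verify_sha256 is a hypothetical helper name, not something shipped with the container.

```shell
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Returns 0 on a match and 1 on a mismatch.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Usage (assumes the image is in the current directory):
#   verify_sha256 Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif \
#       73106abd844581fab4c0282b676abaa5373ba4e730e18be67f8f998addc5c066
```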

Running the container

Starting the cluster (common settings)

module load java/11

export SPARK_MASTER_HOST=$(hostname)

export SPARK_MASTER_PORT=7077

export SPARK_WORKER_DIR=/location/to/work/dir

singularity run Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif spark-class "org.apache.spark.deploy.master.Master"
singularity run Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif spark-class "org.apache.spark.deploy.worker.Worker" "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
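
Both commands above stay in the foreground, so in practice they are run in separate terminals or sent to the background. The sketch below shows one possible way to start a master and a single worker from the same shell. It assumes the image is in the current directory and that a fixed 10-second pause is long enough for the master to come up. SIF, spark_master_url and start_cluster are hypothetical names introduced only for this illustration.

```shell
# Path to the image; override by setting SIF before sourcing this file.
SIF="${SIF:-Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif}"

# Build the master URL from a host and a port.
# Defaults follow the common settings: local hostname and port 7077.
spark_master_url() {
    printf 'spark://%s:%s\n' "${1:-$(hostname)}" "${2:-7077}"
}

# Start the master in the background, then a worker that registers with it.
start_cluster() {
    export SPARK_MASTER_HOST="${SPARK_MASTER_HOST:-$(hostname)}"
    export SPARK_MASTER_PORT="${SPARK_MASTER_PORT:-7077}"

    singularity run "$SIF" spark-class org.apache.spark.deploy.master.Master &
    master_pid=$!

    # Crude wait for the master to start listening (an assumption; tune as needed).
    sleep 10

    singularity run "$SIF" spark-class org.apache.spark.deploy.worker.Worker \
        "$(spark_master_url "$SPARK_MASTER_HOST" "$SPARK_MASTER_PORT")" &
    worker_pid=$!

    echo "master pid: $master_pid, worker pid: $worker_pid"
}
```

After sourcing the file, calling start_cluster launches both processes, and kill "$master_pid" "$worker_pid" stops them again. On a multi-node allocation the worker command would instead be run on each worker node, pointing at the same master URL.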

Submitting a job (simple example, more here)

spark-submit --class <main-class> --master spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT <application-jar> [application-arguments]
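
As a concrete illustration, the sketch below submits the SparkPi example that ships with the Spark distribution. Several things here are assumptions rather than facts about this container: that spark-submit can be invoked through singularity run in the same way as spark-class, and that the examples jar lives under /opt/spark inside the image (a hypothetical path; check your container). SIF, EXAMPLES_JAR, run and submit_sparkpi are names made up for this example. Setting DRY_RUN=1 prints the command instead of executing it, which helps when checking the arguments first.

```shell
# Path to the image and to the examples jar inside it.
SIF="${SIF:-Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif}"
# Hypothetical jar location; the real path depends on how the image was built.
EXAMPLES_JAR="${EXAMPLES_JAR:-/opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar}"

# Run a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$@"
    else
        "$@"
    fi
}

# Submit SparkPi to the standalone master; the trailing 100 is the
# number of partitions SparkPi uses for its estimate.
submit_sparkpi() {
    master_url="spark://${SPARK_MASTER_HOST:-$(hostname)}:${SPARK_MASTER_PORT:-7077}"
    run singularity run "$SIF" spark-submit \
        --class org.apache.spark.examples.SparkPi \
        --master "$master_url" \
        "$EXAMPLES_JAR" 100
}
```

With the master and worker already running, submit_sparkpi sends the job, and DRY_RUN=1 submit_sparkpi shows the exact command line without contacting the cluster.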
