SPARK

Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning workloads on single-node machines or clusters. More information can be found on the Apache Spark website.

Available containers

Name sha256 mpi labels description
Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif 73106abd844581fab4c0282b676abaa5373ba4e730e18be67f8f998addc5c066 none none Spark v3.2.1, Hadoop v3.2, Java 11

Running the container

Starting the cluster (common settings)

module load java/11

export SPARK_MASTER_HOST=$(hostname)

export SPARK_MASTER_PORT=7077

export SPARK_WORKER_DIR=/location/to/work/dir

singularity run Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif spark-class "org.apache.spark.deploy.master.Master"
singularity run Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif spark-class "org.apache.spark.deploy.worker.Worker" "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
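
Both commands normally stay in the foreground, so running them one after the other in a single shell would never reach the worker. The following is a minimal sketch of one way to start both on the same node, assuming the settings above are already exported and the image sits in the current directory. The `SIF` variable, the function names `spark_master_url` and `start_cluster`, and the 10-second wait are illustrative assumptions, not part of the container.

```shell
# Sketch only: start a master and one worker on the same node.
# SIF is a hypothetical convenience variable pointing at the image file.
SIF="${SIF:-Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif}"

# Build the master URL from the exported settings (port defaults to 7077).
spark_master_url() {
    printf 'spark://%s:%s\n' "${SPARK_MASTER_HOST:-$(hostname)}" "${SPARK_MASTER_PORT:-7077}"
}

start_cluster() {
    # Run the master in the background so the script can continue.
    singularity run "$SIF" spark-class "org.apache.spark.deploy.master.Master" &
    master_pid=$!

    # Crude fixed wait for the master to start listening; adjust as needed.
    sleep 10

    # Run the worker in the background and point it at the master.
    singularity run "$SIF" spark-class "org.apache.spark.deploy.worker.Worker" "$(spark_master_url)" &
    worker_pid=$!

    # Stop both processes when the script is interrupted.
    trap 'kill "$worker_pid" "$master_pid" 2>/dev/null' INT TERM

    # Block until both background processes exit.
    wait
}
```

Calling `start_cluster` at the end of the script (or from an interactive shell after sourcing it) launches both processes. On a multi-node cluster the worker command would instead be run on each worker node with the same master URL.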

Submitting a job (simple example, more here)

spark-submit --class <main-class> --master spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT <application-jar> [application-arguments]
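
As an illustration of the submit command, the sketch below wraps `spark-submit` in a small shell function so the master URL is always taken from the exported settings. The function name `submit_job` and the `SPARK_SUBMIT` override variable are hypothetical. The override exists because this README does not say whether `spark-submit` is on the host `PATH` or has to be run through the container.

```shell
# Sketch only: submit an application to the standalone master started above.
# Usage: submit_job <main-class> <application-jar> [application-arguments...]
submit_job() {
    if [ "$#" -lt 2 ]; then
        echo "usage: submit_job <main-class> <application-jar> [args...]" >&2
        return 1
    fi
    main_class=$1
    app_jar=$2
    shift 2

    # SPARK_SUBMIT is a hypothetical override, left unquoted on purpose so it
    # can hold several words, e.g. "singularity run <image>.sif spark-submit".
    ${SPARK_SUBMIT:-spark-submit} \
        --class "$main_class" \
        --master "spark://${SPARK_MASTER_HOST}:${SPARK_MASTER_PORT:-7077}" \
        "$app_jar" "$@"
}
```

For example, `submit_job org.apache.spark.examples.SparkPi /path/to/spark-examples.jar 100` would submit the SparkPi example that ships with Spark. The jar path here is a placeholder, since the location of the examples jar inside the container is not documented above.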