SPARK
Spark is a multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters. More information is available in the official Apache Spark documentation.
Available containers
Name | sha256 | mpi | labels | description |
---|---|---|---|---|
Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif | 73106abd844581fab4c0282b676abaa5373ba4e730e18be67f8f998addc5c066 | none | none | Spark v3.2.1, Hadoop v3.2, java 11 |
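The sha256 value in the table can be used to check a downloaded image before running it. The following is a minimal sketch, assuming `sha256sum` from GNU coreutils is available and the image sits in the current directory; the helper name `verify_sha256` and the `SIF` and `EXPECTED_SHA256` variables are illustrative and not part of the container.

```shell
# Sketch: compare the image's sha256 digest with the value from the table.
# SIF, EXPECTED_SHA256 and verify_sha256 are hypothetical names for this example.
SIF="${SIF:-Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif}"
EXPECTED_SHA256="73106abd844581fab4c0282b676abaa5373ba4e730e18be67f8f998addc5c066"

verify_sha256() {
  # $1: path to the file, $2: expected hex digest
  actual="$(sha256sum "$1" | cut -d ' ' -f 1)"
  [ "$actual" = "$2" ]
}

# Only check when the image is actually present in this directory.
if [ -f "$SIF" ]; then
  if verify_sha256 "$SIF" "$EXPECTED_SHA256"; then
    echo "checksum OK: $SIF"
  else
    echo "checksum MISMATCH: $SIF" >&2
  fi
fi
```

A mismatch usually points to an incomplete download or a different build of the image than the one listed above.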
Running the container
Starting the cluster (common settings)
module load java/11
export SPARK_MASTER_HOST=`hostname`
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_DIR=/location/to/work/dir
singularity run Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif spark-class "org.apache.spark.deploy.master.Master"
singularity run Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif spark-class "org.apache.spark.deploy.worker.Worker" "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT"
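The master and worker commands above can be wrapped in small helpers so that the master URL is built in one place. This is a sketch only: `SIF`, `DRY_RUN`, `master_url`, `run`, `start_master` and `start_worker` are hypothetical names introduced for this example. With `DRY_RUN=1` (the default in this sketch) the commands are printed instead of executed.

```shell
# Sketch: helpers for starting a standalone master and one worker.
# All helper and variable names except the SPARK_* ones are assumptions.
SIF="${SIF:-Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif}"
export SPARK_MASTER_HOST="${SPARK_MASTER_HOST:-$(hostname)}"
export SPARK_MASTER_PORT="${SPARK_MASTER_PORT:-7077}"

master_url() {
  # Build the URL that workers and spark-submit connect to.
  printf 'spark://%s:%s\n' "$SPARK_MASTER_HOST" "$SPARK_MASTER_PORT"
}

run() {
  # Print the command in dry-run mode, execute it otherwise.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

start_master() {
  run singularity run "$SIF" spark-class org.apache.spark.deploy.master.Master
}

start_worker() {
  run singularity run "$SIF" spark-class org.apache.spark.deploy.worker.Worker "$(master_url)"
}

# Dry run: show what would be executed.
start_master
start_worker
```

Both processes stay in the foreground, so with `DRY_RUN=0` each helper is typically called from its own terminal, job step, or node, with the master started first.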
Submitting a job (simple example; see the spark-submit documentation for more options)
spark-submit --class <main-class> --master spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT <application-jar> [application-arguments]
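As a concrete illustration, the SparkPi example that ships with Spark distributions could be submitted as sketched below. Two things here are assumptions not confirmed by this README: that the image forwards `spark-submit` the same way it forwards `spark-class`, and that the examples jar is located at the `EXAMPLES_JAR` path inside the container. The `submit_pi` helper and `DRY_RUN` switch are hypothetical; with `DRY_RUN=1` (the default in this sketch) the command is only printed.

```shell
# Sketch: submit the bundled SparkPi example to the standalone master.
# EXAMPLES_JAR is an assumed location; adjust it to the image's Spark install.
SIF="${SIF:-Singularity.Spark-3.2.1-Hadoop-3.2-Java-11.sif}"
EXAMPLES_JAR="${EXAMPLES_JAR:-/opt/spark/examples/jars/spark-examples_2.12-3.2.1.jar}"
SPARK_MASTER_HOST="${SPARK_MASTER_HOST:-$(hostname)}"
SPARK_MASTER_PORT="${SPARK_MASTER_PORT:-7077}"

submit_pi() {
  # $1: number of slices passed to SparkPi (this sketch defaults to 10)
  set -- singularity run "$SIF" spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master "spark://$SPARK_MASTER_HOST:$SPARK_MASTER_PORT" \
    "$EXAMPLES_JAR" "${1:-10}"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # dry run: show the command only
  else
    "$@"                 # real run: execute the submission
  fi
}

submit_pi 100
```

Setting `DRY_RUN=0` runs the submission for real; the master and at least one worker need to be up before that.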