# Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. More information can be found on the [Apache Hadoop website](https://hadoop.apache.org/).
## Available containers
Name | sha256 | mpi | labels | description |
---|---|---|---|---|
Singularity.Hadoop-3.2-Java-11.sif | 022a101faab9acc056c0315df903099141d7a8f58ec8ffdcbc22f36edb4c0dfa | none | none | Hadoop v3.2, Java 11 |
## Running the container
### Settings files
Minimal configuration. Store the following files in some location, e.g. `/host/hadoop/conf`, and export the paths so Hadoop can find them:

```shell
export HADOOP_CONF_DIR=/host/hadoop/conf
export HADOOP_LOGS=/host/hadoop/logs
```
#### core-site.xml

```xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode_host:port</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/host/hadoop/data</value>
  </property>
</configuration>
```

Please replace `namenode_host`, `port` and `/host/hadoop/data` with the appropriate values. Note that `fs.defaultFS` must point to the NameNode; the older property name `fs.default.name` is deprecated in Hadoop 3.x.
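As an illustration, a filled-in `core-site.xml` might look like the following. The hostname `node01`, port `9000` (a commonly used NameNode RPC port), and the data directory are example values only; substitute the ones appropriate for your cluster:

```xml
<configuration>
  <!-- Example only: point this at your own NameNode host and port -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node01:9000</value>
  </property>
  <!-- Example only: a host directory with enough space for HDFS data -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/host/hadoop/data</value>
  </property>
</configuration>
```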
#### hdfs-site.xml
```xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>rep_value</value>
  </property>
</configuration>
```

Please replace `rep_value` with the desired replication factor, i.e. the number of datanodes each HDFS block is copied to.
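For instance, on a cluster with at least three datanodes, a replication factor of 3 (the HDFS default) is a common choice; this filled-in `hdfs-site.xml` is only an example:

```xml
<configuration>
  <!-- Example only: each block is stored on 3 datanodes.
       Use 1 for a single-node test setup. -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```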
### Starting cluster

Before the first start, HDFS must be formatted:

```shell
singularity run -B $HADOOP_LOGS:/app/software/Hadoop/3.2Hadoop-3.2-Java-11/logs Singularity.Hadoop-3.2-Java-11.sif hdfs --config $HADOOP_CONF_DIR namenode -format
```
### Starting namenode

```shell
singularity run -B $HADOOP_LOGS:/app/software/Hadoop/3.2Hadoop-3.2-Java-11/logs Singularity.Hadoop-3.2-Java-11.sif hdfs --config $HADOOP_CONF_DIR --daemon start namenode
```
### Starting datanodes

On each node, run:

```shell
singularity run -B $HADOOP_LOGS:/app/software/Hadoop/3.2Hadoop-3.2-Java-11/logs Singularity.Hadoop-3.2-Java-11.sif hdfs --config $HADOOP_CONF_DIR --daemon start datanode
```
### Checking the cluster

```shell
singularity run -B $HADOOP_LOGS:/app/software/Hadoop/3.2Hadoop-3.2-Java-11/logs Singularity.Hadoop-3.2-Java-11.sif hdfs --config $HADOOP_CONF_DIR dfsadmin -report
```
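Once `dfsadmin -report` shows live datanodes, a quick smoke test is to copy a file into HDFS and list it back. This sketch assumes the cluster is running and the environment variables above are set; the file name `test.txt` and the `/user/$USER` directory are example values, not requirements:

```shell
# Create a small local file to upload (example name only)
echo "hello hadoop" > test.txt

# Create a home directory in HDFS for the current user
singularity run -B $HADOOP_LOGS:/app/software/Hadoop/3.2Hadoop-3.2-Java-11/logs Singularity.Hadoop-3.2-Java-11.sif \
    hdfs --config $HADOOP_CONF_DIR dfs -mkdir -p /user/$USER

# Copy the local file into HDFS
singularity run -B $HADOOP_LOGS:/app/software/Hadoop/3.2Hadoop-3.2-Java-11/logs Singularity.Hadoop-3.2-Java-11.sif \
    hdfs --config $HADOOP_CONF_DIR dfs -put test.txt /user/$USER/

# List the directory to confirm the file arrived
singularity run -B $HADOOP_LOGS:/app/software/Hadoop/3.2Hadoop-3.2-Java-11/logs Singularity.Hadoop-3.2-Java-11.sif \
    hdfs --config $HADOOP_CONF_DIR dfs -ls /user/$USER
```

These commands require the running container and cluster, so they can only be executed on the cluster itself.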