
Using persistent storage (outside Docker)

David Lauzon edited this page Oct 2, 2015 · 1 revision

Data storage

This image is configured (in hdfs-site.xml) to store HDFS data in the following locations: file:///data/dfs/data (DataNode), file:///data/dfs/name (NameNode), and file:///data/dfs/namesecondary (SecondaryNameNode). For the data to persist across HDFS restarts, it should be stored outside Docker. In the examples below, a directory on the host is mounted into each container at /data. To follow these examples, first create a local directory as follows:

mkdir -p ~/data/hadoop/hdfs
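Because of the bind mount, each path from hdfs-site.xml ends up under the host directory created above. The sketch below spells out that mapping, assuming the `-v $HOME/data/hadoop/hdfs:/data` mount used in the commands on this page. `HDFS_HOST_DIR` and `host_dir_for` are hypothetical names introduced for illustration; they are not part of the image.

```shell
# Hypothetical helper: print the host directory that backs a daemon's HDFS
# storage, assuming the mount "-v $HOME/data/hadoop/hdfs:/data".
# HDFS_HOST_DIR is an illustrative override, not something the image reads.
HDFS_HOST_DIR="${HDFS_HOST_DIR:-$HOME/data/hadoop/hdfs}"

host_dir_for() {
  case "$1" in
    namenode)          echo "$HDFS_HOST_DIR/dfs/name" ;;          # file:///data/dfs/name
    datanode)          echo "$HDFS_HOST_DIR/dfs/data" ;;          # file:///data/dfs/data
    secondarynamenode) echo "$HDFS_HOST_DIR/dfs/namesecondary" ;; # file:///data/dfs/namesecondary
    *) echo "unknown daemon: $1" >&2; return 1 ;;
  esac
}

# Usage, once the NameNode has been formatted:
#   ls "$(host_dir_for namenode)"
```

All three daemons can share one host directory because each writes to its own subdirectory.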

Formatting the NameNode (only do this step once)

docker run --rm -i -h hdfs-namenode \
	-v "$HOME/data/hadoop/hdfs":/data \
	gelog/hadoop hdfs namenode -format
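Formatting an already formatted NameNode erases its existing metadata, so a script may want to guard this step. The sketch below is a minimal guard. It assumes a formatted NameNode directory contains `current/VERSION`, which is the standard Hadoop on-disk layout, and it assumes the same `$HOME/data/hadoop/hdfs` host directory. `namenode_is_formatted` and `format_namenode_once` are hypothetical helper names.

```shell
# Hypothetical helpers. The check assumes a formatted NameNode directory
# contains current/VERSION (standard Hadoop on-disk layout).
namenode_is_formatted() {
  # $1: host directory mounted at /data (e.g. $HOME/data/hadoop/hdfs)
  [ -f "$1/dfs/name/current/VERSION" ]
}

format_namenode_once() {
  hdfs_dir="$1"
  if namenode_is_formatted "$hdfs_dir"; then
    echo "NameNode already formatted in $hdfs_dir/dfs/name; skipping."
    return 0
  fi
  # Same format command as above, with the host path quoted.
  docker run --rm -i -h hdfs-namenode \
    -v "$hdfs_dir":/data \
    gelog/hadoop hdfs namenode -format
}

# Usage:
#   format_namenode_once "$HOME/data/hadoop/hdfs"
```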

Starting the NameNode

This command starts the HDFS NameNode in a background container, publishes its web UI on port 50070, and then follows the container's logs.

docker run -d --name hdfs-namenode \
	-h hdfs-namenode -p 50070:50070 \
	-v "$HOME/data/hadoop/hdfs":/data \
	gelog/hadoop hdfs namenode && \
docker logs -f hdfs-namenode

If the logs show no errors, press Ctrl+C to stop following them. The container keeps running in the background.
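A script can poll the NameNode web UI published on port 50070 instead of relying on someone reading the logs. The sketch below assumes a few things:

- `curl` is installed on the host.
- Published ports are reachable on `localhost`, as they are on a Linux host.
- `retry` is a hypothetical generic helper, not an existing command.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# after each failed attempt except the last. Returns 0 on the first
# success, 1 if every attempt fails.
retry() {
  attempts="$1"; pause="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$pause"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: wait up to about a minute for the NameNode web UI published
# with -p 50070:50070 (assumes curl on the host):
#   retry 30 2 curl -sf -o /dev/null http://localhost:50070/
```

Keeping the helper generic means it can wait on any of the daemons by changing only the probed URL.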

Starting a DataNode

This command starts the HDFS DataNode in a separate background container, links it to the NameNode container, publishes the DataNode web UI on port 50075, and then follows its logs.

docker run -d --name hdfs-datanode1 \
	-h hdfs-datanode1 -p 50075:50075 \
	--link=hdfs-namenode:hdfs-namenode \
	-v "$HOME/data/hadoop/hdfs":/data \
	gelog/hadoop hdfs datanode && \
docker logs -f hdfs-datanode1

If the logs show no errors, press Ctrl+C to stop following them. The container keeps running in the background.
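To confirm that the DataNode actually registered with the NameNode, you can run `hdfs dfsadmin -report`, which lists the live DataNodes. The sketch below extracts the count from that report. It makes these assumptions:

- The report uses the Hadoop 2.x wording `Live datanodes (N):`. Older releases word this line differently.
- The `hdfs` command is on the container's PATH.
- `live_datanodes` is a hypothetical helper name.

```shell
# Hypothetical helper: read `hdfs dfsadmin -report` output on stdin and
# print N from the "Live datanodes (N):" line (Hadoop 2.x wording).
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}

# Usage (run on the host); expect 1 after starting hdfs-datanode1:
#   docker exec hdfs-namenode hdfs dfsadmin -report | live_datanodes
```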

Starting a Secondary NameNode

This command starts the HDFS Secondary NameNode in a separate background container, links it to the NameNode container, publishes its web UI on port 50090, and then follows its logs.

docker run -d --name hdfs-secondarynamenode \
	-h hdfs-secondarynamenode -p 50090:50090 \
	--link=hdfs-namenode:hdfs-namenode \
	-v "$HOME/data/hadoop/hdfs":/data \
	gelog/hadoop hdfs secondarynamenode && \
docker logs -f hdfs-secondarynamenode

If the logs show no errors, press Ctrl+C to stop following them. The container keeps running in the background.
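Because all HDFS state lives in the mounted host directory, the containers themselves are disposable. You can stop and remove them, then start them again with the three `docker run` commands above, skipping the format step. HDFS should then pick up its existing data. The sketch below is a minimal teardown. `stop_hdfs` and the `DOCKER` override are hypothetical; the override is convenient for a dry run with `DOCKER=echo`.

```shell
# Hypothetical teardown helper. DOCKER defaults to the real CLI; set
# DOCKER=echo to print the commands instead of running them.
stop_hdfs() {
  docker_cmd="${DOCKER:-docker}"
  # Stop the daemons that link to the NameNode first, then the NameNode.
  "$docker_cmd" stop hdfs-secondarynamenode hdfs-datanode1 hdfs-namenode
  # Remove the containers so the same --name values can be reused.
  # Bind-mounted data under $HOME/data/hadoop/hdfs is left untouched.
  "$docker_cmd" rm hdfs-secondarynamenode hdfs-datanode1 hdfs-namenode
}

# Usage:
#   DOCKER=echo stop_hdfs   # dry run
#   stop_hdfs               # actually stop and remove the containers
```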