This tutorial will guide you through the entire process of setting up a highly available NFS server. To proceed, you must have the following:
1. 2 servers with similar hard disk setups (these will be used to create the redundant NFS server)
2. At least 1 server where the NFS share will be mounted.
3. Static IPs
4. Basic knowledge of vi (:q! = quit, :wq = write and then quit, i = insert mode, esc = leave insert mode, dd = delete line when not in insert mode)
First off, install CentOS on both machines. During the install process, create a separate blank partition on both machines to be used as your nfs mount. Set the mount point to /data during installation.
From this point on I’m going to refer to both nfs servers by their IPs and hostnames. Server1 will be nfs1 with IP 10.132.196.221 and server2 will be nfs2 with IP 10.132.196.222. Your private IPs might be different, so make sure to put in the correct IPs where necessary within this tutorial.
Do the following on nfs1(10.132.196.221) and nfs2(10.132.196.222):
vi /etc/fstab
This will give you the mount points and devices on your system. Look for the /data mount point and comment it out to prevent it from being mounted automatically on boot. Take note of the device for the /data mount point. Here is what my fstab looks like.
/dev/VolGroup00/LogVol00 / ext3 defaults 1 1
#/dev/VolGroup00/LogVol02 /data ext3 defaults 1 2
LABEL=/boot /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/VolGroup00/LogVol01 swap swap defaults 0 0
I’m using LVM so my device names are weird. Yours might just be /dev/sda1, /dev/sda2, etc…
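If you would rather script the fstab edit than do it by hand in vi, here is a minimal sketch. The helper name comment_out_mount is made up for this example and is not part of any package. It prints the device of the active entry for a mount point, saves a .bak copy of the file, and then comments that entry out.

```shell
# comment_out_mount FILE MOUNT_POINT
# Hypothetical helper: prints the device of the active fstab entry for
# MOUNT_POINT, keeps a backup in FILE.bak, then comments the entry out.
comment_out_mount() {
    fstab_file="$1"     # e.g. /etc/fstab
    mount_point="$2"    # e.g. /data

    # Field 1 is the device, field 2 is the mount point.
    # Lines that already start with '#' are ignored.
    awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp { print $1 }' "$fstab_file"

    # Back up the file, then rewrite it with the matching line prefixed by '#'.
    cp "$fstab_file" "$fstab_file.bak"
    awk -v mp="$mount_point" \
        '$1 !~ /^#/ && $2 == mp { print "#" $0; next } { print }' \
        "$fstab_file.bak" > "$fstab_file"
}
```

On a real server you would call it as root with something like comment_out_mount /etc/fstab /data and write down the device it prints, since that device name is needed again later for drbd.conf.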
Now let’s go ahead and unmount that data partition, because heartbeat will take care of mounting the partition on the live nfs server.
umount /data
Make sure that ntp and ntpdate are installed on both of the nfs servers.
yum install ntp ntpdate
The time on both servers must be identical.
Now let’s make sure that the nfs service does not start on boot and that selinux is turned off.
setup
Go down to “Firewall Configuration” and disable selinux and the firewall. Next, go to “System Services” and make sure that the nfs service is not enabled. You will need to reboot your machines for these settings to take effect.
Let’s go ahead and create our exports for the nfs so that the nfs share can be mounted on other machines in your network.
vi /etc/exports
Add the following line to your exports file but make sure to replace the IP. My private IPs are in the form of 10.132.196.1. Yours may be 192.168.0.1.
/data/export/ 10.132.196.0/255.255.255.0(rw,no_root_squash,no_all_squash,sync)
The above line in my exports file will allow me to mount the nfs share anywhere within my local network. If you only want to allow a specific machine to mount the nfs share, use that machine’s IP address on its own, without a netmask (keeping the 255.255.255.0 mask would still match the whole subnet). For example, here is an exports file that allows only 10.132.196.24 to mount the nfs share.
/data/export/ 10.132.196.24(rw,no_root_squash,no_all_squash,sync)
Now we need to install DRBD and the DRBD kernel module.
yum install drbd kmod-drbd
After installing drbd we need to set up the config file for it.
vi /etc/drbd.conf
Let me show you my config file and then we’ll go over it.
common {
    protocol C;
    syncer {
        rate 15M;
        al-extents 257;
    }
}
resource r0 {
    handlers {
        pri-on-incon-degr "halt -f";
    }
    disk {
        on-io-error detach;
    }
    startup {
        degr-wfc-timeout 120;
    }
    on nfs1 {
        device /dev/drbd0;
        disk /dev/VolGroup00/LogVol02;
        address 10.132.196.221:7789;
        meta-disk internal;
    }
    on nfs2 {
        device /dev/drbd0;
        disk /dev/VolGroup00/LogVol02;
        address 10.132.196.222:7789;
        meta-disk internal;
    }
}
So let’s start from the top.
You may choose your desired protocol, but Protocol C is the most commonly used one and the safest: a write is only reported as complete once it has reached the disks of both nodes.
The rest of the config is pretty self-explanatory. Replace nfs1 and nfs2 with the hostnames of your nfs servers. To get the hostnames use the following command on both servers:
uname -n
Then replace the disk value with the device name from your fstab file that you commented out. Enter the IP address of each server and use port 7789. The last part is the meta-disk. I used an internal meta-disk because I only have one hard disk in the server, so a separate partition for the metadata would not give me any benefit. If you have a RAID setup or a disk separate from your data partition that you can use for the metadata, then go ahead and create a 150 MB partition on it and replace the word “internal” in the config file with the device name of that metadata partition.
Now that we finally have our drbd.conf file ready we can move on. Let’s go ahead and load the drbd kernel module.
modprobe drbd
Now that the kernel module is loaded, let’s start up drbd.
drbdadm up all
This will start drbd. Now let’s check its status.
cat /proc/drbd
You can always use the above command to check the status of drbd. The above command should show you something like this.
0: cs:Connected st:Secondary/Secondary ld:Inconsistent
ns:0 nr:0 dw:0 dr:0 al:0 bm:1548 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured
You should get some more data before it but the above part is what we are interested in. If you notice it shows that drbd is connected and both nodes are in secondary mode. This is because we have not assigned which node is going to be the primary yet. It also says the data is inconsistent because we have not done the initial sync yet.
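If you want to read those fields from a script instead of by eye, here is a minimal sketch. The helper name drbd_field is made up for this example, and it assumes the status line for device 0 looks like the output shown above (cs, st and ld as key:value pairs). Other DRBD releases may label the fields differently, so compare it against your own /proc/drbd first.

```shell
# drbd_field KEY [FILE]
# Hypothetical helper: prints the value of KEY (for example cs, st or ld)
# from the "0:" device line of /proc/drbd-style output.
drbd_field() {
    key="$1"
    file="${2:-/proc/drbd}"   # defaults to the live status file
    awk -v key="$key" '
        $1 == "0:" {
            # Walk the key:value pairs on the device line.
            for (i = 2; i <= NF; i++) {
                n = index($i, ":")
                if (substr($i, 1, n - 1) == key) {
                    print substr($i, n + 1)
                    exit
                }
            }
        }' "$file"
}
```

On a running node, drbd_field cs prints the connection state and drbd_field st prints the roles of the local and remote node.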
I am going to set nfs1 to be my primary node and nfs2 to be my secondary node. If nfs1 fails, nfs2 will take over, but if nfs1 comes back online then all the data from nfs2 will be synced back to nfs1 and nfs1 will take over again.
First of all, let’s go ahead and delete any data that was created on the /data partition that we set up during our initial OS installation. Be very careful with the command below. Make sure to use the appropriate device because all data on that device will be lost.
dd if=/dev/zero bs=1M count=1 of=/dev/VolGroup00/LogVol02; sync
Replace “/dev/VolGroup00/LogVol02” with your device for the /data partition. Now that the start of the partition, including the old file system header, has been overwritten on both servers, let’s create the metadata.
drbdadm create-md r0
Do the following ONLY on nfs1(10.132.196.221)
Now that the metadata is created, we can move onto assigning a primary node and conducting the initial sync. It is absolutely important that you only execute the following command on the primary node. It doesn’t matter which node you choose to be the primary since they should be identical. In my case, I decided to use nfs1 as the primary.
drbdadm -- --overwrite-data-of-peer primary r0
Ok, now we just have to sit back and wait for the initial sync to finish. This is going to take some time even though there is no data on either device, because drbd has to sync every single block of the /data partition from nfs1 to nfs2. You can check the status by using the following command.
cat /proc/drbd
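Rather than re-running that command by hand, you could poll the status file until the sync is done. This is only a sketch: wait_for_consistent is a made-up helper name, and it assumes that a finished sync shows up as ld:Consistent, which is the format used in this tutorial. If your DRBD release prints something else, adjust the pattern.

```shell
# wait_for_consistent [FILE] [INTERVAL_SECONDS] [MAX_CHECKS]
# Hypothetical helper: polls FILE until it contains "ld:Consistent".
# MAX_CHECKS of 0 means keep waiting forever.
wait_for_consistent() {
    file="${1:-/proc/drbd}"
    interval="${2:-30}"
    max_checks="${3:-0}"
    checks=0

    # "ld:Consistent" does not match "ld:Inconsistent", so this only
    # succeeds once the local data is reported as consistent.
    until grep -q 'ld:Consistent' "$file"; do
        checks=$((checks + 1))
        if [ "$max_checks" -gt 0 ] && [ "$checks" -ge "$max_checks" ]; then
            echo "still syncing after $checks checks" >&2
            return 1
        fi
        sleep "$interval"
    done
    echo "initial sync finished"
}
```

Calling wait_for_consistent with no arguments checks /proc/drbd every 30 seconds and returns once the sync has completed.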
Do the following on nfs1(10.132.196.221) and nfs2(10.132.196.222):
After the initial sync is finished, “cat /proc/drbd” should show something like this.
0: cs:Connected st:Primary/Secondary ld:Consistent
ns:37139 nr:0 dw:0 dr:49035 al:0 bm:6 lo:0 pe:0 ua:0 ap:0
1: cs:Unconfigured
If you notice, we are still connected and have a primary and secondary node with consistent data.
Do the following ONLY on nfs1(10.132.196.221):
Now let’s make an ext3 file system on our drbd device and mount it. Since drbd is running, the ext3 file system will also be replicated to the secondary node.
mkfs.ext3 /dev/drbd0
The above command will create an ext3 file system on the drbd device. Now let’s go ahead and mount it.
mount -t ext3 /dev/drbd0 /data
NFS stores important information in /var/lib/nfs that it requires to function properly. In order to preserve file locks and other such information, we need to have that data stored on the drbd device so that if the primary node fails, NFS on the secondary node will continue from right where the primary node left off.
mv /var/lib/nfs/ /data/
ln -s /data/nfs/ /var/lib/nfs
mkdir /data/export
umount /data
So let’s go over what we just did. We moved the nfs folder from /var/lib to /data. Then we created a symbolic link from /var/lib/nfs to /data/nfs since the operating system is still going to look for /var/lib/nfs when nfs is running. Then we created an export directory in /data to store all the actual data that we are going to use for our nfs share. Finally, we unmounted the /data partition since we finished what we were doing.
Do the following ONLY on nfs2(10.132.196.222):
Since we moved the nfs folder to /data, that was synced over to the secondary node as well. We just need to create the symbolic link so that when the /data partition is mounted on nfs2 we have a link to the nfs data.
rm -rf /var/lib/nfs/
ln -s /data/nfs/ /var/lib/nfs
So we removed the nfs folder and created a symbolic link from /var/lib/nfs to /data/nfs. The symbolic link will be broken for now since the /data partition is not mounted. Don’t worry about that, because in the event of a failover that partition will be mounted and everything will work just fine =).
Do the following on nfs1(10.132.196.221) and nfs2(10.132.196.222):
Now on to heartbeat. Heartbeat is going to make sure partitions are mounted/unmounted and services are started/stopped in the event of a failover. So let’s get to it.
yum install heartbeat
Ok, now that we have heartbeat installed, let’s go ahead and create our 3 necessary config files.
vi /etc/ha.d/ha.cf
Paste the following data into ha.cf and save it (:wq).
logfacility local0
keepalive 2
deadtime 10
bcast eth0
node nfs1 nfs2
Replace nfs1 and nfs2 with your server hostnames. You can retrieve the hostname for each server by executing the following command.
uname -n
Now let’s create our resource config file.
vi /etc/ha.d/haresources
Put the following data in there and save it.
nfs1 IPaddr::10.132.196.220/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext3 nfslock nfs
The first word is the hostname of the primary server, and the line should be identical on both servers. I have chosen nfs1 to be my primary server. The next part is the virtual IP. This is the virtual IP we are going to use for the live nfs server, whether it be nfs1 or nfs2. It can be any IP that is not being used within your network. For example, if your nfs servers have IPs 192.168.0.11 and 192.168.0.12 then maybe you can use 192.168.0.10 as your virtual IP. It’s up to you.
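To make the pieces of that haresources line easier to see, here is a small sketch that assembles it from named values. The helper build_haresources_line is made up for this example and only prints text; it does not write to /etc/ha.d/haresources. Heartbeat starts the listed resources from left to right and stops them in reverse order, which is why the virtual IP and the drbd disk come before the file system, and nfslock and nfs come last.

```shell
# build_haresources_line: hypothetical helper that prints one haresources
# line in the same layout as the one used in this tutorial.
build_haresources_line() {
    primary="$1"      # preferred node, as printed by `uname -n`
    vip="$2"          # virtual IP that the nfs clients mount from
    prefix="$3"       # netmask length for the virtual IP, e.g. 24
    nic="$4"          # interface that carries the virtual IP
    resource="$5"     # drbd resource name from drbd.conf
    device="$6"       # drbd block device
    mount_point="$7"  # where the drbd device gets mounted
    fstype="$8"       # file system type on the drbd device

    printf '%s IPaddr::%s/%s/%s drbddisk::%s Filesystem::%s::%s::%s nfslock nfs\n' \
        "$primary" "$vip" "$prefix" "$nic" \
        "$resource" "$device" "$mount_point" "$fstype"
}

# Prints the line for the example setup in this tutorial.
build_haresources_line nfs1 10.132.196.220 24 eth0 r0 /dev/drbd0 /data ext3
```

Swap in your own hostname, virtual IP and device names, then paste the printed line into haresources on both servers.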
Finally, let’s create our authentication file.
vi /etc/ha.d/authkeys
Put the following data in that file and save it.
auth 3
3 md5 somepassword12345
Replace “somepassword12345” with your own password. This will be used by both of the heartbeat daemons on nfs1 and nfs2 to authenticate each other. The file should be readable and writable only by root, so let’s go ahead and do that.
chmod 600 /etc/ha.d/authkeys
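If you do not want to invent a password yourself, one option is to generate a random one. The sketch below is an assumption-laden example rather than an official heartbeat tool: write_authkeys is a made-up helper, and it relies on /dev/urandom and md5sum being available. It writes the same two-line format shown above with a random 32-character key.

```shell
# write_authkeys [FILE]
# Hypothetical helper: writes a heartbeat authkeys file with a random key
# and permissions that only allow the owner to read and write it.
write_authkeys() {
    target="${1:-/etc/ha.d/authkeys}"

    # Turn 32 random bytes into a 32-character hex string to use as the key.
    secret=$(head -c 32 /dev/urandom | md5sum | cut -d' ' -f1)

    # Create the file inside a subshell so the strict umask does not leak
    # into the calling shell.
    ( umask 077; printf 'auth 3\n3 md5 %s\n' "$secret" > "$target" )
    chmod 600 "$target"
}
```

Both nodes need the same key, so generate the file on one server and copy it to the other instead of running the helper twice.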
That’s it! Let’s just start drbd and heartbeat on both servers now.
/etc/init.d/drbd start
/etc/init.d/heartbeat start
Now we have a redundant NFS server running! Let’s do a couple of tests on the primary nfs server.
ifconfig
We should see our virtual IP address show up. Mine looks like this.
eth0 Link encap:Ethernet HWaddr 00:14:22:7C:65:6B
inet addr:10.132.196.221 Bcast:10.132.196.255 Mask:255.255.255.0
inet6 addr: fe80::214:22ff:fe7c:656b/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:897038670 errors:0 dropped:367 overruns:0 frame:0
TX packets:1204564630 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:245125213396 (228.2 GiB) TX bytes:1194659917566 (1.0 TiB)
Interrupt:169

eth0:0 Link encap:Ethernet HWaddr 00:14:22:7C:65:6B
inet addr:10.132.196.220 Bcast:10.132.196.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:169

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:372 errors:0 dropped:0 overruns:0 frame:0
TX packets:372 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:50327 (49.1 KiB) TX bytes:50327 (49.1 KiB)
If you notice, eth0:0 has the virtual IP that I used for the nfs servers. You should only see this on the nfs server that is live.
Now let’s check our partitions.
df -h
The primary nfs server should show /data mounted, while the secondary nfs server should not show the drbd device mounted. My primary nfs server looks like this.
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
15G 2.5G 11G 19% /
/dev/sda1 99M 19M 76M 20% /boot
tmpfs 4.0G 0 4.0G 0% /dev/shm
/dev/drbd0 259G 172G 75G 70% /data
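The same check can be done from a script by looking at the kernel’s mount table instead of reading the df output. This is a minimal sketch with a made-up helper name, mounted_device. It prints the device mounted at a given mount point and returns a non-zero status when nothing is mounted there.

```shell
# mounted_device MOUNT_POINT [MOUNTS_FILE]
# Hypothetical helper: prints the device mounted at MOUNT_POINT, or
# returns 1 if nothing is mounted there.
mounted_device() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"   # field 1 = device, field 2 = mount point

    # Remember the last matching entry, since later mounts sit on top of
    # earlier ones at the same mount point.
    awk -v mp="$mount_point" '
        $2 == mp { dev = $1 }
        END { if (dev != "") print dev; else exit 1 }' "$mounts_file"
}
```

On the live nfs server, mounted_device /data should print /dev/drbd0. On the standby server it should print nothing and return 1.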
Now let’s go ahead and mount the nfs share on another server. First, let’s create the /data folder on the new server. This doesn’t have to be named “data” and can be named whatever you like.
mkdir /data
Now let’s set up our nfs mount.
vi /etc/fstab
At the end of the file, add the following line.
10.132.196.220:/data/export /data nfs rw 0 0
Replace the IP address with the virtual IP address that you chose for your nfs servers. Now let’s mount the partition for the first time.
mount /data
That’s it! Now you have a redundant nfs server and a client actually using it. To simulate a failover, let’s test some stuff out. Go ahead and create some files in the “/data” folder from your nfs client machine.
cd /data
touch testfile1.txt
mkdir testdirectory
Now that we have some data in the “/data” folder we can simulate a failed nfs server. If we did everything right, the data that we just created was created on the primary nfs server and synced to the secondary nfs server via drbd. Let’s stop heartbeat on nfs1 so that nfs2 thinks that nfs1 has failed.
/etc/init.d/heartbeat stop
Now that heartbeat is stopped on nfs1 run the following commands to make sure that the /data partition was unmounted and the virtual IP is gone.
df -h
ifconfig
When you check the same thing on nfs2, you should see that the /data partition has been mounted and the virtual IP is now live.
Now if you go back to your nfs client machine and do an “ls” in the /data directory, you should see that your data is still there. Let’s change our test data around.
cd /data
mv testfile1.txt testfile2.txt
rm -rf testdirectory
Now let’s go back to nfs1 and start up heartbeat again.
/etc/init.d/heartbeat start
Give it a couple seconds and check the partition and virtual IP.
df -h
ifconfig
You should see that the partition is mounted again and the virtual IP is also live on nfs1. If you check the same thing on nfs2, the partition will be un-mounted and the virtual IP should be gone. If you check the /data directory on your nfs client machine, you should see that the “testfile2.txt” file is still there.
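For a slightly more demanding test, you could keep writing to the share from the nfs client while you stop and start heartbeat. The sketch below uses a made-up helper, probe_writes, and a made-up log file name, failover-probe.log. It appends one timestamped line per interval and counts how many writes returned an error. Keep in mind that with a default hard nfs mount, writes usually stall during the switch rather than fail, so gaps between the timestamps in the log are the more telling sign.

```shell
# probe_writes [DIR] [COUNT] [INTERVAL_SECONDS]
# Hypothetical helper: appends COUNT timestamped lines to
# DIR/failover-probe.log and reports how many appends failed.
probe_writes() {
    dir="${1:-/data}"
    count="${2:-60}"
    interval="${3:-1}"
    log="$dir/failover-probe.log"
    i=0
    failures=0

    while [ "$i" -lt "$count" ]; do
        i=$((i + 1))
        # Each line holds the Unix time of the write and its sequence number.
        if ! echo "$(date +%s) write $i" >> "$log" 2>/dev/null; then
            failures=$((failures + 1))
        fi
        sleep "$interval"
    done
    echo "$failures of $count writes failed"
}
```

Start probe_writes /data 120 1 on the client, stop heartbeat on nfs1 while it runs, and then look through /data/failover-probe.log for missing seconds.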
Congratulations, you have a fully functional and highly available nfs server! Check out http://www.drbd.org for more information on DRBD.
Confused on this line:
drbdadm — –overwrite-data-of-peer primary r0
What’s with the dashes? Every combination of dashes and spaces fails, and the help file mentions nothing about this command.
It’s actually double dashes. WordPress autoformats the dashes into one big dash for some reason.
the command is
drbdadm [DASH][DASH] [DASH][DASH]overwrite-data-of-peer primary r0
Great Tutorial, soon I will setup mine to our video server
thanks