Building an HA storage cluster with RHEL5, DRBD and GFS2

2011-02-11
This is practical advice based on my own experience of building an HA storage cluster. The configuration described here was used to test cluster capabilities; in production you shouldn't use manual fencing.
  1. To run the cluster you have to install the following packages: kmod-drbd82 gfs2-utils cman kmod-gfs drbd82
  2. We are going to run a 2-node cluster: the first node is named "node1" and has the address 10.10.10.250, the second is named "node2" and has the address 10.10.10.251.
  3. Check that the following records are present in /etc/hosts on both nodes:
    [root@node1]# cat /etc/hosts | grep node
    10.10.10.250 node1
    10.10.10.251 node2
  4. Set the hostname in /etc/sysconfig/network on both nodes:
    [root@node1]# cat /etc/sysconfig/network
    NETWORKING=yes
    NETWORKING_IPV6=no
    HOSTNAME=node1
    [root@node2]# cat /etc/sysconfig/network
    NETWORKING=yes
    NETWORKING_IPV6=no
    HOSTNAME=node2
  5. Both nodes have identical 250 GB hard disks, partitioned as shown below:
    [root@node1]# fdisk -l /dev/sda

    Disk /dev/sda: 250.0 GB, 250059350016 bytes
    255 heads, 63 sectors/track, 30401 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes

       Device Boot      Start         End      Blocks   Id  System
    /dev/sda1   *           1        4177    33551721   83  Linux
    /dev/sda2            4178        4308     1052257+  82  Linux swap/Solaris
    /dev/sda3            4309       30401   209592022+  83  Linux
    /dev/sda3 is the partition that will back the shared storage we are going to build.
  6. Create /etc/drbd.conf:
    #
    # please have a look at the example configuration file in
    # /usr/share/doc/drbd82/drbd.conf
    #
    resource r0 {
        protocol C;
        startup {
            wfc-timeout 120;
            degr-wfc-timeout 120;
            become-primary-on both;
        }
        net {
            allow-two-primaries;
            cram-hmac-alg "sha1";
            shared-secret "123456";

            after-sb-0pri discard-least-changes;
            after-sb-1pri discard-secondary;
            after-sb-2pri disconnect;

            rr-conflict disconnect;
        }
        disk {
            on-io-error detach;
        }
        syncer {
            rate 100M;
        }
        # Each "on" section must carry that node's own address (see step 2)
        on node1 {
            device /dev/drbd0;
            disk /dev/sda3;
            address 10.10.10.250:7789;
            meta-disk internal;
        }
        on node2 {
            device /dev/drbd0;
            disk /dev/sda3;
            address 10.10.10.251:7789;
            meta-disk internal;
        }
    }
  7. Run drbdadm to initialize the DRBD metadata for the resource. The metadata is needed on every node, so repeat this on node2:
    [root@node1]# drbdadm create-md r0
  8. Now start drbd on both nodes:
    [root@node1]# /etc/init.d/drbd start
  9. Make node1 primary:
    [root@node1]# drbdsetup /dev/drbd0 primary -o
  10. Create the GFS2 filesystem on the first node:
    [root@node1]# mkfs.gfs2 -p lock_dlm -t Cluster:gfs2 -j 8 /dev/drbd0
  11. Here is the cluster configuration file, /etc/cluster/cluster.conf:
    <?xml version="1.0"?>
    <cluster name="Cluster" config_version="1">
        <cman two_node="1" expected_votes="1"/>
        <clusternodes>
            <clusternode name="node1" nodeid="1" votes="1">
                <fence>
                    <method name="first">
                        <device name="first" ipaddr="10.10.10.250"/>
                    </method>
                </fence>
            </clusternode>
            <clusternode name="node2" nodeid="2" votes="1">
                <fence>
                    <method name="second">
                        <device name="second" ipaddr="10.10.10.251"/>
                    </method>
                </fence>
            </clusternode>
        </clusternodes>

        <fencedevices>
            <device name="first" agent="fence_manual"/>
            <device name="second" agent="fence_manual"/>
        </fencedevices>
        <rm>
            <failoverdomains/>
            <resources>
                <clusterfs fstype="gfs2" name="sda3" mountpoint="/home" device="/dev/drbd0" options="rw"/>
            </resources>
        </rm>
    </cluster>
  12. After starting /etc/init.d/cman and /etc/init.d/drbd, the cluster runs in degraded mode and starts syncing. You can monitor progress by watching /proc/drbd; synchronizing a 200 GB partition over a 100 Mbit link took approximately 4 hours. During this time you can't mount the partition on the secondary node (the node that was started last); wait until synchronization completes.
  13. Now you can install this script as /etc/init.d/mountgfs2.sh; it mounts the shared partition at system boot:
    #!/bin/sh
    #
    # chkconfig: 345 71 7
    # description: Mounts and unmounts the GFS2 partition
    #
    # /etc/init.d/mountgfs2.sh
    #
    # Needs to be mounted after drbd start and
    # unmounted before drbd stop
    #

    # Mount or unmount the gfs2 partition on /home
    case "$1" in
        start)
            echo "Mounting gfs2 partition"
            # Promote this node to primary before mounting
            drbdadm primary r0
            mount -t gfs2 /dev/drbd0 /home
            ;;
        stop)
            echo "Unmounting gfs2 partition"
            umount /home
            # Demote only after the filesystem is unmounted
            drbdadm secondary r0
            ;;
        *)
            echo "Usage: /etc/init.d/mountgfs2.sh {start|stop}"
            exit 1
            ;;
    esac
    exit 0
  14. Run chkconfig --add mountgfs2.sh (chkconfig takes the service name, not the full path). After the nodes restart, the shared partition is mounted at /home.
  15. If everything went fine, you will see something like this in /proc/drbd:
    [root@node1 /]# cat /proc/drbd
    version: 8.2.6 (api:88/proto:86-88)
    GIT-hash: 3e69822d3bb4920a8c1bfdf7d647169eba7d2eb4 build by buildsvn@c5-i386-build, 2008-10-03 11:42:32
    0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---
    ns:548916 nr:68 dw:104 dr:564688 al:4 bm:82 lo:0 pe:0 ua:0 ap:0 oos:0
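
The check in step 15 can also be scripted. Below is a minimal sketch, assuming the status line format shown above (DRBD 8.2.x prints the roles as st:; other versions may label the fields differently). The function name drbd_dual_primary_ready is made up for this example and is not part of DRBD. It reads /proc/drbd-style text on stdin and succeeds only when minor 0 is connected, primary on both nodes and up to date on both nodes.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with DRBD): checks the status line of
# minor 0 in /proc/drbd-style text read from stdin.
# Assumes the DRBD 8.2.x field names seen in step 15 (cs:, st:, ds:).
drbd_dual_primary_ready() {
    awk '
        # awk strips leading blanks, so " 0:" and "0:" both match here
        $1 == "0:" {
            seen = 1
            cs = 0; st = 0; ds = 0
            for (i = 2; i <= NF; i++) {
                if ($i == "cs:Connected")         cs = 1
                if ($i == "st:Primary/Primary")   st = 1
                if ($i == "ds:UpToDate/UpToDate") ds = 1
            }
        }
        # Exit status 0 only if the line was found and all three fields match
        END { exit !(seen && cs && st && ds) }
    '
}

# Demo on the status line from step 15
sample=' 0: cs:Connected st:Primary/Primary ds:UpToDate/UpToDate C r---'
if printf '%s\n' "$sample" | drbd_dual_primary_ready; then
    echo "drbd0 is dual-primary and up to date"
else
    echo "drbd0 is not ready yet"
fi
```

On a real node you could feed it the live status, for example: drbd_dual_primary_ready < /proc/drbd && mount -t gfs2 /dev/drbd0 /home. During the initial synchronization from step 12 the check fails, which matches the advice not to mount on the secondary node until the sync completes.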
That's all folks!
Attachment           Size
drbd.conf_.txt       697 bytes
cluster.conf_.txt    826 bytes
mountgfs2.sh_.txt    1.1 KB