mirror of
https://github.com/SCST-project/scst.git
synced 2026-05-14 01:01:27 +00:00
git-svn-id: http://svn.code.sf.net/p/scst/svn/trunk@6933 d57e44dd-8a1f-0410-8b47-8ef2f437770f
188 lines
6.8 KiB
Plaintext
188 lines
6.8 KiB
Plaintext
Synchronization of the Persistent Reservation Information via the DLM
|
|
=====================================================================
|
|
|
|
Introduction
|
|
------------
|
|
|
|
In an H.A. setup where multiple servers share data it is required that
|
|
the persistent reservation state is kept consistent across the cluster.
|
|
One possible approach is to use the DLM to keep the PR state synchronized
|
|
across nodes. Since the DLM can associate data with each DLM lock object,
|
|
DLM lock objects can be used to store PR data. The data that is associated
|
|
with a DLM lock object is called the Lock Value Block or LVB. The code in
|
|
scst_dlm.c uses the DLM to keep PR data synchronized across all nodes in
|
|
a cluster.
|
|
|
|
|
|
Software Components
|
|
-------------------
|
|
|
|
The following software components are needed by the code in scst_dlm.c:
|
|
* The DLM kernel driver (dlm.ko). This driver is only built if CONFIG_DLM
|
|
has been set.
|
|
* The DLM control daemon (dlm_controld.pcmk). This daemon passes cluster
|
|
node IDs and IP addresses to the DLM kernel driver via the configfs
|
|
interface of the DLM kernel driver.
|
|
* Corosync to manage cluster membership of the cluster nodes and to assign
|
|
a node ID to each cluster node.
|
|
* A facility to start the DLM control daemon, e.g. Pacemaker.
|
|
|
|
On most Linux distributions the software packages that contain this software
|
|
have the names kernel, dlm, corosync and pacemaker.
|
|
|
|
NOTE! You might need to apply a DLM bugfix patch, see scst-devel mailing list
|
|
thread https://sourceforge.net/p/scst/mailman/scst-devel/thread/CADHfD59FK6seaammL8b9LM3U3tw5HvYp3kPTk_r1OYkPR7bPhg@mail.gmail.com/#msg34761854
|
|
for more details.
|
|
|
|
|
|
DLM Configuration
|
|
-----------------
|
|
|
|
The DLM kernel module supports the TCP and SCTP communication protocols. An
|
|
advantage of SCTP for H.A. purposes is that it supports multihoming. One of
|
|
these protocols can be selected via the -r <proto> option of dlm_controld.
|
|
That option can be set via the "args" argument of the Pacemaker dlm_controld
|
|
resource. For more information, see also:
|
|
* The dlm_controld(8) man page.
|
|
* In the "Pacemaker 1.1, Clusters from Scratch" guide, the section "Configure
|
|
the Cluster for the DLM".
|
|
* The dlm_controld resource agent: /usr/lib/ocf/resource.d/pacemaker/controld
|
|
|
|
Here is an example of how to set up a cluster with two nodes and how to
|
|
configure and start the DLM control daemon:
|
|
1. If a network switch is present between the two nodes, enable IPv4 multicast
|
|
on that switch.
|
|
2. Copy /etc/corosync/corosync.conf.example into /etc/corosync/corosync.conf
|
|
and edit that file.
|
|
3. If a file /etc/default/corosync exists, enable Corosync in that file.
|
|
4. Start Corosync:
|
|
systemctl start corosync || /etc/init.d/corosync start
|
|
5. Check that all configured Corosync rings have two members:
|
|
corosync-cfgtool -s && { corosync-cmapctl | grep members; }
|
|
6. Start pcsd:
|
|
systemctl start pcsd || /etc/init.d/pcsd start
|
|
7. Set up cluster authentication:
|
|
pcs cluster auth centos7-vm centos7b-vm
|
|
8. Start Pacemaker:
|
|
systemctl start pacemaker || /etc/init.d/pacemaker start
|
|
9. If the cluster has only two nodes, disable the Pacemaker quorum policy and
|
|
disable STONITH:
|
|
crm_attribute -t crm_config -n no-quorum-policy -v ignore
|
|
crm_attribute -t crm_config -n stonith-enabled -v false
|
|
10. Check the cluster status:
|
|
pcs status
|
|
11. Create a Pacemaker resource for dlm_controld:
|
|
pcs resource delete dlm
|
|
pcs resource create dlm ocf:pacemaker:controld \
|
|
args="-q0 -f0" allow_stonith_disabled=true \
|
|
op monitor timeout=60 \
|
|
--clone interleave=true
|
|
12. Check the Pacemaker status:
|
|
pcs status
|
|
|
|
|
|
Startup and Shutdown
|
|
--------------------
|
|
|
|
The startup sequence is as follows:
|
|
* Load the DLM kernel module. If not loaded explicitly, "modprobe scst" will
|
|
load the DLM kernel module implicitly.
|
|
* Load and configure SCST with all target ports disabled.
|
|
* Enable cluster mode for all SCST devices that can be accessed through more
|
|
than one cluster node:
|
|
for x in /sys/kernel/scst_tgt/handlers/*/*/; do
|
|
echo 1 >$x/cluster_mode &
|
|
done
|
|
wait
|
|
* Start Corosync and Pacemaker.
|
|
* Wait until Pacemaker has reached the idle state:
|
|
pacemaker_dc_status() {
|
|
local dc
|
|
|
|
dc="$(crmadmin -D 2>/dev/null | sed 's/Designated Controller is: //')"
|
|
[ -n "$dc" ] &&
|
|
crmadmin -S "$dc" 2>/dev/null |
|
|
sed 's/^Status of crmd@[^[:blank:]]*:[[:blank:]]\([^[:blank:]]*\).*/\1/'
|
|
}
|
|
timeout=300
|
|
for ((i=0;i<timeout;i++)); do
|
|
if [ "$(pacemaker_dc_status)" = "S_IDLE" ]; then
|
|
echo "Pacemaker reached idle state after $i s"
|
|
break
|
|
fi
|
|
sleep 1
|
|
done
|
|
if [ "$i" = "$timeout" ]; then
|
|
echo "Pacemaker did not reach the IDLE state in $i s"
|
|
fi
|
|
* Enable SCST target ports.
|
|
* If no DLM resource has been configured in Pacemaker, start dlm_controld.pcmk
|
|
explicitly.
|
|
|
|
The proper shutdown order is as follows:
|
|
* Tell SCST to stop accepting SCSI commands and wait until all initiators have
|
|
logged out:
|
|
for x in $(find /sys/kernel/scst_tgt/targets/ -name enabled); do
|
|
echo 0 > $x &
|
|
done
|
|
wait
|
|
while ls -Ad /sys/kernel/scst_tgt/targets/*/*/sessions/* 2>&1 |
|
|
grep -vE '/sys/kernel/scst_tgt/targets/(copy_manager|scst_local)/'; do
|
|
sleep 1
|
|
done
|
|
* Tell SCST to release the DLM lockspaces:
|
|
while grep -q '^1$' /sys/kernel/scst_tgt/devices/*/cluster_mode 2>/dev/null
|
|
do
|
|
for x in /sys/kernel/scst_tgt/devices/*/cluster_mode; do
|
|
{ [ -e "$x" ] && echo 0 > "$x"; } &
|
|
done
|
|
wait
|
|
sleep 1
|
|
done
|
|
* Stop Pacemaker and Corosync
|
|
* Unload the SCST kernel modules
|
|
* Unload the DLM kernel driver
|
|
|
|
|
|
Lockspace names
|
|
---------------
|
|
|
|
The names of the DLM lockspaces used by SCST follow the following pattern:
|
|
scst-<t10_dev_id> where t10_dev_id is the T10 device ID of the SCST device
|
|
associated with this lockspace.
|
|
|
|
|
|
Notes
|
|
-----
|
|
|
|
Since the lockspace name depends on the t10_dev_id it is not allowed to
|
|
change the t10_dev_id if cluster mode has been enabled.
|
|
|
|
|
|
Testing
|
|
-------
|
|
|
|
Two examples of test suites for the cluster PR support code are:
|
|
* The SCSI conformance tests in the libiscsi project.
|
|
* The Windows Cluster Validation Tests
|
|
(https://technet.microsoft.com/en-us/library/Cc726064.aspx).
|
|
|
|
|
|
To do
|
|
-----
|
|
|
|
* Ensure that PREEMPT AND ABORT affects all cluster nodes instead of
|
|
only the cluster node that received this command.
|
|
|
|
|
|
See also
|
|
--------
|
|
|
|
* Bart Van Assche, Using the DLM as a distributed in-memory database, Linux
|
|
Plumbers North America, Seattle, August 20, 2015
|
|
(https://linuxplumbersconf.org/2015/ocw//system/presentations/2691/original/Using%20the%20DLM%20as%20a%20Distributed%20In-Memory%20Database.pdf).
|
|
* Andrew Beekhof, Pacemaker Configuration Explained, 2015
|
|
(http://clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/).
|
|
* Andrew Beekhof, Clusters from Scratch, 2015
|
|
(http://clusterlabs.org/doc/en-US/Pacemaker/1.1-pcs/html/Clusters_from_Scratch/index.html).
|