
RAID health via SMTP

Since upgrading to Proxmox 5, I’ve had very little visibility into my server’s health. HPE hasn’t released packages for Debian stretch (which Proxmox 5 is based on), and for detailed information those packages are really my only choice. I only care about the RAID array, however, and after poking around I found that the jessie package of hpacucli installs and works just fine. I used to just pop into the HPE web interface to check on health, but that interface isn’t fully functional now. It’s a good time to automate this and do daily checks.

hpacucli can perform a quick query and give me everything I need. So I’m going to run the query once and write it to a file, then set up a cron job to run the same query daily and diff it against that file. If the two differ, I’m going to shoot myself an email with the output of the query.

hpacucli

At the time of writing, here’s my uname -a output:

Linux capehenlopen 4.10.15-1-pve #1 SMP PVE 4.10.15-15 (Fri, 23 Jun 2017 08:57:55 +0200) x86_64 GNU/Linux

I added the following hpe source

deb http://downloads.linux.hpe.com/SDR/repo/mcp jessie/current non-free

and following hpe’s recommendation, that line is in

/etc/apt/sources.list.d/mcp.list

Once the source is added, the only thing needed is the hpacucli package, so run

apt-get update
apt-get install hpacucli

If you want to add the pub keys for hpe, follow their instructions or just run the following, then install hpacucli again without warnings.

curl http://downloads.linux.hpe.com/SDR/hpPublicKey1024.pub | apt-key add -
curl http://downloads.linux.hpe.com/SDR/hpPublicKey2048.pub | apt-key add -
curl http://downloads.linux.hpe.com/SDR/hpPublicKey2048_key1.pub | apt-key add -
curl http://downloads.linux.hpe.com/SDR/hpePublicKey2048_key1.pub | apt-key add -

However with proxmox, you may need to install curl first

apt-get install curl

Mandatory word of warning: be very careful running curl against URLs without first verifying what’s there. Curl offers convenience at a huge cost if you’re lazy. Here we’re only piping it to apt-key, so it’s not a huge deal.


Once that’s all set, we should be able to run hpacucli and get our output. Here’s what my two servers provide.

 

root@capehenlopen:/raidlogs# hpacucli ctrl all show config

Smart Array P410 in Slot 2 (sn: PACCRID12050DTV)

array A (SATA, Unused Space: 0 MB)


 logicaldrive 1 (2.7 TB, RAID 5, OK)

physicaldrive 2I:1:1 (port 2I:box 1:bay 1, SATA, 1 TB, OK)
 physicaldrive 2I:1:2 (port 2I:box 1:bay 2, SATA, 1 TB, OK)
 physicaldrive 2I:1:3 (port 2I:box 1:bay 3, SATA, 1 TB, OK)
 physicaldrive 2I:1:4 (port 2I:box 1:bay 4, SATA, 1 TB, OK)

array B (SATA, Unused Space: 0 MB)


 logicaldrive 2 (465.7 GB, RAID 1, OK)

physicaldrive 1I:1:7 (port 1I:box 1:bay 7, SATA, 500 GB, OK)
 physicaldrive 1I:1:8 (port 1I:box 1:bay 8, SATA, 500 GB, OK)

SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250 (WWID: 50014380188F7EBF)

and

root@gordonspond:~# hpacucli ctrl all show config

Smart Array P410 in Slot 3 (sn: PACCRCN81ZZ1W0X)

array A (SATA, Unused Space: 0 MB)


 logicaldrive 1 (2.7 TB, RAID 5, OK)

physicaldrive 1I:0:1 (port 1I:box 0:bay 1, SATA, 1 TB, OK)
 physicaldrive 1I:0:2 (port 1I:box 0:bay 2, SATA, 1 TB, OK)
 physicaldrive 1I:0:3 (port 1I:box 0:bay 3, SATA, 1 TB, OK)
 physicaldrive 1I:0:4 (port 1I:box 0:bay 4, SATA, 1 TB, OK)

SEP (Vendor ID PMCSIERA, Model SRC 8x6G) 250 (WWID: 50014380118CDA7F)

So all the information here is perfect. It’s static unless physical changes or errors occur. I can create the baseline file and never have to update it unless I add drives or move things around. Next step is to set up SMTP.
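Because every logicaldrive and physicaldrive line ends with its status, the same output can also be filtered for anything that isn’t OK. This is just a minimal sketch and not something the script further down relies on: the function name not_ok_drives is made up for illustration, and it assumes the status is always the last field before the closing parenthesis, as in the output above.

```shell
# Print drive lines whose status (last field inside the parentheses) is not OK.
# Reads "hpacucli ctrl all show config" output on stdin.
# not_ok_drives is a hypothetical helper name, not part of hpacucli.
not_ok_drives() {
    grep -E '^[[:space:]]*(logical|physical)drive ' |
        grep -Ev ', OK\)[[:space:]]*$' || true   # no matches is not an error
}

# Usage (on the server itself):
# hpacucli ctrl all show config | not_ok_drives
```

On a healthy array this prints nothing, so an empty result is the good case.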


POSTFIX

Proxmox comes with postfix preinstalled, so we just need to edit its config. I use Sendgrid for SMTP, so I’ll just paste the blob needed to make that work. That said, I had authentication errors initially; Sendgrid’s documentation pointed out that a SASL authentication library is needed to authenticate with them.

So if using sendgrid, be sure to

apt-get install libsasl2-modules

and you should be good to go. Here’s a working config

# Sendgrid Settings
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = static:username:password
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt
header_size_limit = 4096000
relayhost = [smtp.sendgrid.net]:587

Which can just be appended to /etc/postfix/main.cf
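One thing to be aware of: the static: map keeps the password in main.cf, which is typically world-readable. If that bothers you, postfix can read the credentials from a separate root-only lookup file instead. A rough sketch, assuming the conventional path /etc/postfix/sasl_passwd; the helper name write_sasl_passwd is made up for this example and username/password are placeholders.

```shell
# write_sasl_passwd FILE RELAY USER PASS
# Hypothetical helper: writes one postfix SASL lookup entry, readable only by its owner.
write_sasl_passwd() (
    umask 077                                  # a newly created file is never group/world readable
    printf '%s %s:%s\n' "$2" "$3" "$4" > "$1"
    chmod 600 "$1"                             # also tighten a file that already existed
)

# Intended use, as root (commented out so nothing changes until you opt in):
# write_sasl_passwd /etc/postfix/sasl_passwd '[smtp.sendgrid.net]:587' username password
# postmap /etc/postfix/sasl_passwd     # builds the sasl_passwd.db lookup table
# Then in main.cf, replace the static: line with
#   smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
# and restart postfix.
```

The key in that file has to match the relayhost value exactly, brackets and port included.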

In case you’ve never played with postfix, look to the following for logs if you have issues.

/var/log/mail.err

/var/log/mail.log

mail.err likely won’t be much use unless you use a SASL password db and hit permission errors; for Sendgrid we just throw our credentials right in the config.

All that’s left is a test. Restart postfix and paste the following.

systemctl restart postfix

echo "test body" | mail -s "test subject" -a "From: test@mzimmerer.com" your_email@provider.com
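If the test message never shows up, the delivery result is in /var/log/mail.log. Postfix records each delivery attempt with a status= field (sent, deferred or bounced), so a quick filter shows what went wrong. A small sketch; mail_problems is a made-up helper name.

```shell
# Print delivery attempts that were deferred or bounced.
# mail_problems is a hypothetical helper; pass a log path or default to mail.log.
mail_problems() {
    grep -E 'status=(deferred|bounced)' "${1:-/var/log/mail.log}" || true
}

# Usage:
# mail_problems                      # checks /var/log/mail.log
# mail_problems /var/log/mail.log.1  # or an older, rotated log
```

postqueue -p is also worth a look; it lists anything still sitting in the queue.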

Putting it together

For this, I created a directory off the root directory (/raidlogs). It’s not important where this goes so long as it’s not on the RAID array we’re querying. We need to get our baseline file. I just did a

test=$(hpacucli ctrl all show config)
echo "$test" > baseline.txt

(We could have just done hpacucli ctrl all show config > baseline.txt, but within the script we compare the file against the output of echo, and the two don’t come out identical: a “>” line shows up in the diff and flags the files as different. That is most likely because $(...) strips trailing newlines and echo adds exactly one back. Creating the baseline the same way solves that problem. We use echo in the script so we can run the hpacucli command just once and keep the result in a variable, rather than running it to check the diff, then running it again just to email it.)
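The trailing-newline behaviour of $(...) and echo is easy to see in isolation. In this sketch a printf stands in for hpacucli (an assumption for the demo: that the real output ends in a blank line), and the file names are throwaway.

```shell
# printf stands in for hpacucli here; assume its output ends with a blank line.
fake_raid_output() { printf 'logicaldrive 1 (2.7 TB, RAID 5, OK)\n\n'; }

dir=$(mktemp -d)
fake_raid_output > "$dir/direct.txt"     # redirect keeps both trailing newlines
out=$(fake_raid_output)                  # $(...) strips all trailing newlines
echo "$out" > "$dir/via_echo.txt"        # echo adds exactly one back

if cmp -s "$dir/direct.txt" "$dir/via_echo.txt"; then
    result="same"
else
    result="different"
fi
echo "direct vs echo: $result"           # prints "direct vs echo: different"

# A baseline written with echo matches a later echo of the same output.
if echo "$out" | cmp -s "$dir/via_echo.txt" -; then
    echo "echo vs echo: same"
fi
rm -rf "$dir"
```

So as long as the baseline and the daily check are both produced through echo "$variable", the comparison is apples to apples.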

and now we can run our diff against /raidlogs/baseline.txt

Below is the script

root@capehenlopen:/raidlogs# more chkraid.sh 
#!/bin/bash
# Run the HP RAID check and compare it to the baseline file.
# Mail the output either way; the subject says whether anything changed.

# Query the controller once and reuse the result for both the diff and the mail.
raidhlth=$(hpacucli ctrl all show config)

# diff -q exits 0 when the baseline and the current output are identical.
if diff -q /raidlogs/baseline.txt <(echo "$raidhlth") > /dev/null
then
    echo "$raidhlth" | mail -s "raid check OK" -a "From: gordonspond@mzimmerer.com" youremail@gmail.com
else
    echo "$raidhlth" | mail -s "RAID array ALERT!" -a "From: gordonspond@mzimmerer.com" youremail@gmail.com
fi

I will change this up a bit: once I see it working consistently, I will get rid of the OK action. A duplicate script is on the other server as well, with a different From: address so I can tell the two apart. Really that’s it though; a very simple, short script takes care of this for us. We just need to add a cron job to run it every day. Copy the script into /etc/cron.daily/, drop the file extension (run-parts skips file names containing a dot), make sure the execute bit is set, and we’re good to go.

cp /raidlogs/chkraid.sh /etc/cron.daily/chkraid

chmod 755 /etc/cron.daily/chkraid
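The later tweak mentioned above (dropping the OK mail) could look something like the following. This is only a sketch of one way to do it: the function name check_raid and the BASELINE, RAID_CMD, MAIL_TO and MAIL_FROM variables are invented for the example, and it puts a unified diff at the top of the alert so the changed line is obvious.

```shell
#!/bin/sh
# Sketch: mail only when the current output differs from the baseline.
# BASELINE, RAID_CMD, MAIL_TO and MAIL_FROM are illustrative names with defaults
# taken from the script above; override them in the environment if needed.
BASELINE="${BASELINE:-/raidlogs/baseline.txt}"
RAID_CMD="${RAID_CMD:-hpacucli ctrl all show config}"
MAIL_TO="${MAIL_TO:-youremail@gmail.com}"
MAIL_FROM="${MAIL_FROM:-gordonspond@mzimmerer.com}"

check_raid() {
    current=$($RAID_CMD)                 # run the query once (unquoted so it splits into words)
    # diff exits 0 when identical; "-" makes it read the new output from stdin.
    if changes=$(echo "$current" | diff -u "$BASELINE" -); then
        return 0                         # nothing changed: stay quiet
    fi
    # Anything else (a difference, or diff failing on a missing baseline) sends an alert.
    printf '%s\n\n%s\n' "$changes" "$current" |
        mail -s "RAID array ALERT!" -a "From: $MAIL_FROM" "$MAIL_TO"
}

# In /etc/cron.daily the file would end with a bare call:
# check_raid
```

No mail on a good day does mean a dead cron job looks the same as a healthy array, so it may be worth keeping the OK mail on a weekly schedule.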

 

Enjoy!