Intro
This post describes a few tricks that help keep a production server self-maintaining.
The tricks below suit a small standalone server, where a central monitoring system would be unnecessary overhead.
I assume that you have:
- Gentoo
- OpenRC
- AWS
- cronie
- syslog-ng
- mailx
- ansi2html
- monit
- ntpd
- awscli
- sendmail-like (for example, msmtp)
Working sendmail
I assume that you have a working sendmail, so that emails reach the admin address and the admin can respond in time.
If sendmail is not configured, install a sendmail-like replacement such as msmtp:
emerge msmtp
Edit /etc/msmtprc:
account gmail
host smtp.gmail.com
port 587
from username@gmail.com
user username
password password
tls on
tls_starttls on
auth on
# This allows msmtp to be used like /usr/sbin/sendmail.
account default : gmail
# Resolve local account name to external email.
aliases /etc/aliases
# Syslog logging with facility LOG_MAIL instead of the default LOG_USER
syslog LOG_MAIL
Edit /etc/aliases:
# resolve root's email address
root: username@gmail.com
# resolve anybody else's email address
default: username@gmail.com
Backup block devices
Back up all block devices to snapshots so that everything can be recovered if something fails.
Create a dedicated user and access key in the AWS console and grant that user only the permissions needed to work with EC2 snapshots. It’s also recommended to restrict that user to the IP address of the server.
The following IAM policy could be used:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:CreateSnapshot",
"ec2:DeleteSnapshot",
"ec2:DescribeSnapshots",
"ec2:DescribeVolumes"
],
"Resource": "*",
"Condition": {
"IpAddress": {
"aws:SourceIp": "A.B.C.D/32"
}
}
}
]
}
Then you need to configure that user:
aws configure
There are many utilities available for making rsnapshot-like backups of block devices in AWS.
One of them is https://github.com/sormy/aws-ec2-rsnapshot
The script below runs every day, tries to sync the filesystem, creates a snapshot and keeps the last 7 daily snapshots.
/etc/cron.daily/aws-ec2-rsnapshot:
#!/bin/bash
export HOME=/root
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
volume_id=$(
aws ec2 describe-volumes \
--filters Name=attachment.instance-id,Values=$instance_id \
--query '(Volumes[*].Attachments[?Device==`/dev/sda1`])[0][0].VolumeId' \
--output text
)
output=$(aws-ec2-rsnapshot artembutusov.com/daily/root 7 $volume_id sync)
if [ $? != 0 ]; then
echo "$output" | mailx -s "Unable to create volume snapshot" root
fi
The script below runs every week, tries to sync the filesystem, creates a snapshot and keeps the last 8 weekly snapshots (about 2 months).
/etc/cron.weekly/aws-ec2-rsnapshot:
#!/bin/bash
export HOME=/root
instance_id=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
volume_id=$(
aws ec2 describe-volumes \
--filters Name=attachment.instance-id,Values=$instance_id \
--query '(Volumes[*].Attachments[?Device==`/dev/sda1`])[0][0].VolumeId' \
--output text
)
output=$(aws-ec2-rsnapshot artembutusov.com/weekly/root 8 $volume_id sync)
if [ $? != 0 ]; then
echo "$output" | mailx -s "Unable to create volume snapshot" root
fi
Both scripts stay silent when the snapshot is created successfully and email the full output when it fails.
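The retention rule used above ("keep the last N snapshots") can be illustrated with a small sketch. This is not how aws-ec2-rsnapshot works internally; the function name `snapshots_to_prune` and the "timestamp id" input format are assumptions made purely for illustration.

```bash
#!/bin/bash
# Hypothetical helper: reads "ISO-timestamp snapshot-id" pairs from stdin
# and prints the ids that fall outside the newest $1 entries.
snapshots_to_prune() {
    local keep="$1"
    # ISO 8601 timestamps sort chronologically as plain strings,
    # so a reverse sort puts the newest snapshot first.
    sort -r | tail -n +"$(( keep + 1 ))" | awk '{ print $2 }'
}
```

Fed three entries with a keep count of 2, the function prints only the id of the oldest entry; with fewer entries than the keep count it prints nothing.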
Keep updated portage tree
Keep the portage tree up to date at all times.
There are several reasons for that:
– you can merge a fresh package at any time
– you can detect when the system is too outdated
– IMPORTANT: you can identify whether the system is affected by security vulnerabilities published in GLSA
I also have eix installed, so the easiest way is to put a script in cron.
/etc/cron.daily/eix-sync:
#!/bin/bash
eix-sync > /dev/null 2>&1
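The job above discards all output, so a failed sync goes unnoticed. A minimal sketch of a wrapper that stays silent on success and mails the output on failure, in the same style as the snapshot scripts. The names `notify_on_failure` and `MAILER` are hypothetical and introduced here only for illustration; they are not part of eix or mailx.

```bash
#!/bin/bash
# Hypothetical helper: runs the given command, captures its output and
# mails it to root only when the command fails.
# MAILER is an assumed override hook for trying this out; it defaults to mailx.
notify_on_failure() {
    local subject="$1"
    shift
    local output
    if ! output=$("$@" 2>&1); then
        printf '%s\n' "$output" | "${MAILER:-mailx}" -s "$subject" root
        return 1
    fi
}

# Usage in /etc/cron.daily/eix-sync:
#   notify_on_failure "Portage sync failed" eix-sync
```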
GLSA check
We need to check GLSA every day and notify the admin if the system is vulnerable.
The script below checks GLSA and sends an email if anything is found.
/etc/cron.daily/glsa-check:
#!/bin/bash
glsa=$(glsa-check -t -n -v all 2>&1)
[[ "$glsa" =~ "This system is not affected by any of the listed GLSAs" ]] || (
echo "$glsa" | mailx -s "GLSA check failed" root
)
Or, if you want a colored email notification:
#!/bin/bash
glsa=$(glsa-check -t -v all 2>&1)
[[ "$glsa" =~ "This system is not affected by any of the listed GLSAs" ]] || (
echo "$glsa" | ansi2html | mailx -a "Content-Type: text/html" -s "GLSA check failed" root
)
PS: Coloring requires the ansi2html package to be installed.
SSL check (cron)
It’s very important to renew SSL certificates in time so that your services are not disrupted by an expired certificate.
Install the following script.
/usr/local/bin/ssl-check:
#!/bin/bash
cert_path=$1
days=${2:-1}
if [ -z "$cert_path" ]; then
echo "Usage: ssl-check <certificate_path> [days_to_notify]"
exit 0
fi
if [ ! -f "$cert_path" ]; then
echo "Unable to find file: $cert_path"
exit 1
fi
now_time=$(date +%s)
cert_cname=$(openssl x509 -text -noout -in "$cert_path" | grep 'Subject:.* CN=' | sed 's/^.*=//')
cert_end_ts=$(openssl x509 -text -noout -in "$cert_path" | grep "Not After" | sed 's/\s*Not After : //')
cert_end_time=$(date -d "$cert_end_ts" +%s)
cert_expire_days=$(( ($cert_end_time - $now_time) / 60 / 60 / 24 ))
if [ "$cert_expire_days" -le 0 ]; then
echo "SSL certificate $cert_cname is expired"
exit 2
elif [ "$cert_expire_days" -le "$days" ]; then
echo "SSL certificate $cert_cname will expire in $cert_expire_days day(s)"
exit 3
else
echo "SSL certificate $cert_cname will expire in $cert_expire_days day(s)"
exit 0
fi
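As a possible simplification, `openssl x509` has a `-checkend` option that tests whether a certificate expires within a given number of seconds, which avoids parsing dates by hand. A minimal sketch; the function name `cert_expires_within_days` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) when the certificate in $1 expires
# within $2 days or has already expired.
# Note: an unreadable or invalid certificate file is also reported as expiring.
cert_expires_within_days() {
    local cert_path="$1"
    local days="$2"
    # -checkend N exits 0 when the certificate is still valid N seconds from now.
    ! openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert_path" > /dev/null 2>&1
}

# Usage:
#   if cert_expires_within_days /path/to/cert.pem 7; then echo "renew soon"; fi
```

Unlike the script above, this variant does not report the number of remaining days, so it fits best where a plain yes/no answer is enough.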
The script below checks the local certificate daily and notifies the admin if it expires within 7 days.
/etc/cron.daily/ssl-check:
#!/bin/bash
output=$(ssl-check /etc/letsencrypt/live/artembutusov.com/cert.pem 7)
if [ $? != 0 ]; then
echo "$output" | mailx -s "SSL check failed" root
fi
SSL check (monit)
The certificate check is even easier to implement with monit, but it requires a running service that uses that certificate.
/etc/monit.d/ssl-check:
check host ssl-artembutusov-com with address artembutusov.com
if failed
port 443
protocol https
and certificate valid > 7 days
then alert
Monit check
If you are using monit, it can monitor almost everything except the case when monit itself has died for some reason =). So we need a simple check script that lets us know if monit died.
/etc/cron.hourly/monit-check:
#!/bin/bash
ps -p $(cat /var/run/monit.pid 2> /dev/null) > /dev/null 2>&1 || (
echo "Monit is not running" | mailx -s "Monit is not running" root
)
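A slightly more defensive variant of the same check could validate the pid file contents before probing the process. This is only a sketch; the helper name `pidfile_process_alive` is hypothetical. `kill -0` sends no signal and only tests whether the process exists.

```bash
#!/bin/bash
# Hypothetical helper: succeeds when the pid file $1 exists, contains a
# numeric pid and a process with that pid is running.
# kill -0 needs permission to signal the process, which root (cron) has.
pidfile_process_alive() {
    local pid
    pid=$(cat "$1" 2> /dev/null) || return 1
    [[ "$pid" =~ ^[0-9]+$ ]] || return 1
    kill -0 "$pid" 2> /dev/null
}

# Usage:
#   pidfile_process_alive /var/run/monit.pid || echo "Monit is not running"
```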
NTP sync
Wrong system time can cause a lot of problems, so we need to be sure that the clock stays in sync.
/etc/cron.hourly/ntp-sync:
#!/bin/bash
ntpd -gq > /dev/null 2>&1
Check updates
In most cases we update the system when we need new features, when the system is vulnerable, or when the system is too old.
The first case is always handled manually, the second is addressed by the GLSA check, and the last one should be monitored.
The system has become outdated when an attempt to update world against the latest portage tree produces conflicts.
We can detect that with the script below.
/etc/cron.weekly/check-updates:
#!/bin/bash
output=$(emerge --update --deep --newuse --color n world -vp 2>&1)
if [ $? != 0 ]; then
echo "$output" | mailx -s "There are new conflicting updates available for system" root
elif [[ ! "$output" =~ "Total: 0 packages" ]]; then
echo "$output" | mailx -s "There are new updates available for system" root
fi
The same but with coloring:
#!/bin/bash
output=$(emerge --update --deep --newuse --color y world -vp 2>&1)
if [ $? != 0 ]; then
echo "$output" | ansi2html | mailx -a "Content-Type: text/html" -s "There are new conflicting updates available for system" root
elif [[ ! "$output" =~ "Total: 0 packages" ]]; then
echo "$output" | ansi2html | mailx -a "Content-Type: text/html" -s "There are new updates available for system" root
fi
PS: Coloring requires the ansi2html package to be installed.
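The decision made by the script can be separated from the emerge call, which makes it easy to try out without touching portage. A minimal sketch; the function name `classify_pretend_result` and its three result words are made up for illustration.

```bash
#!/bin/bash
# Hypothetical helper: maps the exit status ($1) and the output ($2) of the
# pretend-mode emerge run to one of: conflict, updates, none.
classify_pretend_result() {
    local status="$1"
    local output="$2"
    if [ "$status" -ne 0 ]; then
        # emerge could not build an update plan
        echo "conflict"
    elif [[ "$output" != *"Total: 0 packages"* ]]; then
        # a plan exists and it is not empty
        echo "updates"
    else
        echo "none"
    fi
}

# Usage:
#   output=$(emerge --update --deep --newuse --color n world -vp 2>&1)
#   classify_pretend_result "$?" "$output"
```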
syslog alerts
Some processes write important messages to syslog that should be delivered to the admin.
The script below is required for syslog-ng to deliver messages to the admin.
/usr/local/bin/syslog-alert-sender:
#!/bin/bash
strwrap () {
local str="$1"
local width="${2-80}"
if [ "${#str}" -gt "$width" ]; then
echo -n "${str:0:$width}..."
else
echo -n "$str"
fi
}
# IFS= and -r keep leading whitespace and backslashes in the message intact
while IFS= read -r line; do
echo "$line" | mailx -s "$(strwrap "$line")" "root"
done < /dev/stdin
The Apple Mail client can hang when it receives an email with a very long “Subject” header, so we truncate the subject to 80 characters (plus a trailing ellipsis).
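To see how the truncation behaves, the function can be tried on its own. The sample strings and widths below are arbitrary.

```bash
#!/bin/bash
# Same function as in syslog-alert-sender, repeated so the example is self-contained.
strwrap () {
    local str="$1"
    local width="${2-80}"
    if [ "${#str}" -gt "$width" ]; then
        echo -n "${str:0:$width}..."
    else
        echo -n "$str"
    fi
}

strwrap "disk failure on sda" 4    # prints "disk..."
echo
strwrap "short" 10                 # prints "short"
echo
```

Note that the ellipsis is appended after the cut, so a truncated subject is up to three characters longer than the requested width.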
In the example below, all messages with a level from error to emergency are delivered to the admin.
/etc/syslog-ng/syslog-ng.conf:
...
filter sshd_ignore { level(err); program("^sshd$"); };
filter monit_ignore { level(err); program("^monit$"); };
filter alert { level(err..emerg); not filter(sshd_ignore); not filter(monit_ignore); };
destination alert_sender { program("syslog-alert-sender"); };
log { source(src); filter(alert); destination(alert_sender); };
You probably don’t want an email every time a bot tries to brute-force your ssh password, so sshd is excluded here. Monit needs to be excluded as well, otherwise every monit issue will flood you with emails (one for each error entry in the log).
By default syslog does not store the message level in the log, which makes it hard to identify the level of a message and to create an ignore rule for it.
The syslog-ng log format can be changed as shown below.
/etc/syslog-ng/syslog-ng.conf:
...
template full_message { template("$ISODATE $HOST $FACILITY.$LEVEL $MSGHDR$MSG\n"); };
options {
...
file-template(full_message);
proto-template(full_message);
};
...
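With the level stored in the third field, picking alert-worthy entries out of a log file becomes a simple text filter. A minimal sketch, assuming the `full_message` template above is in use; the function name `alert_level_lines` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: reads log lines in the
# "$ISODATE $HOST $FACILITY.$LEVEL $MSGHDR$MSG" format from stdin and
# prints only those whose level is err, crit, alert or emerg.
alert_level_lines() {
    awk '$3 ~ /\.(err|crit|alert|emerg)$/'
}

# Usage:
#   alert_level_lines < /var/log/messages
```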
ssh fail2ban
We need to protect SSH from suspicious activity.
Install syslog-ng, fail2ban and iptables.
/etc/fail2ban/jail.d/sshd.conf:
[sshd]
enabled = true
logpath = /var/log/messages
action = iptables[name=SSH, port=ssh, protocol=tcp]
portage tree (monit)
If for some reason the portage tree becomes outdated, we need to notify the admin.
/etc/monit.d/portage-tree:
check file portage-tree path /usr/portage/metadata/timestamp.chk
# allow 5 cycles for portage update (file could be unavailable)
if timestamp > 2 days within 5 cycles then alert
service status (monit)
If any service crashes for some reason, we need to notify the admin.
/etc/monit.d/rc-status:
check program rc-status with path "/bin/rc-status --crashed --nocolor"
if status == 0 then alert
free space (monit)
Running out of free space can freeze the whole server, so we need to monitor it.
/etc/monit.d/rootfs:
check filesystem rootfs with path /dev/xvda1
if space usage > 80% then alert