Table of Contents

Installation

yum install epel-release yum install monit

installs the standard layout with init scripts for CentOS 7 in systemctl and CentOS 6.x in /etc/init.d

/etc/monitrc has all the system level configurations

# port to bind to
# IP which can access the UI of monit
# basic auth with password
# mail server for alerts
# alert format
# standard system monitoring elements

Components to monitor

sendmail service

service - pid / process /binary / checksum / ownership can be set as conditions

###################################################################################
vi /etc/monit.d/sendmail

check process sendmail with pidfile /var/run/sendmail.pid

 group mail
 start program = "/etc/init.d/sendmail start"
 stop  program = "/etc/init.d/sendmail stop"
 if failed port 25 protocol smtp then restart
 depends on sendmail_bin
 depends on sendmail_rc

check file sendmail_bin with path /usr/lib/sendmail

 group mail
 if failed checksum then unmonitor
 if failed permission 2755 then unmonitor
# if failed uid root then unmonitor
# if failed gid root then unmonitor

check file sendmail_rc with path /etc/init.d/sendmail

 group mail
 if failed checksum then unmonitor
 if failed permission 0644 then unmonitor
 #if failed uid root then unmonitor
 #if failed gid root then unmonitor

#############################################################################
save the file
/etc/init.d/monit restart

sendmail queue length

vi /etc/init.d/sendmailqueue

##############################################################################
check program mail-queue path “/usr/bin/check_sendmail_queue.sh”

  if status != 0 then alert
      alert devops@expertus.com

##############################################################################

vi /usr/bin/check_sendmail_queue.sh

##############################################################################

#!/bin/bash
 queuelength=`/usr/bin/mailq | tail -n1 | awk '{print $3}'`
 queuecount=`echo $queuelength | grep "[0-9]"`
 if [ "$queuecount" == "" ]; then
      echo 0;
 else
      echo ${queuelength};
fi
exit

##############################################################################
chmod +x /usr/bin/check_sendmail_queue.sh

dns resolution issues

vi /etc/monit.d/dnscheck

###############################################################################

check host nscheck with address www.google.com

if failed icmp type echo
    count 5 with timeout 5 seconds
    2 times within 3 cycles then alert
      alert devops@expertus.com

###############################################################################

vi /usr/bin/dnscheck.sh
###############################################################################
#!/bin/bash #dnslookup # of 1=success | 0=failed

DNS_SERVER=8.8.4.4
HOST_QUERY=www.google.com
if [`host $HOST_QUERY $DNS_SERVER | grep "has address" | wc -l` -eq 0 ]; then
 #lookup failed, bad DNS lookup
 echo "0"
 else
 echo "1"
 fi

########################################################################

solr monitoring

nodejs monitoring

vi /etc/monit.d/nodejs

#########################################################################
check process node matching “node”

      start program = "/bin/bash -c /home/sandbox/bin/nodestart.sh"
      stop program  = "/bin/bash -c /home/sandbox/bin/nodestop.sh"
      if failed host qalearnexa.exphosted.com port 8081 type tcp then restart
      if failed host qalearnexa.exphosted.com port 8081 type tcp then alert
      alert devops@expertus.com

#########################################################################

How it fits with Zabbix / URL monitoring

System
Current - Zabbix
New - Zabbix + monit
Zabbix will be used for historical data
Monit will be used for immediate action based on rules and then alert

Disk
Current - Zabbix
New - Zabbix + monit
Zabbix will be used for historical data of disk usage growth
Monit will be used to monitor mounts and do a remount if it is unable to access a specific disk mount and then alert

CPU
Current - Zabbix
New - Zabbix + monit
Zabbix will be used for historical CPU load averages 5 min/10 min /15 min
Monit for setting rule based actions when the averages exceed a threshold - like restarting a service

Memory
current - Zabbix
New - zabbix + monit
Zabbix will be used for historical data and period (from - to) based analysis
Monit for setting rule based actions when the memory usage exceed a threshold - like restarting a service or alerting the devops

Processes
current - specific processes like apache / mysql are monitored by Zabbix but not very extensive
New - zabbix + monit
Monit will monitor anything with a pid, port number and an init script or systemd script
fail2ban
opendkim
passenger
Haproxy
sendmail
sendmail queue
DNS up
The following were issues we have faced at one time or another and all of the above can be monitored by monit and an alert can be configured to be sent or a specific action set by monit.

System login
current - Papertrail
New - papertrail(no change)

syslog
current - Papertrail
New - papertrail (no change)

URLMonitoring current - zabbix and sitemonitor
New - zabbix and sitemonitor(no change)