===== Installation =====

<code>
yum install epel-release
yum install monit
</code>

This installs the standard layout, with a systemd unit on CentOS 7 (managed through systemctl) and an init script in /etc/init.d on CentOS 6.x.

/etc/monitrc holds all the system-level configuration:

  * port to bind to
  * IP addresses which can access the Monit UI
  * basic auth with password
  * mail server for alerts
  * alert format
  * standard system monitoring elements

===== Components to monitor =====

==== sendmail service ====

For a service, the pid, process, binary, checksum and ownership can all be set as conditions.

<code>
vi /etc/monit.d/sendmail
</code>

<code>
check process sendmail with pidfile /var/run/sendmail.pid
  group mail
  start program = "/etc/init.d/sendmail start"
  stop program = "/etc/init.d/sendmail stop"
  if failed port 25 protocol smtp then restart
  depends on sendmail_bin
  depends on sendmail_rc

check file sendmail_bin with path /usr/lib/sendmail
  group mail
  if failed checksum then unmonitor
  if failed permission 2755 then unmonitor
  # if failed uid root then unmonitor
  # if failed gid root then unmonitor

check file sendmail_rc with path /etc/init.d/sendmail
  group mail
  if failed checksum then unmonitor
  # init scripts are executable, so expect 0755 rather than 0644
  if failed permission 0755 then unmonitor
  # if failed uid root then unmonitor
  # if failed gid root then unmonitor
</code>

Save the file and restart Monit (on CentOS 7, use systemctl restart monit instead):

<code>
/etc/init.d/monit restart
</code>

==== sendmail queue length ====

<code>
vi /etc/monit.d/sendmailqueue
</code>

<code>
check program mail-queue with path "/usr/bin/check_sendmail_queue.sh"
  if status != 0 then alert
  alert devops@expertus.com
</code>

<code>
vi /usr/bin/check_sendmail_queue.sh
</code>

<code>
#!/bin/bash
# Print the number of messages in the sendmail queue.
# The last line of mailq output is a summary such as "Total requests: 5".
queuelength=$(/usr/bin/mailq | tail -n1 | awk '{print $3}')
queuecount=$(echo "$queuelength" | grep "[0-9]")
if [ -z "$queuecount" ]; then
    echo 0
else
    echo "${queuelength}"
fi
exit
</code>

<code>
chmod +x /usr/bin/check_sendmail_queue.sh
</code>

Note: the check program test in Monit acts on the exit status of the script, not on what it prints. As written, the script always exits 0, so the status != 0 rule never fires. For the alert to trigger, the script has to exit non-zero when the queue is too long.

==== dns resolution issues ====

<code>
vi /etc/monit.d/dnscheck
</code>

<code>
check host nscheck with address www.google.com
  if failed icmp type echo count 5 with timeout 5 seconds 2 times within 3 cycles then alert
  alert devops@expertus.com
</code>

The script below does an explicit lookup against a specific DNS server. It prints 1 on success and 0 on failure.

<code>
vi /usr/bin/dnscheck.sh
</code>

<code>
#!/bin/bash
# DNS lookup check: prints 1 = success | 0 = failed
DNS_SERVER=8.8.4.4
HOST_QUERY=www.google.com
if [ "$(host "$HOST_QUERY" "$DNS_SERVER" | grep -c "has address")" -eq 0 ]; then
    # lookup failed, bad DNS lookup
    echo "0"
else
    echo "1"
fi
</code>

==== solr monitoring ====

==== nodejs monitoring ====

<code>
vi /etc/monit.d/nodejs
</code>

<code>
check process node matching "node"
  start program = "/bin/bash -c /home/sandbox/bin/nodestart.sh"
  stop program = "/bin/bash -c /home/sandbox/bin/nodestop.sh"
  if failed host qalearnexa.exphosted.com port 8081 type tcp then restart
  if failed host qalearnexa.exphosted.com port 8081 type tcp then alert
  alert devops@expertus.com
</code>

===== How it fits with Zabbix / URL monitoring =====

**System**\\
Current - Zabbix\\
New - Zabbix + Monit\\
Zabbix will be used for historical data.\\
Monit will be used for immediate action based on rules, followed by an alert.

**Disk**\\
Current - Zabbix\\
New - Zabbix + Monit\\
Zabbix will be used for historical data on disk usage growth.\\
Monit will be used to monitor mounts, remount a disk if a specific mount cannot be accessed, and then alert.

**CPU**\\
Current - Zabbix\\
New - Zabbix + Monit\\
Zabbix will be used for historical CPU load averages (1 min / 5 min / 15 min).\\
Monit will be used for rule-based actions when the averages exceed a threshold, such as restarting a service.

**Memory**\\
Current - Zabbix\\
New - Zabbix + Monit\\
Zabbix will be used for historical data and period-based (from - to) analysis.\\
Monit will be used for rule-based actions when memory usage exceeds a threshold, such as restarting a service or alerting devops.

**Processes**\\
Current - specific processes like apache / mysql are monitored by Zabbix, but not very extensively\\
New - Zabbix + Monit\\
Monit will monitor anything with a pid, a port number and an init script or systemd unit:

  * fail2ban
  * opendkim
  * passenger
  * HAProxy
  * sendmail
  * sendmail queue
  * DNS up

Each of the items above has caused issues for us at one time or another. Monit can monitor all of them and either send an alert or take a specific action.

**System login**\\
Current - Papertrail\\
New - Papertrail (no change)

**syslog**\\
Current - Papertrail\\
New - Papertrail (no change)

**URL monitoring**\\
Current - Zabbix and sitemonitor\\
New - Zabbix and sitemonitor (no change)
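The rule-based actions described for System, Disk, CPU and Memory could look roughly like the Monit fragment below. This is only a sketch: the file name, the check names, the /mnt/data mount point, the httpd service and every threshold are assumptions chosen for illustration, not values taken from our servers.

```
# Hypothetical /etc/monit.d/system fragment.
# All names, paths and thresholds below are assumptions.

check system localhost
  if loadavg (5min) > 4 for 3 cycles then alert
  if memory usage > 85% for 3 cycles then alert
  if cpu usage (user) > 80% for 3 cycles then alert
  alert devops@expertus.com

# Remount the disk when the mount is not accessible, then alert.
check filesystem data with path /mnt/data
  start program = "/bin/mount /mnt/data"
  stop program = "/bin/umount /mnt/data"
  if does not exist then restart
  if space usage > 85% then alert
  alert devops@expertus.com

# Restart a service when its own memory or CPU use stays too high.
check process httpd with pidfile /var/run/httpd/httpd.pid
  start program = "/etc/init.d/httpd start"
  stop program = "/etc/init.d/httpd stop"
  if totalmem > 1024 MB for 5 cycles then restart
  if cpu > 90% for 5 cycles then restart
  alert devops@expertus.com
```

The "for N cycles" qualifiers are there so that a short spike does not trigger a restart. Zabbix keeps the history, while these rules only cover the immediate action.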
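For the sendmail queue length check, Monit only looks at the exit status of the script, so one possible variant exits non-zero once the queue passes a threshold. The sketch below assumes the threshold is passed as the first argument and that mailq ends with a "Total requests: N" summary line. The function names queue_length and check_queue are made up for this example.

```shell
#!/bin/bash
# Sketch: threshold-based sendmail queue check for a Monit "check program" test.
# Function names and the threshold argument are assumptions for illustration.

# Extract the request count from mailq output passed as $1.
# sendmail's mailq ends with a summary line such as "Total requests: 12".
queue_length() {
    local count
    count=$(printf '%s\n' "$1" | tail -n1 | awk '{print $3}')
    # Fall back to 0 when the third field is not a plain number.
    case "$count" in
        ''|*[!0-9]*) echo 0 ;;
        *) echo "$count" ;;
    esac
}

# Print the queue length; return 1 when it exceeds the threshold in $2.
check_queue() {
    local length
    length=$(queue_length "$1")
    echo "$length"
    if [ "$length" -gt "$2" ]; then
        return 1
    fi
    return 0
}

# Run the real check only when a threshold is given on the command line.
if [ "$#" -ge 1 ]; then
    check_queue "$(/usr/bin/mailq)" "$1"
    exit $?
fi
```

With a hypothetical threshold of 50, the Monit entry would point at "/usr/bin/check_sendmail_queue.sh 50", assuming the installed Monit version accepts arguments inside the quoted path. The existing "if status != 0 then alert" rule then fires only when more than 50 messages are queued.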