We still cannot pinpoint a single source for this issue, but these are the possible reasons why it is different:
With this setting on, ActiveRecord will re-establish the connection when it detects a failed connection.
We cannot use this because there is no support for UTF in new connections.
http://coderrr.wordpress.com/2009/01/08/activerecord-threading-issues-and-resolutions/
Possible issues: transaction data with reconnects and rollbacks.
Currently at 16M.
We have not seen queries that large so far, nor any large-packet log messages.
—
https://tracker.exphosted.com/view.php?id=5744
It is currently at 600 seconds. Increasing it to 24 hours is recommended, as our 'clients' are trusted and we have a pooling mechanism on the Rails end.
wait_timeout = 28800
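As a quick sanity check on the units: wait_timeout is expressed in seconds. The sketch below uses a hypothetical helper, hours_to_seconds (not part of the original notes), to convert the figures mentioned above. 28800 seconds corresponds to 8 hours, while a full 24 hours would be 86400 seconds, so it is worth confirming which value is actually intended before editing my.cnf.

```shell
# Hypothetical helper (an assumption for illustration): convert hours to seconds.
hours_to_seconds() {
  echo $(( $1 * 3600 ))
}

hours_to_seconds 8    # prints 28800
hours_to_seconds 24   # prints 86400

# To inspect the live value (requires access to the MySQL server):
#   mysql -e "SHOW GLOBAL VARIABLES LIKE 'wait_timeout'"
```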
Will help dig deeper and record more information. Installation steps after the following para.
# on lb server
echo "disable server learnexa/prodapp01" | socat stdio /var/run/haproxy/haproxy.sock
# as expprodl
/deploy/systasks/god.sh stop
~/bin/apache_sss.sh stop
gem uninstall mysql
yum erase Mysql-Devel
yum install mysql-devel
gem install mysql -v 2.8.1
** Schedule to upgrade the gem, asap.
No downtime necessary.
https://tracker.exphosted.com/view.php?id=5620
Installation steps (requires login) : https://rpm.newrelic.com/accounts/710727/plugins/directory/52
Also, add long_query_time using either the plugin conf or the MySQL conf.
Key Points:
sync_binlog=1
slow-query-log = 1
slow-query-log-file = /var/log/mysql/mysql-slow.log
long_query_time = 0
log-queries-not-using-indexes
Rollback will be 24 hours or earlier.
Steps for rollback will be the same, except instead of adding the “code” to my.cnf, we will remove it.
sync_binlog=1
slow-query-log = 1
slow-query-log-file = /var/log/mysql/mysql-slow.log
long_query_time = 1
log-queries-not-using-indexes
Rollback will be 24 hours or earlier.
Steps for rollback will be the same, except instead of adding the “code” to my.cnf, we will remove it.
|Master||
| Aborted_clients | 8924 |
| Aborted_connects | 14 |
| Slow_queries | 121 |
|Slave||
| Aborted_clients | 7360 |
| Aborted_connects | 4 |
| Slow_queries | 0 |

|Master||
| Created_tmp_disk_tables | 41209 |
| Created_tmp_files | 49 |
| Created_tmp_tables | 1138451 |
|Slave||
| Created_tmp_disk_tables | 17750 |
| Created_tmp_files | 15 |
| Created_tmp_tables | 1403820 |
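One way to read the temporary-table counters is as the share of internal temporary tables that ended up on disk (Created_tmp_disk_tables divided by Created_tmp_tables). The sketch below is a worked calculation on the numbers above; the helper name tmp_disk_pct is hypothetical, and the ratio is offered as a rough indicator rather than a threshold taken from these notes.

```shell
# Hypothetical helper: percentage of internal temporary tables created on disk.
# $1 = Created_tmp_disk_tables, $2 = Created_tmp_tables (must be non-zero)
tmp_disk_pct() {
  awk -v disk="$1" -v total="$2" 'BEGIN { printf "%.2f\n", (disk / total) * 100 }'
}

tmp_disk_pct 41209 1138451   # Master: prints 3.62
tmp_disk_pct 17750 1403820   # Slave:  prints 1.26
```

On these figures, both servers spill only a small percentage of temporary tables to disk.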
Factors to consider:
| Pool Size | Apache + all Passenger processes (private_dirty, MB) | CPU | No. of Requests - Concurrent Users | RPS | Response Time Mean | Aborted_clients | Aborted_connects | Notes |
| 5 | 6.95+940.22 | ~38-52% | 2000-50 | 15.2 | 61.333 | 7 | 12 | Not sure why Passenger is consuming ~200M more here |
| 15 | 13.82+766.20 | same | same | 17.17 | 56.949 | 3 | 1 | |
| 23 | 13.82+798.08 | same | same | 18.19 | 58.190 | 0 | 0 | |
| 30 | 20.25+798.98 | same | same | 18.17 | 58.2 | 0 | 0 | |
Wait a few days to see whether Rufus complains on the DEV environment. However, we are good to increase the pool to 23, monitor for a few days, and then increase it to 30.
0) Check that no sessions are on the server.
1) Bring app server out of rotation.
2) Increase max connections in database.yml to 23.
3) Restart passenger.
4) Quick test and put server back in rotation.
5) Repeat for the other app server.
6) Monitor app server and db logs from this point on.
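The database.yml edit in the steps above can be sketched as a small shell function. This is a minimal sketch under assumptions: the helper name set_pool_size and the config/database.yml path are hypothetical, and it assumes the connection limit is the standard Rails `pool:` key and that the line already exists. It rewrites every `pool:` line in the file, so a file with several environments would need a closer look.

```shell
# Hypothetical helper: set the ActiveRecord connection pool size in database.yml.
# Assumes the file already contains a "pool: <number>" line.
# $1 = path to database.yml, $2 = new pool size
set_pool_size() {
  file=$1
  size=$2
  tmp=$(mktemp) || return 1
  # Replace only the number after "pool:", keeping the original indentation.
  if sed "s/^\([[:space:]]*pool:[[:space:]]*\)[0-9][0-9]*/\1${size}/" "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place so file permissions are kept
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return $status
}

# Example (path is an assumption; run while the app server is out of rotation):
#   set_pool_size config/database.yml 23
```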
1) Bring app server out of pool.
2) Edit /opt/apache2/conf/httpd.conf, add
PassengerMaxPoolSize 13
3) Restart Passenger
4) Test.
5) If there are no issues, put the app server back in the pool and repeat with the other app server.
Rollback:
1) Bring app server out of pool
2) Remove PassengerMaxPoolSize line.
3) Restart passenger (and make the first request).
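Step 2 of the rollback could be sketched as follows. The helper name strip_passenger_pool_size and the /tmp output path are hypothetical; the function only prints the cleaned configuration to stdout, so the result can be reviewed before it replaces the real httpd.conf.

```shell
# Hypothetical helper: print an httpd.conf with any PassengerMaxPoolSize line removed.
# $1 = path to httpd.conf; the original file is left untouched.
strip_passenger_pool_size() {
  sed '/^[[:space:]]*PassengerMaxPoolSize[[:space:]]/d' "$1"
}

# Example (review the diff, then copy the result into place and restart Passenger):
#   strip_passenger_pool_size /opt/apache2/conf/httpd.conf > /tmp/httpd.conf.new
#   diff /opt/apache2/conf/httpd.conf /tmp/httpd.conf.new
```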
https://tracker.exphosted.com/view.php?id=5683