[00:39:08] New patchset: Asher; "adding percona mysql checks" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1850
[00:39:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1850
[00:41:45] !log added ganglia1002 and ganglia1001 to dns
[00:41:47] Logged the message, Mistress of the network gear.
[00:42:23] RobH: still there ?
[00:43:44] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1850
[00:43:45] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1850
[01:33:23] New patchset: Asher; "install just the new mysql check files on eqiad dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1851
[01:45:42] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1851
[01:45:43] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1851
[02:18:42] PROBLEM - Misc_Db_Lag on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1455s
[02:24:02] PROBLEM - MySQL replication status on storage3 is CRITICAL: CHECK MySQL REPLICATION - lag - CRITICAL - Seconds_Behind_Master : 1775s
[02:28:32] RECOVERY - Misc_Db_Lag on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:33:52] RECOVERY - MySQL replication status on storage3 is OK: CHECK MySQL REPLICATION - lag - OK - Seconds_Behind_Master : 0s
[02:53:58] RECOVERY - Puppet freshness on srv272 is OK: puppet ran at Thu Jan 12 02:53:41 UTC 2012
[04:17:51] RECOVERY - MySQL disk space on es1004 is OK: DISK OK
[04:23:31] RECOVERY - Disk space on es1004 is OK: DISK OK
[04:36:40] PROBLEM - MySQL slave status on es1004 is CRITICAL: CRITICAL: Slave running: expected Yes, got No
[08:42:42] hi :) If anyone could review / merge / deploy https://gerrit.wikimedia.org/r/#change,1847 that would be great
[09:09:36] PROBLEM - Puppet freshness on db22 is CRITICAL: Puppet has not run in the last 10 hours
[10:00:32] PROBLEM - MySQL disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 440055 MB (3% inode=99%):
[10:07:02] PROBLEM - Disk space on es1004 is CRITICAL: DISK CRITICAL - free space: /a 408755 MB (3% inode=99%):
[10:24:00] RECOVERY - MySQL slave status on es1004 is OK: OK:
[11:25:44] PROBLEM - Puppet freshness on db1044 is CRITICAL: Puppet has not run in the last 10 hours
[11:26:44] PROBLEM - Puppet freshness on db1006 is CRITICAL: Puppet has not run in the last 10 hours
[11:27:44] PROBLEM - Puppet freshness on db1001 is CRITICAL: Puppet has not run in the last 10 hours
[11:27:44] PROBLEM - Puppet freshness on db1018 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:47] PROBLEM - Puppet freshness on db1007 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:47] PROBLEM - Puppet freshness on db1005 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:47] PROBLEM - Puppet freshness on db1010 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:47] PROBLEM - Puppet freshness on db1009 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:47] PROBLEM - Puppet freshness on db1020 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:47] PROBLEM - Puppet freshness on db1021 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:48] PROBLEM - Puppet freshness on db1022 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:48] PROBLEM - Puppet freshness on db1038 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:49] PROBLEM - Puppet freshness on db1034 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:49] PROBLEM - Puppet freshness on db1047 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:50] PROBLEM - Puppet freshness on db1033 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:51] PROBLEM - Puppet freshness on db1043 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:51] PROBLEM - Puppet freshness on db1025 is CRITICAL: Puppet has not run in the last 10 hours
[11:36:52] PROBLEM - Puppet freshness on db1048 is CRITICAL: Puppet has not run in the last 10 hours
[11:40:27] PROBLEM - Puppet freshness on db1002 is CRITICAL: Puppet has not run in the last 10 hours
[11:40:27] PROBLEM - Puppet freshness on db1035 is CRITICAL: Puppet has not run in the last 10 hours
[11:40:28] PROBLEM - Puppet freshness on db1042 is CRITICAL: Puppet has not run in the last 10 hours
[11:42:37] PROBLEM - Puppet freshness on db1004 is CRITICAL: Puppet has not run in the last 10 hours
[11:43:37] PROBLEM - Puppet freshness on db1011 is CRITICAL: Puppet has not run in the last 10 hours
[11:44:27] PROBLEM - Puppet freshness on db1030 is CRITICAL: Puppet has not run in the last 10 hours
[11:46:37] PROBLEM - Puppet freshness on db1019 is CRITICAL: Puppet has not run in the last 10 hours
[11:46:37] PROBLEM - Puppet freshness on db1041 is CRITICAL: Puppet has not run in the last 10 hours
[11:46:37] PROBLEM - Puppet freshness on db1008 is CRITICAL: Puppet has not run in the last 10 hours
[11:47:27] PROBLEM - Puppet freshness on db1017 is CRITICAL: Puppet has not run in the last 10 hours
[11:48:27] PROBLEM - Puppet freshness on db1012 is CRITICAL: Puppet has not run in the last 10 hours
[11:48:27] PROBLEM - Puppet freshness on db1028 is CRITICAL: Puppet has not run in the last 10 hours
[11:49:27] PROBLEM - Puppet freshness on db1027 is CRITICAL: Puppet has not run in the last 10 hours
[11:49:27] PROBLEM - Puppet freshness on db1015 is CRITICAL: Puppet has not run in the last 10 hours
[11:50:37] PROBLEM - Puppet freshness on db1046 is CRITICAL: Puppet has not run in the last 10 hours
[11:51:37] PROBLEM - Puppet freshness on db1003 is CRITICAL: Puppet has not run in the last 10 hours
[11:51:37] PROBLEM - Puppet freshness on db1026 is CRITICAL: Puppet has not run in the last 10 hours
[11:51:37] PROBLEM - Puppet freshness on db1045 is CRITICAL: Puppet has not run in the last 10 hours
[11:51:37] PROBLEM - Puppet freshness on db1039 is CRITICAL: Puppet has not run in the last 10 hours
[11:51:37] PROBLEM - Puppet freshness on db1013 is CRITICAL: Puppet has not run in the last 10 hours
[11:52:27] PROBLEM - Puppet freshness on db1014 is CRITICAL: Puppet has not run in the last 10 hours
[11:52:27] PROBLEM - Puppet freshness on db1024 is CRITICAL: Puppet has not run in the last 10 hours
[11:53:37] PROBLEM - Puppet freshness on db1029 is CRITICAL: Puppet has not run in the last 10 hours
[11:54:37] PROBLEM - Puppet freshness on db1031 is CRITICAL: Puppet has not run in the last 10 hours
[11:54:38] PROBLEM - Puppet freshness on db1016 is CRITICAL: Puppet has not run in the last 10 hours
[11:55:37] PROBLEM - Puppet freshness on db1040 is CRITICAL: Puppet has not run in the last 10 hours
[12:41:22] New patchset: Mark Bergsma; "Add generic::debconf::set definition for preseeding" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1852
[12:41:59] New patchset: Mark Bergsma; "Install all Mailman languages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1853
[12:42:21] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1852
[12:42:22] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1852
[12:45:59] mark: hi :) can you possibly have a look at https://gerrit.wikimedia.org/r/#change,1847 please ?
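The Seconds_Behind_Master alerts earlier in the log come from a Nagios-style replication lag check. A minimal sketch of such a check is below; the thresholds, the `check_lag` helper, and the commented `mysql` invocation are assumptions for illustration, not the actual percona check files merged in r/1850.

```shell
#!/bin/bash
# Sketch of a Nagios-style MySQL replication lag check.
# Warning/critical thresholds in seconds (hypothetical defaults).
WARN=${1:-300}
CRIT=${2:-600}

check_lag() {
    local lag=$1
    # Seconds_Behind_Master is NULL when the slave threads are stopped.
    if [ "$lag" = "NULL" ]; then
        echo "CRITICAL - replication not running"
        return 2
    elif [ "$lag" -ge "$CRIT" ]; then
        echo "CRITICAL - Seconds_Behind_Master : ${lag}s"
        return 2
    elif [ "$lag" -ge "$WARN" ]; then
        echo "WARNING - Seconds_Behind_Master : ${lag}s"
        return 1
    fi
    echo "OK - Seconds_Behind_Master : ${lag}s"
    return 0
}

# In real use the value would come from the server, roughly:
# lag=$(mysql -e 'SHOW SLAVE STATUS\G' | awk '/Seconds_Behind_Master/ {print $2}')
# check_lag "$lag"; exit $?
```

Exit codes 0/1/2 map to Nagios OK/WARNING/CRITICAL, which is what produces the PROBLEM/RECOVERY lines above.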
[12:46:17] mark: would let us administer the Postgres database on gallium
[12:51:58] New patchset: Mark Bergsma; "Install all Mailman languages" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1853
[12:52:27] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1853
[12:52:27] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1853
[12:53:35] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1847
[12:53:35] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1847
[12:56:10] New patchset: Mark Bergsma; "Escape value var" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1854
[12:56:35] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1854
[12:56:36] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1854
[13:00:48] New patchset: Mark Bergsma; "Use the noninteractive debconf frontend" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1855
[13:01:26] New patchset: Mark Bergsma; "Use the noninteractive debconf frontend" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1855
[13:02:13] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1855
[13:02:13] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1855
[13:10:15] New patchset: Mark Bergsma; "Correct mailman default_server_language" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1856
[13:10:30] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1856
[13:10:31] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1856
[13:12:28] New patchset: Mark Bergsma; "Add newline in comparison string" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1857
[13:12:52] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1857
[13:12:53] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1857
[13:21:26] PROBLEM - Puppet freshness on ms1002 is CRITICAL: Puppet has not run in the last 10 hours
[13:23:50] mark: thanks mark :-)
[13:27:55] New patchset: Mark Bergsma; "Attempt to get the debconf test to work" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1858
[13:27:57] I hate bash
[13:28:16] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1858
[13:28:16] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1858
[13:41:10] New patchset: Mark Bergsma; "Simplify test again" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1859
[13:41:25] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1859
[13:41:31] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1859
[13:41:32] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1859
[13:45:52] New patchset: Mark Bergsma; "Add quotes" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1860
[13:46:09] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1860
[13:46:10] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1860
[14:00:40] New patchset: Mark Bergsma; "Install a DNS recursor on sodium" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1861
[14:00:59] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1861
[14:01:00] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1861
[14:07:14] New review: Mark Bergsma; "Why does it need permissions for that dir anyway? e.g. df works without that..." [operations/puppet] (production); V: 0 C: 0; - https://gerrit.wikimedia.org/r/1845
[14:31:35] New review: Dzahn; "on sodium:" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1845
[14:31:36] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1845
[14:35:38] mutante: so why does it need permission to that dir?
[14:38:31] mark: investigating, saw your comment right after merge
[14:40:36] it belongs to Debian-exim Debian-exim and does not allow others
[14:41:07] would you rather change that instead of ignoring tmpfs in check_disk?
[14:41:36] no, I want to know why check_disk needs permission to that dir
[14:41:39] what does it do?
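The string of debconf commits above ("Use the noninteractive debconf frontend", "Add newline in comparison string", "Add quotes", "I hate bash") points at a common pattern and a common bash trap. The sketch below illustrates both; the mailman question name is taken from the log, but the exact commands in generic::debconf::set are an assumption, not the merged code.

```shell
#!/bin/bash
# Preseeding pattern (shown as comments since it needs debconf/root):
# echo "mailman mailman/default_server_language select en" | debconf-set-selections
# DEBIAN_FRONTEND=noninteractive apt-get install -y mailman

# The "add newline / add quotes" commits hint at the classic pitfall:
# $(...) command substitution strips trailing newlines, so comparing
# against raw tool output that ends in a newline never matches unless
# you account for the newline (and quote your variables).
value="en"
printf -v expected 'en\n'      # raw output as a tool might emit it
got="$(printf 'en\n')"         # substitution strips the trailing \n

[ "$got" = "$value" ] && echo "matches after stripping"
[ "$got" = "$expected" ] || echo "literal-newline comparison fails"
```

This is why an `unless`-style idempotency test around debconf can need several rounds of newline and quoting fixes before it behaves.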
[14:41:43] i figured it does not really belong in check_disk where we expect to check the physical disk
[14:41:58] well I'd prefer it if check_disk worked for tmpfs as well
[14:42:08] and seeing as df works fine, why doesn't check_disk?
[14:42:25] ah
[14:42:27] nevermind
[14:42:30] df does not work
[14:43:17] which user does the check run as?
[14:43:21] nagios
[14:43:34] perhaps add 'nagios' to the Debian-exim group on sodium then
[14:43:36] (in mail.pp)
[14:44:08] alright
[14:46:42] PROBLEM - MySQL disk space on db16 is CRITICAL: Connection refused by host
[14:47:02] PROBLEM - Disk space on mw1092 is CRITICAL: Connection refused by host
[14:47:32] PROBLEM - RAID on srv196 is CRITICAL: Connection refused by host
[14:47:32] PROBLEM - Disk space on srv200 is CRITICAL: Connection refused by host
[14:48:02] PROBLEM - Disk space on db18 is CRITICAL: Connection refused by host
[14:48:22] PROBLEM - DPKG on es1001 is CRITICAL: Connection refused by host
[14:49:52] PROBLEM - RAID on es1001 is CRITICAL: Connection refused by host
[14:51:22] PROBLEM - mobile traffic loggers on cp1044 is CRITICAL: Connection refused by host
[14:51:42] PROBLEM - DPKG on db16 is CRITICAL: Connection refused by host
[14:54:32] PROBLEM - RAID on mw70 is CRITICAL: Connection refused by host
[14:55:22] PROBLEM - DPKG on bast1001 is CRITICAL: Connection refused by host
[14:55:22] PROBLEM - RAID on cp1043 is CRITICAL: Connection refused by host
[14:56:02] PROBLEM - RAID on db25 is CRITICAL: Connection refused by host
[14:56:02] PROBLEM - Disk space on db34 is CRITICAL: Connection refused by host
[14:56:03] PROBLEM - RAID on db46 is CRITICAL: Connection refused by host
[14:56:22] PROBLEM - MySQL disk space on db11 is CRITICAL: Connection refused by host
[14:56:22] PROBLEM - DPKG on db13 is CRITICAL: Connection refused by host
[14:56:22] PROBLEM - Disk space on mw1001 is CRITICAL: Connection refused by host
[14:56:32] PROBLEM - DPKG on db18 is CRITICAL: Connection refused by host
[14:56:52] PROBLEM - Disk space on mw46 is CRITICAL: Connection refused by host
[14:57:12] RECOVERY - Disk space on sodium is OK: DISK OK
[14:57:13] PROBLEM - Disk space on snapshot4 is CRITICAL: Connection refused by host
[14:57:13] PROBLEM - Disk space on srv195 is CRITICAL: Connection refused by host
[14:57:22] PROBLEM - Disk space on mw1080 is CRITICAL: Connection refused by host
[14:57:32] PROBLEM - DPKG on srv271 is CRITICAL: Connection refused by host
[14:57:32] PROBLEM - Disk space on bast1001 is CRITICAL: Connection refused by host
[14:57:32] PROBLEM - DPKG on cp1041 is CRITICAL: Connection refused by host
[14:57:42] PROBLEM - Disk space on db13 is CRITICAL: Connection refused by host
[14:57:52] PROBLEM - DPKG on es4 is CRITICAL: Connection refused by host
[14:58:02] PROBLEM - Disk space on es3 is CRITICAL: Connection refused by host
[14:58:22] PROBLEM - DPKG on mw1146 is CRITICAL: Connection refused by host
[14:58:22] PROBLEM - DPKG on mw1134 is CRITICAL: Connection refused by host
[14:58:32] PROBLEM - RAID on mw55 is CRITICAL: Connection refused by host
[14:58:32] PROBLEM - DPKG on mw1121 is CRITICAL: Connection refused by host
[14:58:32] PROBLEM - DPKG on mw67 is CRITICAL: Connection refused by host
[14:58:32] PROBLEM - RAID on mw69 is CRITICAL: Connection refused by host
[14:58:32] PROBLEM - DPKG on mw70 is CRITICAL: Connection refused by host
[14:58:33] PROBLEM - RAID on mw72 is CRITICAL: Connection refused by host
[14:58:52] PROBLEM - DPKG on srv196 is CRITICAL: Connection refused by host
[14:58:52] PROBLEM - RAID on srv210 is CRITICAL: Connection refused by host
[14:59:22] PROBLEM - Disk space on virt3 is CRITICAL: Connection refused by host
[14:59:22] PROBLEM - Disk space on mw1082 is CRITICAL: Connection refused by host
[14:59:32] PROBLEM - Disk space on virt2 is CRITICAL: Connection refused by host
[14:59:33] PROBLEM - MySQL disk space on db34 is CRITICAL: Connection refused by host
[14:59:33] PROBLEM - DPKG on db46 is CRITICAL: Connection refused by host
[14:59:52] PROBLEM - Disk space on db46 is CRITICAL: Connection refused by host
[15:00:02] PROBLEM - RAID on mw1001 is CRITICAL: Connection refused by host
[15:00:12] PROBLEM - RAID on mw1075 is CRITICAL: Connection refused by host
[15:00:12] PROBLEM - RAID on mw1082 is CRITICAL: Connection refused by host
[15:00:12] PROBLEM - Disk space on mw1121 is CRITICAL: Connection refused by host
[15:00:12] PROBLEM - Disk space on mw1134 is CRITICAL: Connection refused by host
[15:00:24] *sigh*
[15:00:32] PROBLEM - Disk space on mw1146 is CRITICAL: Connection refused by host
[15:00:32] PROBLEM - Disk space on mw67 is CRITICAL: Connection refused by host
[15:00:32] PROBLEM - Disk space on mw70 is CRITICAL: Connection refused by host
[15:00:42] PROBLEM - DPKG on mw55 is CRITICAL: Connection refused by host
[15:00:42] PROBLEM - RAID on snapshot4 is CRITICAL: Connection refused by host
[15:00:52] PROBLEM - Disk space on srv196 is CRITICAL: Connection refused by host
[15:00:52] PROBLEM - RAID on srv208 is CRITICAL: Connection refused by host
[15:00:52] PROBLEM - DPKG on srv210 is CRITICAL: Connection refused by host
[15:01:02] PROBLEM - Disk space on srv236 is CRITICAL: Connection refused by host
[15:01:12] PROBLEM - RAID on srv236 is CRITICAL: Connection refused by host
[15:01:22] PROBLEM - Disk space on cp1043 is CRITICAL: Connection refused by host
[15:01:22] PROBLEM - RAID on bast1001 is CRITICAL: Connection refused by host
[15:01:22] PROBLEM - RAID on virt2 is CRITICAL: Connection refused by host
[15:01:22] PROBLEM - RAID on virt3 is CRITICAL: Connection refused by host
[15:01:32] PROBLEM - RAID on srv272 is CRITICAL: Connection refused by host
[15:01:42] PROBLEM - DPKG on srv236 is CRITICAL: Connection refused by host
[15:01:42] PROBLEM - MySQL disk space on db46 is CRITICAL: Connection refused by host
[15:01:52] PROBLEM - MySQL disk space on db18 is CRITICAL: Connection refused by host
[15:01:52] PROBLEM - DPKG on db11 is CRITICAL: Connection refused by host
[15:01:52] PROBLEM - Disk space on es1 is CRITICAL: Connection refused by host
[15:01:52] PROBLEM - MySQL disk space on es4 is CRITICAL: Connection refused by host
[15:01:52] PROBLEM - Disk space on es1002 is CRITICAL: Connection refused by host
[15:01:53] PROBLEM - MySQL disk space on es2 is CRITICAL: Connection refused by host
[15:02:02] PROBLEM - RAID on db16 is CRITICAL: Connection refused by host
[15:02:02] PROBLEM - RAID on es1 is CRITICAL: Connection refused by host
[15:02:02] PROBLEM - RAID on mw1092 is CRITICAL: Connection refused by host
[15:02:02] PROBLEM - RAID on es1002 is CRITICAL: Connection refused by host
[15:02:02] PROBLEM - RAID on db11 is CRITICAL: Connection refused by host
[15:02:12] PROBLEM - MySQL disk space on db13 is CRITICAL: Connection refused by host
[15:02:22] PROBLEM - RAID on mw46 is CRITICAL: Connection refused by host
[15:02:32] PROBLEM - Disk space on mw55 is CRITICAL: Connection refused by host
[15:02:42] PROBLEM - RAID on es4 is CRITICAL: Connection refused by host
[15:02:52] PROBLEM - DPKG on srv208 is CRITICAL: Connection refused by host
[15:02:52] PROBLEM - Disk space on srv210 is CRITICAL: Connection refused by host
[15:03:02] PROBLEM - Disk space on mw1075 is CRITICAL: Connection refused by host
[15:03:12] PROBLEM - Disk space on srv272 is CRITICAL: Connection refused by host
[15:03:12] PROBLEM - Disk space on es4 is CRITICAL: Connection refused by host
[15:03:22] PROBLEM - DPKG on srv272 is CRITICAL: Connection refused by host
[15:03:22] PROBLEM - Disk space on es1001 is CRITICAL: Connection refused by host
[15:03:23] someone messing with nagios?
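The fix agreed on above (add the nagios user to the Debian-exim group so check_disk can stat the exim tmpfs mount, done via mail.pp in puppet) boils down to a couple of commands. This is a hedged sketch of the manual equivalent, not the merged puppet change; the live code is just a small membership-check helper, with the root-only steps commented out.

```shell
#!/bin/bash
# Helper: is user $1 a member of group $2?
in_group() {
    id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}

# Root-only steps, as they would run on sodium (commented out here):
# if ! in_group nagios Debian-exim; then
#     usermod -a -G Debian-exim nagios
#     # group changes only apply to new processes, so restart the daemon:
#     service nagios-nrpe-server restart
# fi
```

Granting group membership is narrower than loosening the directory's permissions, which is why it was preferred over making the Debian-exim dir world-readable.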
[15:03:33] PROBLEM - RAID on db18 is CRITICAL: Connection refused by host
[15:03:34] arr, i am checking
[15:03:44] i just made a change to an nrpe command
[15:03:52] PROBLEM - DPKG on mw1048 is CRITICAL: Connection refused by host
[15:04:03] PROBLEM - RAID on mw1104 is CRITICAL: Connection refused by host
[15:04:03] PROBLEM - RAID on mw1121 is CRITICAL: Connection refused by host
[15:04:13] PROBLEM - RAID on mw1134 is CRITICAL: Connection refused by host
[15:04:32] PROBLEM - RAID on mw30 is CRITICAL: Connection refused by host
[15:04:32] PROBLEM - RAID on mw67 is CRITICAL: Connection refused by host
[15:04:32] PROBLEM - Disk space on mw69 is CRITICAL: Connection refused by host
[15:04:42] PROBLEM - Disk space on mw72 is CRITICAL: Connection refused by host
[15:04:42] RECOVERY - RAID on mw70 is OK: OK: no RAID installed
[15:04:42] PROBLEM - Disk space on db11 is CRITICAL: Connection refused by host
[15:04:52] PROBLEM - MySQL disk space on es1 is CRITICAL: Connection refused by host
[15:05:02] PROBLEM - DPKG on srv195 is CRITICAL: Connection refused by host
[15:05:02] PROBLEM - DPKG on mw1074 is CRITICAL: Connection refused by host
[15:05:12] PROBLEM - DPKG on virt3 is CRITICAL: Connection refused by host
[15:05:22] PROBLEM - RAID on srv271 is CRITICAL: Connection refused by host
[15:05:22] PROBLEM - DPKG on mw46 is CRITICAL: Connection refused by host
[15:05:22] PROBLEM - RAID on mw1146 is CRITICAL: Connection refused by host
[15:05:24] New patchset: Dzahn; "revert change to check_disk nrpe command" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1862
[15:05:39] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1862
[15:05:42] PROBLEM - MySQL disk space on es1002 is CRITICAL: Connection refused by host
[15:05:42] PROBLEM - DPKG on mw1075 is CRITICAL: Connection refused by host
[15:05:52] PROBLEM - DPKG on mw1001 is CRITICAL: Connection refused by host
[15:05:52] PROBLEM - DPKG on snapshot4 is CRITICAL: Connection refused by host
[15:05:52] RECOVERY - RAID on db46 is OK: OK: State is Optimal, checked 2 logical device(s)
[15:06:01] New review: Dzahn; "this worked fine on sodium, but obviously something happened, and we want to solve this in another w..." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1862
[15:06:01] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1862
[15:06:12] PROBLEM - DPKG on mw72 is CRITICAL: Connection refused by host
[15:06:12] PROBLEM - RAID on db13 is CRITICAL: Connection refused by host
[15:06:22] PROBLEM - Disk space on srv208 is CRITICAL: Connection refused by host
[15:06:22] RECOVERY - MySQL disk space on db16 is OK: DISK OK
[15:06:32] RECOVERY - DPKG on db18 is OK: All packages OK
[15:06:32] RECOVERY - Disk space on mw1092 is OK: DISK OK
[15:06:42] PROBLEM - RAID on srv195 is CRITICAL: Connection refused by host
[15:06:42] PROBLEM - mobile traffic loggers on cp1043 is CRITICAL: Connection refused by host
[15:06:42] PROBLEM - Disk space on mw1048 is CRITICAL: Connection refused by host
[15:06:59] eh, then it wasn't related. starts recovering before i merged the revert
[15:07:02] RECOVERY - Disk space on snapshot4 is OK: DISK OK
[15:07:12] RECOVERY - RAID on srv196 is OK: OK: no RAID installed
[15:07:12] PROBLEM - Disk space on mw1088 is CRITICAL: Connection refused by host
[15:07:22] RECOVERY - Disk space on srv200 is OK: DISK OK
[15:07:42] PROBLEM - Disk space on cp1044 is CRITICAL: Connection refused by host
[15:07:42] RECOVERY - Disk space on db13 is OK: DISK OK
[15:07:42] RECOVERY - Disk space on db18 is OK: DISK OK
[15:07:42] PROBLEM - DPKG on db25 is CRITICAL: Connection refused by host
[15:07:49] i think just spence being so busy that it refuses then
[15:07:52] PROBLEM - DPKG on es2 is CRITICAL: Connection refused by host
[15:07:52] RECOVERY - DPKG on es4 is OK: All packages OK
[15:07:52] PROBLEM - MySQL disk space on es3 is CRITICAL: Connection refused by host
[15:08:02] PROBLEM - DPKG on mw30 is CRITICAL: Connection refused by host
[15:08:12] RECOVERY - DPKG on es1001 is OK: All packages OK
[15:08:12] RECOVERY - DPKG on mw1146 is OK: All packages OK
[15:08:22] PROBLEM - DPKG on mw1142 is CRITICAL: Connection refused by host
[15:08:22] RECOVERY - DPKG on mw1134 is OK: All packages OK
[15:08:22] RECOVERY - RAID on mw55 is OK: OK: no RAID installed
[15:08:22] PROBLEM - DPKG on mw33 is CRITICAL: Connection refused by host
[15:08:22] RECOVERY - DPKG on mw1121 is OK: All packages OK
[15:08:32] PROBLEM - RAID on mw33 is CRITICAL: Connection refused by host
[15:08:32] PROBLEM - DPKG on mw1136 is CRITICAL: Connection refused by host
[15:08:33] RECOVERY - DPKG on mw67 is OK: All packages OK
[15:08:42] RECOVERY - DPKG on mw70 is OK: All packages OK
[15:08:52] RECOVERY - RAID on srv210 is OK: OK: no RAID installed
[15:08:52] RECOVERY - DPKG on srv196 is OK: All packages OK
[15:09:12] RECOVERY - Disk space on virt3 is OK: DISK OK
[15:09:12] RECOVERY - Disk space on mw1082 is OK: DISK OK
[15:09:20] or it just took a little while until nagios-nrpe was restarted
[15:09:22] RECOVERY - DPKG on db46 is OK: All packages OK
[15:09:32] PROBLEM - Disk space on db25 is CRITICAL: Connection refused by host
[15:09:32] RECOVERY - RAID on es1001 is OK: OK: State is Optimal, checked 2 logical device(s)
[15:09:36] (when puppet triggered that due to a config change)
[15:09:42] PROBLEM - MySQL disk space on db45 is CRITICAL: Connection refused by host
[15:09:42] PROBLEM - DPKG on srv190 is CRITICAL: Connection refused by host
[15:09:42] PROBLEM - Disk space on mw1074 is CRITICAL: Connection refused by host
[15:09:52] PROBLEM - Disk space on srv190 is CRITICAL: Connection refused by host
[15:09:52] PROBLEM - RAID on srv229 is CRITICAL: Connection refused by host
[15:09:52] PROBLEM - RAID on mw1048 is CRITICAL: Connection refused by host
[15:09:52] PROBLEM - RAID on mw1037 is CRITICAL: Connection refused by host
[15:09:52] RECOVERY - Disk space on virt2 is OK: DISK OK
[15:10:02] PROBLEM - RAID on mw1074 is CRITICAL: Connection refused by host
[15:10:02] PROBLEM - Disk space on srv276 is CRITICAL: Connection refused by host
[15:10:02] RECOVERY - RAID on mw1075 is OK: OK: no RAID installed
[15:10:02] RECOVERY - RAID on mw1082 is OK: OK: no RAID installed
[15:10:02] RECOVERY - Disk space on mw1121 is OK: DISK OK
[15:10:02] PROBLEM - Disk space on mw1136 is CRITICAL: Connection refused by host
[15:10:03] RECOVERY - Disk space on mw1134 is OK: DISK OK
[15:10:12] PROBLEM - Disk space on mw1141 is CRITICAL: Connection refused by host
[15:10:13] PROBLEM - Disk space on mw33 is CRITICAL: Connection refused by host
[15:10:13] PROBLEM - DPKG on db51 is CRITICAL: Connection refused by host
[15:10:13] PROBLEM - RAID on db51 is CRITICAL: Connection refused by host
[15:10:22] RECOVERY - Disk space on mw67 is OK: DISK OK
[15:10:22] RECOVERY - Disk space on mw70 is OK: DISK OK
[15:10:22] RECOVERY - Disk space on db46 is OK: DISK OK
[15:10:22] PROBLEM - RAID on mw1080 is CRITICAL: Connection refused by host
[15:10:32] RECOVERY - DPKG on mw55 is OK: All packages OK
[15:10:32] RECOVERY - RAID on snapshot4 is OK: OK: no RAID installed
[15:10:42] RECOVERY - Disk space on srv196 is OK: DISK OK
[15:10:42] PROBLEM - Disk space on srv207 is CRITICAL: Connection refused by host
[15:10:42] RECOVERY - RAID on srv208 is OK: OK: no RAID installed
[15:10:42] RECOVERY - DPKG on srv210 is OK: All packages OK
[15:10:42] PROBLEM - Disk space on mw1142 is CRITICAL: Connection refused by host
[15:10:52] RECOVERY - Disk space on srv236 is OK: DISK OK
[15:10:52] PROBLEM - RAID on mw1079 is CRITICAL: Connection refused by host
[15:11:02] RECOVERY - Disk space on mw1146 is OK: DISK OK
[15:11:02] RECOVERY - RAID on mw1001 is OK: OK: no RAID installed
[15:11:12] PROBLEM - Disk space on cp1041 is CRITICAL: Connection refused by host
[15:11:12] PROBLEM - mobile traffic loggers on cp1041 is CRITICAL: Connection refused by host
[15:11:13] PROBLEM - RAID on cp1044 is CRITICAL: Connection refused by host
[15:11:13] RECOVERY - RAID on bast1001 is OK: OK: no RAID installed
[15:11:13] RECOVERY - RAID on virt3 is OK: OK: State is Optimal, checked 2 logical device(s)
[15:11:13] RECOVERY - RAID on virt2 is OK: OK: State is Optimal, checked 2 logical device(s)
[15:11:22] RECOVERY - RAID on srv272 is OK: OK: no RAID installed
[15:11:22] PROBLEM - RAID on mw1088 is CRITICAL: Connection refused by host
[15:11:32] RECOVERY - DPKG on db16 is OK: All packages OK
[15:11:32] PROBLEM - RAID on db34 is CRITICAL: Connection refused by host
[15:11:32] RECOVERY - MySQL disk space on db46 is OK: DISK OK
[15:11:32] PROBLEM - Disk space on db51 is CRITICAL: Connection refused by host
[15:11:32] PROBLEM - RAID on srv267 is CRITICAL: Connection refused by host
[15:11:42] RECOVERY - Disk space on es1 is OK: DISK OK
[15:11:42] RECOVERY - MySQL disk space on es4 is OK: DISK OK
[15:11:42] PROBLEM - RAID on es2 is CRITICAL: Connection refused by host
[15:11:42] RECOVERY - Disk space on es1002 is OK: DISK OK
[15:11:52] RECOVERY - MySQL disk space on db18 is OK: DISK OK
[15:11:52] RECOVERY - RAID on es1 is OK: OK: State is Optimal, checked 2 logical device(s)
[15:11:52] RECOVERY - RAID on mw1092 is OK: OK: no RAID installed
[15:11:52] RECOVERY - RAID on es1002 is OK: OK: State is Optimal, checked 2 logical device(s)
[15:11:52] RECOVERY - RAID on db16 is OK: OK: 1 logical device(s) checked
[15:11:52] RECOVERY - RAID on db11 is OK: OK: 1 logical device(s) checked
[15:12:02] RECOVERY - MySQL disk space on db13 is OK: DISK OK
[15:12:12] PROBLEM - Disk space on mw1104 is CRITICAL: Connection refused by host
[15:12:12] PROBLEM - RAID on mw41 is CRITICAL: Connection refused by host
[15:12:12] RECOVERY - RAID on mw46 is OK: OK: no RAID installed
[15:12:22] RECOVERY - DPKG on db11 is OK: All packages OK
[15:12:22] PROBLEM - Disk space on db45 is CRITICAL: Connection refused by host
[15:12:22] RECOVERY - Disk space on mw55 is OK: DISK OK
[15:12:32] RECOVERY - RAID on srv236 is OK: OK: no RAID installed
[15:12:32] PROBLEM - DPKG on mw1104 is CRITICAL: Connection refused by host
[15:12:32] RECOVERY - RAID on es4 is OK: OK: State is Optimal, checked 2 logical device(s)
[15:12:42] PROBLEM - DPKG on cp1043 is CRITICAL: Connection refused by host
[15:12:42] RECOVERY - Disk space on srv210 is OK: DISK OK
[15:12:42] RECOVERY - DPKG on srv208 is OK: All packages OK
[15:12:52] RECOVERY - Disk space on mw1075 is OK: DISK OK
[15:13:02] RECOVERY - DPKG on srv236 is OK: All packages OK
[15:13:02] PROBLEM - Disk space on srv267 is CRITICAL: Connection refused by host
[15:13:02] RECOVERY - Disk space on srv272 is OK: DISK OK
[15:13:02] RECOVERY - Disk space on es4 is OK: DISK OK
[15:13:12] RECOVERY - DPKG on srv272 is OK: All packages OK
[15:13:12] RECOVERY - Disk space on es1001 is OK: DISK OK
[15:13:22] RECOVERY - RAID on db18 is OK: OK: 1 logical device(s) checked
[15:13:42] PROBLEM - RAID on db45 is CRITICAL: Connection refused by host
[15:13:42] PROBLEM - DPKG on mw1036 is CRITICAL: Connection refused by host
[15:13:43] PROBLEM - DPKG on mw1037 is CRITICAL: Connection refused by host
[15:13:43] RECOVERY - DPKG on mw1048 is OK: All packages OK
[15:13:52] RECOVERY - RAID on mw1104 is OK: OK: no RAID installed
[15:13:52] RECOVERY - RAID on mw1121 is OK: OK: no RAID installed
[15:14:02] RECOVERY - RAID on mw1134 is OK: OK: no RAID installed
[15:14:02] PROBLEM - Disk space on srv283 is CRITICAL: Connection refused by host
[15:14:12] PROBLEM - RAID on mw1142 is CRITICAL: Connection refused by host
[15:14:12] PROBLEM - RAID on mw1141 is CRITICAL: Connection refused by host
[15:14:12] PROBLEM - DPKG on mw41 is CRITICAL: Connection refused by host
[15:14:12] PROBLEM - RAID on srv235 is CRITICAL: Connection refused by host
[15:14:22] RECOVERY - RAID on mw30 is OK: OK: no RAID installed
[15:14:23] RECOVERY - Disk space on mw69 is OK: DISK OK
[15:14:23] RECOVERY - RAID on mw67 is OK: OK: no RAID installed
[15:14:23] PROBLEM - RAID on es3 is CRITICAL: Connection refused by host
[15:14:32] PROBLEM - DPKG on srv267 is CRITICAL: Connection refused by host
[15:14:32] RECOVERY - Disk space on db11 is OK: DISK OK
[15:14:32] PROBLEM - DPKG on fenari is CRITICAL: Connection refused by host
[15:14:42] PROBLEM - RAID on srv190 is CRITICAL: Connection refused by host
[15:14:42] RECOVERY - MySQL disk space on es1 is OK: DISK OK
[15:14:52] PROBLEM - DPKG on srv235 is CRITICAL: Connection refused by host
[15:14:52] RECOVERY - DPKG on srv195 is OK: All packages OK
[15:14:52] RECOVERY - Disk space on mw72 is OK: DISK OK
[15:14:52] RECOVERY - DPKG on mw1074 is OK: All packages OK
[15:15:02] RECOVERY - DPKG on virt3 is OK: All packages OK
[15:15:02] PROBLEM - DPKG on mw1067 is CRITICAL: Connection refused by host
[15:15:12] RECOVERY - RAID on srv271 is OK: OK: no RAID installed
[15:15:13] PROBLEM - RAID on srv276 is CRITICAL: Connection refused by host
[15:15:13] RECOVERY - DPKG on mw46 is OK: All packages OK
[15:15:13] RECOVERY - DPKG on bast1001 is OK: All packages OK
[15:15:13] RECOVERY - RAID on mw1146 is OK: OK: no RAID installed
[15:15:22] PROBLEM - MySQL disk space on db51 is CRITICAL: Connection refused by host
[15:15:22] PROBLEM - DPKG on cp1044 is CRITICAL: Connection refused by host
[15:15:22] PROBLEM - Disk space on mw1079 is CRITICAL: Connection refused by host
[15:15:22] PROBLEM - DPKG on mw1003 is CRITICAL: Connection refused by host
[15:15:32] RECOVERY - MySQL disk space on es1002 is OK: DISK OK
[15:15:32] RECOVERY - DPKG on mw1075 is OK: All packages OK
[15:15:42] RECOVERY - DPKG on snapshot4 is OK: All packages OK
[15:15:43] RECOVERY - DPKG on mw1001 is OK: All packages OK
[15:15:43] RECOVERY - Disk space on db34 is OK: DISK OK
[15:15:43] PROBLEM - DPKG on db45 is CRITICAL: Connection refused by host
[15:15:43] RECOVERY - RAID on db25 is OK: OK: 1 logical device(s) checked
[15:15:52] PROBLEM - RAID on mw1152 is CRITICAL: Connection refused by host
[15:15:52] PROBLEM - DPKG on mw1079 is CRITICAL: Connection refused by host
[15:16:02] PROBLEM - RAID on db50 is CRITICAL: Connection refused by host
[15:16:02] RECOVERY - DPKG on mw72 is OK: All packages OK
[15:16:02] RECOVERY - MySQL disk space on db11 is OK: DISK OK
[15:16:02] PROBLEM - DPKG on mw1088 is CRITICAL: Connection refused by host
[15:16:02] RECOVERY - RAID on db13 is OK: OK: 1 logical device(s) checked
[15:16:12] RECOVERY - DPKG on db13 is OK: All packages OK
[15:16:12] RECOVERY - Disk space on srv208 is OK: DISK OK
[15:16:13] PROBLEM - Disk space on mw1037 is CRITICAL: Connection refused by host
[15:16:22] PROBLEM - Disk space on mw1003 is CRITICAL: Connection refused by host
[15:16:22] PROBLEM - Disk space on mw1002 is CRITICAL: Connection refused by host
[15:16:23] RECOVERY - Disk space on mw1001 is OK: DISK OK
[15:16:32] RECOVERY - RAID on srv195 is OK: OK: no RAID installed
[15:16:32] PROBLEM - Disk space on mw1036 is CRITICAL: Connection refused by host
[15:16:32] RECOVERY - Disk space on mw1048 is OK: DISK OK
[15:16:42] RECOVERY - Disk space on mw46 is OK: DISK OK
[15:16:52] PROBLEM - Disk space on mw41 is CRITICAL: Connection refused by host
[15:17:02]
RECOVERY - Disk space on mw1080 is OK: DISK OK [15:17:02] RECOVERY - Disk space on mw1088 is OK: DISK OK [15:17:12] RECOVERY - Disk space on srv195 is OK: DISK OK [15:17:12] PROBLEM - RAID on srv207 is CRITICAL: Connection refused by host [15:17:12] PROBLEM - RAID on srv254 is CRITICAL: Connection refused by host [15:17:12] PROBLEM - Disk space on mw1050 is CRITICAL: Connection refused by host [15:17:22] RECOVERY - DPKG on srv271 is OK: All packages OK [15:17:22] PROBLEM - DPKG on srv276 is CRITICAL: Connection refused by host [15:17:22] PROBLEM - Disk space on srv285 is CRITICAL: Connection refused by host [15:17:32] PROBLEM - mobile traffic loggers on cp1042 is CRITICAL: Connection refused by host [15:17:32] RECOVERY - Disk space on bast1001 is OK: DISK OK [15:17:32] RECOVERY - DPKG on cp1041 is OK: All packages OK [15:17:32] RECOVERY - DPKG on db25 is OK: All packages OK [15:17:42] RECOVERY - MySQL disk space on es3 is OK: DISK OK [15:17:42] PROBLEM - Disk space on es1003 is CRITICAL: Connection refused by host [15:17:52] PROBLEM - Disk space on fenari is CRITICAL: Connection refused by host [15:17:52] RECOVERY - DPKG on mw30 is OK: All packages OK [15:17:52] PROBLEM - DPKG on es1003 is CRITICAL: Connection refused by host [15:17:52] PROBLEM - RAID on es1003 is CRITICAL: Connection refused by host [15:18:02] PROBLEM - DPKG on mw1152 is CRITICAL: Connection refused by host [15:18:12] PROBLEM - DPKG on mw1111 is CRITICAL: Connection refused by host [15:18:12] RECOVERY - DPKG on mw1142 is OK: All packages OK [15:18:12] RECOVERY - DPKG on mw33 is OK: All packages OK [15:18:12] RECOVERY - Disk space on es3 is OK: DISK OK [15:18:12] RECOVERY - DPKG on es2 is OK: All packages OK [15:18:33] PROBLEM - DPKG on mw1141 is CRITICAL: Connection refused by host [15:18:33] RECOVERY - DPKG on mw1136 is OK: All packages OK [15:18:42] PROBLEM - DPKG on srv207 is CRITICAL: Connection refused by host [15:18:42] PROBLEM - DPKG on mw42 is CRITICAL: Connection refused by host 
[15:18:42] RECOVERY - RAID on mw72 is OK: OK: no RAID installed [15:18:42] RECOVERY - RAID on mw69 is OK: OK: no RAID installed [15:18:42] RECOVERY - RAID on mw33 is OK: OK: no RAID installed [15:18:52] PROBLEM - DPKG on mw1159 is CRITICAL: Connection refused by host [15:18:52] PROBLEM - Disk space on mw1067 is CRITICAL: Connection refused by host [15:19:02] PROBLEM - DPKG on db50 is CRITICAL: Connection refused by host [15:19:22] RECOVERY - Disk space on db25 is OK: DISK OK [15:19:32] PROBLEM - RAID on srv261 is CRITICAL: Connection refused by host [15:19:32] PROBLEM - DPKG on db47 is CRITICAL: Connection refused by host [15:19:32] PROBLEM - Disk space on srv235 is CRITICAL: Connection refused by host [15:19:32] PROBLEM - Disk space on mw1073 is CRITICAL: Connection refused by host [15:19:42] RECOVERY - MySQL disk space on db34 is OK: DISK OK [15:19:42] RECOVERY - Disk space on mw1074 is OK: DISK OK [15:19:42] RECOVERY - Disk space on srv190 is OK: DISK OK [15:19:42] RECOVERY - RAID on srv229 is OK: OK: no RAID installed [15:19:42] PROBLEM - RAID on mw1036 is CRITICAL: Connection refused by host [15:19:42] RECOVERY - RAID on mw1048 is OK: OK: no RAID installed [15:19:43] RECOVERY - RAID on mw1037 is OK: OK: no RAID installed [15:19:52] PROBLEM - RAID on srv231 is CRITICAL: Connection refused by host [15:19:52] PROBLEM - Disk space on mw1111 is CRITICAL: Connection refused by host [15:19:52] RECOVERY - RAID on mw1074 is OK: OK: no RAID installed [15:19:52] RECOVERY - Disk space on mw1136 is OK: DISK OK [15:20:02] RECOVERY - DPKG on srv190 is OK: All packages OK [15:20:02] PROBLEM - Disk space on mw1159 is CRITICAL: Connection refused by host [15:20:03] RECOVERY - Disk space on mw1141 is OK: DISK OK [15:20:12] PROBLEM - RAID on mw1002 is CRITICAL: Connection refused by host [15:20:12] PROBLEM - Disk space on mw42 is CRITICAL: Connection refused by host [15:20:22] RECOVERY - Disk space on mw33 is OK: DISK OK [15:20:22] PROBLEM - Disk space on db50 is CRITICAL: 
Connection refused by host [15:20:22] RECOVERY - RAID on mw1080 is OK: OK: no RAID installed [15:20:32] RECOVERY - DPKG on db51 is OK: All packages OK [15:20:32] RECOVERY - RAID on db51 is OK: OK: State is Optimal, checked 2 logical device(s) [15:20:42] PROBLEM - RAID on mw1067 is CRITICAL: Connection refused by host [15:20:42] PROBLEM - RAID on mw1050 is CRITICAL: Connection refused by host [15:20:43] RECOVERY - Disk space on mw1142 is OK: DISK OK [15:20:52] RECOVERY - Disk space on srv207 is OK: DISK OK [15:20:52] PROBLEM - RAID on mw1003 is CRITICAL: Connection refused by host [15:20:52] PROBLEM - DPKG on srv261 is CRITICAL: Connection refused by host [15:20:52] PROBLEM - DPKG on srv254 is CRITICAL: Connection refused by host [15:20:52] PROBLEM - DPKG on srv283 is CRITICAL: Connection refused by host [15:21:02] PROBLEM - RAID on mw1073 is CRITICAL: Connection refused by host [15:21:02] PROBLEM - Disk space on srv263 is CRITICAL: Connection refused by host [15:21:02] RECOVERY - RAID on mw1079 is OK: OK: no RAID installed [15:21:02] PROBLEM - MySQL disk space on storage3 is CRITICAL: Connection refused by host [15:21:02] RECOVERY - Disk space on cp1041 is OK: DISK OK [15:21:02] RECOVERY - mobile traffic loggers on cp1041 is OK: PROCS OK: 2 processes with command name varnishncsa [15:21:03] RECOVERY - RAID on cp1044 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [15:21:12] PROBLEM - RAID on cp1042 is CRITICAL: Connection refused by host [15:21:22] PROBLEM - MySQL disk space on db50 is CRITICAL: Connection refused by host [15:21:22] PROBLEM - Disk space on db47 is CRITICAL: Connection refused by host [15:21:22] RECOVERY - Disk space on db51 is OK: DISK OK [15:21:22] RECOVERY - RAID on db34 is OK: OK: 1 logical device(s) checked [15:21:32] RECOVERY - RAID on srv267 is OK: OK: no RAID installed [15:21:32] PROBLEM - RAID on srv283 is CRITICAL: Connection refused by host [15:21:32] RECOVERY - RAID on mw1088 is OK: OK: no RAID installed [15:21:32] RECOVERY - MySQL 
disk space on es2 is OK: DISK OK [15:21:32] RECOVERY - RAID on es2 is OK: OK: State is Optimal, checked 2 logical device(s) [15:21:42] PROBLEM - RAID on fenari is CRITICAL: Connection refused by host [15:22:02] PROBLEM - Disk space on srv258 is CRITICAL: Connection refused by host [15:22:03] RECOVERY - RAID on mw41 is OK: OK: no RAID installed [15:22:03] RECOVERY - Disk space on mw1104 is OK: DISK OK [15:22:12] PROBLEM - MySQL disk space on es1003 is CRITICAL: Connection refused by host [15:22:13] RECOVERY - Disk space on db45 is OK: DISK OK [15:22:32] PROBLEM - DPKG on srv231 is CRITICAL: Connection refused by host [15:22:42] RECOVERY - DPKG on mw1104 is OK: All packages OK [15:22:52] PROBLEM - Disk space on emery is CRITICAL: Connection refused by host [15:22:52] RECOVERY - Disk space on srv267 is OK: DISK OK [15:23:02] PROBLEM - Disk space on srv261 is CRITICAL: Connection refused by host [15:23:02] PROBLEM - RAID on mw1023 is CRITICAL: Connection refused by host [15:23:02] PROBLEM - Disk space on srv231 is CRITICAL: Connection refused by host [15:23:12] PROBLEM - Disk space on srv238 is CRITICAL: Connection refused by host [15:23:32] PROBLEM - MySQL disk space on db47 is CRITICAL: Connection refused by host [15:23:32] RECOVERY - RAID on db45 is OK: OK: State is Optimal, checked 2 logical device(s) [15:23:42] PROBLEM - DPKG on mw1046 is CRITICAL: Connection refused by host [15:23:42] RECOVERY - DPKG on mw1036 is OK: All packages OK [15:23:42] PROBLEM - DPKG on mw1053 is CRITICAL: Connection refused by host [15:23:42] PROBLEM - DPKG on mw1050 is CRITICAL: Connection refused by host [15:23:42] RECOVERY - DPKG on mw1037 is OK: All packages OK [15:23:52] PROBLEM - DPKG on mw1073 is CRITICAL: Connection refused by host [15:23:52] PROBLEM - RAID on mw1111 is CRITICAL: Connection refused by host [15:24:02] PROBLEM - RAID on mw1143 is CRITICAL: Connection refused by host [15:24:02] PROBLEM - RAID on mw1133 is CRITICAL: Connection refused by host [15:24:02] PROBLEM - 
DPKG on mw18 is CRITICAL: Connection refused by host [15:24:02] RECOVERY - RAID on mw1142 is OK: OK: no RAID installed [15:24:02] RECOVERY - RAID on mw1141 is OK: OK: no RAID installed [15:24:02] RECOVERY - DPKG on mw41 is OK: All packages OK [15:24:03] PROBLEM - DPKG on cp1042 is CRITICAL: Connection refused by host [15:24:12] PROBLEM - RAID on srv285 is CRITICAL: Connection refused by host [15:24:12] PROBLEM - RAID on mw4 is CRITICAL: Connection refused by host [15:24:22] PROBLEM - Disk space on srv254 is CRITICAL: Connection refused by host [15:24:22] RECOVERY - RAID on es3 is OK: OK: State is Optimal, checked 2 logical device(s) [15:24:22] RECOVERY - DPKG on srv267 is OK: All packages OK [15:24:32] PROBLEM - DPKG on mw1002 is CRITICAL: Connection refused by host [15:24:32] RECOVERY - RAID on srv190 is OK: OK: no RAID installed [15:24:42] RECOVERY - DPKG on fenari is OK: All packages OK [15:24:42] PROBLEM - Disk space on mw1152 is CRITICAL: Connection refused by host [15:24:42] PROBLEM - RAID on srv238 is CRITICAL: Connection refused by host [15:24:52] PROBLEM - DPKG on mw1023 is CRITICAL: Connection refused by host [15:24:52] PROBLEM - DPKG on srv285 is CRITICAL: Connection refused by host [15:25:02] RECOVERY - DPKG on mw1067 is OK: All packages OK [15:25:02] RECOVERY - RAID on srv276 is OK: OK: no RAID installed [15:25:02] PROBLEM - Disk space on cp1042 is CRITICAL: Connection refused by host [15:25:02] PROBLEM - DPKG on mw1008 is CRITICAL: Connection refused by host [15:25:12] PROBLEM - DPKG on storage3 is CRITICAL: Connection refused by host [15:25:12] RECOVERY - MySQL disk space on db51 is OK: DISK OK [15:25:13] PROBLEM - RAID on emery is CRITICAL: Connection refused by host [15:25:13] RECOVERY - DPKG on cp1044 is OK: All packages OK [15:25:13] RECOVERY - Disk space on mw1079 is OK: DISK OK [15:25:13] RECOVERY - DPKG on mw1003 is OK: All packages OK [15:25:22] PROBLEM - RAID on mw18 is CRITICAL: Connection refused by host [15:25:32] PROBLEM - RAID on mw1159 
is CRITICAL: Connection refused by host [15:25:32] PROBLEM - DPKG on mw1009 is CRITICAL: Connection refused by host [15:25:42] RECOVERY - DPKG on db45 is OK: All packages OK [15:25:42] RECOVERY - RAID on mw1152 is OK: OK: no RAID installed [15:25:52] RECOVERY - DPKG on mw1079 is OK: All packages OK [15:25:52] PROBLEM - DPKG on emery is CRITICAL: Connection refused by host [15:25:52] RECOVERY - DPKG on mw1088 is OK: All packages OK [15:25:52] RECOVERY - RAID on db50 is OK: OK: State is Optimal, checked 2 logical device(s) [15:26:02] RECOVERY - Disk space on mw1037 is OK: DISK OK [15:26:12] PROBLEM - Disk space on mw1008 is CRITICAL: Connection refused by host [15:26:12] RECOVERY - Disk space on mw1003 is OK: DISK OK [15:26:22] PROBLEM - Disk space on mw1046 is CRITICAL: Connection refused by host [15:26:33] PROBLEM - RAID on searchidx2 is CRITICAL: Connection refused by host [15:26:33] PROBLEM - RAID on mw42 is CRITICAL: Connection refused by host [15:26:33] PROBLEM - DPKG on mw4 is CRITICAL: Connection refused by host [15:26:33] PROBLEM - Disk space on mw1009 is CRITICAL: Connection refused by host [15:26:42] RECOVERY - Disk space on mw1036 is OK: DISK OK [15:26:42] RECOVERY - Disk space on mw41 is OK: DISK OK [15:26:52] PROBLEM - Disk space on mw1023 is CRITICAL: Connection refused by host [15:26:52] PROBLEM - DPKG on searchidx2 is CRITICAL: Connection refused by host [15:27:02] PROBLEM - RAID on srv263 is CRITICAL: Connection refused by host [15:27:02] RECOVERY - RAID on srv207 is OK: OK: no RAID installed [15:27:12] RECOVERY - DPKG on srv276 is OK: All packages OK [15:27:12] RECOVERY - Disk space on srv285 is OK: DISK OK [15:27:22] RECOVERY - Disk space on mw1050 is OK: DISK OK [15:27:22] RECOVERY - Disk space on cp1044 is OK: DISK OK [15:27:32] RECOVERY - Disk space on es1003 is OK: DISK OK [15:27:42] RECOVERY - Disk space on fenari is OK: DISK OK [15:27:42] RECOVERY - DPKG on es1003 is OK: All packages OK [15:27:43] RECOVERY - RAID on es1003 is OK: OK: State 
is Optimal, checked 2 logical device(s) [15:27:52] PROBLEM - RAID on srv258 is CRITICAL: Connection refused by host [15:27:52] RECOVERY - DPKG on mw1152 is OK: All packages OK [15:28:12] PROBLEM - Disk space on mw1053 is CRITICAL: Connection refused by host [15:28:12] PROBLEM - Disk space on mw4 is CRITICAL: Connection refused by host [15:28:22] PROBLEM - DPKG on mw1143 is CRITICAL: Connection refused by host [15:28:32] RECOVERY - DPKG on mw1141 is OK: All packages OK [15:28:33] RECOVERY - DPKG on srv207 is OK: All packages OK [15:28:42] PROBLEM - DPKG on mw1133 is CRITICAL: Connection refused by host [15:28:42] PROBLEM - DPKG on srv258 is CRITICAL: Connection refused by host [15:28:52] RECOVERY - Disk space on mw1067 is OK: DISK OK [15:28:52] PROBLEM - DPKG on srv263 is CRITICAL: Connection refused by host [15:28:52] PROBLEM - Disk space on searchidx2 is CRITICAL: Connection refused by host [15:28:52] PROBLEM - Disk space on mw18 is CRITICAL: Connection refused by host [15:29:02] RECOVERY - DPKG on db50 is OK: All packages OK [15:29:02] RECOVERY - DPKG on mw42 is OK: All packages OK [15:29:02] PROBLEM - RAID on storage3 is CRITICAL: Connection refused by host [15:29:02] PROBLEM - DPKG on srv238 is CRITICAL: Connection refused by host [15:29:03] mutante: so this is still from your changes? 
[15:29:22] RECOVERY - Disk space on srv235 is OK: DISK OK
[15:29:22] RECOVERY - MySQL disk space on db45 is OK: DISK OK
[15:29:22] RECOVERY - Disk space on mw1073 is OK: DISK OK
[15:29:32] PROBLEM - RAID on mw1008 is CRITICAL: Connection refused by host
[15:29:32] PROBLEM - RAID on mw1046 is CRITICAL: Connection refused by host
[15:29:32] RECOVERY - RAID on mw1036 is OK: OK: no RAID installed
[15:29:32] PROBLEM - Disk space on storage3 is CRITICAL: Connection refused by host
[15:29:42] RECOVERY - Disk space on srv276 is OK: DISK OK
[15:29:42] PROBLEM - RAID on mw1009 is CRITICAL: Connection refused by host
[15:29:42] PROBLEM - RAID on mw1053 is CRITICAL: Connection refused by host
[15:29:42] RECOVERY - Disk space on mw1111 is OK: DISK OK
[15:29:52] PROBLEM - Disk space on mw1133 is CRITICAL: Connection refused by host
[15:29:52] RECOVERY - RAID on srv231 is OK: OK: no RAID installed
[15:30:02] RECOVERY - Disk space on mw42 is OK: DISK OK
[15:30:02] RECOVERY - RAID on mw1002 is OK: OK: no RAID installed
[15:30:12] RECOVERY - Disk space on db50 is OK: DISK OK
[15:30:23] RobH: the problem is that the nagios-nrpe service needs to be started..it is doing that now..one by one
[15:30:32] RECOVERY - RAID on mw1067 is OK: OK: no RAID installed
[15:30:32] RECOVERY - RAID on mw1050 is OK: OK: no RAID installed
[15:30:39] New patchset: Mark Bergsma; "Install files in web docroot" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1863
[15:30:42] RECOVERY - RAID on mw1003 is OK: OK: no RAID installed
[15:30:42] RECOVERY - DPKG on srv261 is OK: All packages OK
[15:30:42] RECOVERY - DPKG on srv283 is OK: All packages OK
[15:30:52] RECOVERY - RAID on srv261 is OK: OK: no RAID installed
[15:30:52] RECOVERY - MySQL disk space on storage3 is OK: DISK OK
[15:30:52] RECOVERY - RAID on mw1073 is OK: OK: no RAID installed
[15:30:55] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1863
[15:31:02] RECOVERY - RAID on cp1042 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[15:31:07] RobH: not exactly sure why it was stopped, but seemed to happen right after config change. yeah
[15:31:09] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1863
[15:31:09] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1863
[15:31:11] mutante: cool, just was wondering why, now i know =]
[15:31:12] RECOVERY - MySQL disk space on db50 is OK: DISK OK
[15:31:12] RECOVERY - Disk space on db47 is OK: DISK OK
[15:31:22] RECOVERY - RAID on srv283 is OK: OK: no RAID installed
[15:31:23] PROBLEM - Disk space on mw1143 is CRITICAL: Connection refused by host
[15:31:32] RECOVERY - RAID on fenari is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[15:31:42] PROBLEM - DPKG on mw1107 is CRITICAL: Connection refused by host
[15:32:02] RECOVERY - MySQL disk space on es1003 is OK: DISK OK
[15:32:22] RECOVERY - DPKG on srv231 is OK: All packages OK
[15:32:42] RECOVERY - Disk space on emery is OK: DISK OK
[15:32:42] PROBLEM - RAID on srv278 is CRITICAL: Connection refused by host
[15:32:42] RECOVERY - Disk space on srv258 is OK: DISK OK
[15:32:52] RECOVERY - Disk space on srv261 is OK: DISK OK
[15:32:52] RECOVERY - Disk space on srv231 is OK: DISK OK
[15:32:52] RECOVERY - RAID on mw1023 is OK: OK: no RAID installed
[15:33:01] !log starting nagios-nrpe-server on srv's via dsh
[15:33:02] PROBLEM - RAID on mw1025 is CRITICAL: Connection refused by host
[15:33:03] RECOVERY - Disk space on srv238 is OK: DISK OK
[15:33:03] Logged the message, Master
[15:33:22] RECOVERY - MySQL disk space on db47 is OK: DISK OK
[15:33:32] PROBLEM - DPKG on mw1035 is CRITICAL: Connection refused by host
[15:33:32] RECOVERY - DPKG on mw1046 is OK: All packages OK
[15:33:32] RECOVERY - DPKG on mw1050 is OK: All packages OK
[15:33:32] RECOVERY - DPKG on mw1053 is OK: All packages OK
[15:33:42] RECOVERY - Disk space on srv283 is OK: DISK OK
[15:33:42] RECOVERY - DPKG on mw1073 is OK: All packages OK
[15:33:42] RECOVERY - RAID on mw1111 is OK: OK: no RAID installed
[15:33:52] RECOVERY - DPKG on mw18 is OK: All packages OK
[15:33:52] RECOVERY - RAID on mw1143 is OK: OK: no RAID installed
[15:33:52] PROBLEM - RAID on mw1107 is CRITICAL: Connection refused by host
[15:33:52] RECOVERY - RAID on mw1133 is OK: OK: no RAID installed
[15:33:52] RECOVERY - DPKG on cp1042 is OK: All packages OK
[15:34:02] RECOVERY - RAID on srv285 is OK: OK: no RAID installed
[15:34:02] RECOVERY - RAID on srv235 is OK: OK: no RAID installed
[15:34:02] RECOVERY - RAID on mw4 is OK: OK: no RAID installed
[15:34:12] RECOVERY - Disk space on srv254 is OK: DISK OK
[15:34:22] RECOVERY - DPKG on mw1002 is OK: All packages OK
[15:34:32] RECOVERY - Disk space on mw1152 is OK: DISK OK
[15:34:32] RECOVERY - DPKG on srv235 is OK: All packages OK
[15:34:32] RECOVERY - RAID on srv238 is OK: OK: no RAID installed
[15:34:42] PROBLEM - DPKG on mw1096 is CRITICAL: Connection refused by host
[15:34:42] RECOVERY - DPKG on srv285 is OK: All packages OK
[15:34:43] RECOVERY - DPKG on mw1023 is OK: All packages OK
[15:34:52] RECOVERY - Disk space on cp1042 is OK: DISK OK
[15:34:52] RECOVERY - DPKG on mw1008 is OK: All packages OK
[15:35:02] RECOVERY - DPKG on storage3 is OK: All packages OK
[15:35:02] PROBLEM - DPKG on mw1025 is CRITICAL: Connection refused by host
[15:35:02] RECOVERY - RAID on emery is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[15:35:12] RECOVERY - RAID on mw18 is OK: OK: no RAID installed
[15:35:22] RECOVERY - DPKG on mw1009 is OK: All packages OK
[15:35:42] RECOVERY - DPKG on emery is OK: All packages OK
[15:36:02] RECOVERY - Disk space on mw1008 is OK: DISK OK
[15:36:02] RECOVERY - Disk space on mw1002 is OK: DISK OK
[15:36:12] RECOVERY - Disk space on mw1046 is OK: DISK OK
[15:36:13] PROBLEM - Disk space on mw1096 is CRITICAL: Connection refused by host
[15:36:13] PROBLEM - RAID on mw1096 is CRITICAL: Connection refused by host
[15:36:22] PROBLEM - Disk space on mw1025 is CRITICAL: Connection refused by host
[15:36:22] RECOVERY - RAID on mw42 is OK: OK: no RAID installed
[15:36:22] RECOVERY - DPKG on mw4 is OK: All packages OK
[15:36:22] RECOVERY - Disk space on mw1009 is OK: DISK OK
[15:36:22] RECOVERY - RAID on searchidx2 is OK: OK: State is Optimal, checked 4 logical device(s)
[15:36:32] PROBLEM - Disk space on mw1035 is CRITICAL: Connection refused by host
[15:36:42] RECOVERY - Disk space on mw1023 is OK: DISK OK
[15:36:52] RECOVERY - RAID on srv263 is OK: OK: no RAID installed
[15:37:02] RECOVERY - DPKG on searchidx2 is OK: All packages OK
[15:37:12] RECOVERY - mobile traffic loggers on cp1042 is OK: PROCS OK: 2 processes with command name varnishncsa
[15:37:42] RECOVERY - RAID on srv258 is OK: OK: no RAID installed
[15:37:52] RECOVERY - DPKG on mw1111 is OK: All packages OK
[15:38:21] RECOVERY - Disk space on mw1053 is OK: DISK OK
[15:38:21] RECOVERY - Disk space on mw4 is OK: DISK OK
[15:38:51] PROBLEM - RAID on ms1004 is CRITICAL: Connection refused by host
[15:38:51] PROBLEM - Disk space on mw1029 is CRITICAL: Connection refused by host
[15:38:51] PROBLEM - Disk space on mw1032 is CRITICAL: Connection refused by host
[15:38:51] PROBLEM - Disk space on mw1047 is CRITICAL: Connection refused by host
[15:39:21] PROBLEM - Disk space on sodium is CRITICAL: DISK CRITICAL - /var/spool/exim4/db is not accessible: Permission denied
[15:39:41] RECOVERY - Disk space on mw18 is OK: DISK OK
[15:40:01] PROBLEM - Disk space on mw1070 is CRITICAL: Connection refused by host
[15:40:31] RECOVERY - DPKG on mw1133 is OK: All packages OK
[15:40:31] RECOVERY - DPKG on mw1143 is OK: All packages OK
[15:41:01] RECOVERY - Disk space on storage3 is OK: DISK OK
[15:41:01] RECOVERY - Disk space on searchidx2 is OK: DISK OK
[15:41:11] PROBLEM - Disk space on mw1077 is CRITICAL: Connection refused by host
[15:41:31] New patchset: Mark Bergsma; "Add rewrite rules for eqiad and esams" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1864
[15:41:41] PROBLEM - Disk space on mw1068 is CRITICAL: Connection refused by host
[15:41:46] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1864
[15:41:51] RECOVERY - DPKG on db47 is OK: All packages OK
[15:42:01] PROBLEM - RAID on mw1032 is CRITICAL: Connection refused by host
[15:42:01] RECOVERY - RAID on mw1008 is OK: OK: no RAID installed
[15:42:01] RECOVERY - RAID on mw1009 is OK: OK: no RAID installed
[15:42:03] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1864
[15:42:04] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1864
[15:42:11] PROBLEM - RAID on mw1068 is CRITICAL: Connection refused by host
[15:42:11] PROBLEM - RAID on mw1070 is CRITICAL: Connection refused by host
[15:42:11] RECOVERY - RAID on mw1046 is OK: OK: no RAID installed
[15:42:11] RECOVERY - RAID on mw1053 is OK: OK: no RAID installed
[15:42:11] RECOVERY - RAID on mw1096 is OK: OK: no RAID installed
[15:42:21] RECOVERY - Disk space on mw1133 is OK: DISK OK
[15:42:21] RECOVERY - Disk space on mw1143 is OK: DISK OK
[15:43:11] RECOVERY - Disk space on srv263 is OK: DISK OK
[15:43:11] RECOVERY - DPKG on srv254 is OK: All packages OK
[15:43:11] RECOVERY - DPKG on srv258 is OK: All packages OK
[15:43:11] RECOVERY - RAID on storage3 is OK: OK: State is Optimal, checked 14 logical device(s)
[15:43:21] RECOVERY - DPKG on srv263 is OK: All packages OK
[15:43:31] RECOVERY - Disk space on cp1043 is OK: DISK OK
[15:43:31] RECOVERY - DPKG on srv238 is OK: All packages OK
[15:44:01] RECOVERY - DPKG on cp1043 is OK: All packages OK
[15:46:01] PROBLEM - DPKG on mw1047 is CRITICAL: Connection refused by host
[15:46:01] PROBLEM - DPKG on mw1051 is CRITICAL: Connection refused by host
[15:46:01] PROBLEM - DPKG on mw1057 is CRITICAL: Connection refused by host
[15:46:01] RECOVERY - DPKG on mw1035 is OK: All packages OK
[15:46:05] New patchset: Jgreen; "adding one-size-fits-all offhost_backups script for aluminium, grosley, storage3" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1865
[15:46:11] RECOVERY - RAID on mw1107 is OK: OK: no RAID installed
[15:47:01] PROBLEM - DPKG on mw1070 is CRITICAL: Connection refused by host
[15:47:21] RECOVERY - RAID on cp1043 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[15:47:31] RECOVERY - mobile traffic loggers on cp1043 is OK: PROCS OK: 2 processes with command name varnishncsa
[15:47:41] RECOVERY - DPKG on mw1096 is OK: All packages OK
[15:48:01] PROBLEM - Disk space on db44 is CRITICAL: Connection refused by host
[15:48:11] PROBLEM - DPKG on mw1077 is CRITICAL: Connection refused by host
[15:48:21] RECOVERY - Disk space on mw1025 is OK: DISK OK
[15:48:41] PROBLEM - Disk space on mw1051 is CRITICAL: Connection refused by host
[15:48:41] RECOVERY - Disk space on mw1096 is OK: DISK OK
[15:48:51] PROBLEM - Disk space on mw1044 is CRITICAL: Connection refused by host
[15:48:51] RECOVERY - DPKG on mw1025 is OK: All packages OK
[15:49:01] PROBLEM - Disk space on mw1042 is CRITICAL: Connection refused by host
[15:49:01] RECOVERY - Disk space on mw1035 is OK: DISK OK
[15:49:21] PROBLEM - Disk space on mw1093 is CRITICAL: Connection refused by host
[15:49:31] PROBLEM - RAID on srv269 is CRITICAL: Connection refused by host
[15:49:31] PROBLEM - Disk space on mw1071 is CRITICAL: Connection refused by host
[15:49:31] PROBLEM - DPKG on srv288 is CRITICAL: Connection refused by host
[15:49:51] PROBLEM - Disk space on mw1057 is CRITICAL: Connection refused by host
[15:49:51] PROBLEM - Disk space on mw1090 is CRITICAL: Connection refused by host
[15:49:51] RECOVERY - Disk space on mw1070 is OK: DISK OK
[15:50:01] PROBLEM - Disk space on mw1064 is CRITICAL: Connection refused by host
[15:50:21] PROBLEM - DPKG on mw1134 is CRITICAL: Connection refused by host
[15:50:21] PROBLEM - DPKG on mw1148 is CRITICAL: Connection refused by host
[15:51:01] PROBLEM - Disk space on mw1069 is CRITICAL: Connection refused by host
[15:51:11] RECOVERY - DPKG on mw1107 is OK: All packages OK
[15:51:31] RECOVERY - Disk space on mw1068 is OK: DISK OK
[15:51:51] PROBLEM - DPKG on ms1004 is CRITICAL: Connection refused by host
[15:51:51] PROBLEM - RAID on mw1029 is CRITICAL: Connection refused by host
[15:51:51] RECOVERY - RAID on mw1025 is OK: OK: no RAID installed
[15:52:01] PROBLEM - MySQL disk space on es1001 is CRITICAL: Connection refused by host
[15:52:01] PROBLEM - RAID on mw1071 is CRITICAL: Connection refused by host
[15:52:01] PROBLEM - RAID on mw1069 is CRITICAL: Connection refused by host
[15:52:01] PROBLEM - RAID on mw1090 is CRITICAL: Connection refused by host
[15:52:01] PROBLEM - RAID on mw1077 is CRITICAL: Connection refused by host
[15:52:01] PROBLEM - RAID on mw1093 is CRITICAL: Connection refused by host
[15:52:01] RECOVERY - RAID on mw1068 is OK: OK: no RAID installed
[15:52:02] RECOVERY - RAID on mw1070 is OK: OK: no RAID installed
[15:55:04] New patchset: Hashar; "testswarm: update fetcher to r108075" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1866
[15:55:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1866
[15:55:41] PROBLEM - DPKG on db44 is CRITICAL: Connection refused by host
[15:55:51] PROBLEM - DPKG on mw1029 is CRITICAL: Connection refused by host
[15:55:51] PROBLEM - DPKG on mw1042 is CRITICAL: Connection refused by host
[15:55:51] PROBLEM - DPKG on mw1044 is CRITICAL: Connection refused by host
[15:55:51] RECOVERY - DPKG on mw1047 is OK: All packages OK
[15:55:51] RECOVERY - DPKG on mw1051 is OK: All packages OK
[15:56:01] RECOVERY - DPKG on mw1057 is OK: All packages OK
[15:56:51] RECOVERY - DPKG on mw1070 is OK: All packages OK
[15:56:58] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1865
[15:56:58] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1865
[15:57:01] PROBLEM - Disk space on srv275 is CRITICAL: Connection refused by host
[15:57:21] PROBLEM - DPKG on mw1071 is CRITICAL: Connection refused by host
[15:57:21] PROBLEM - DPKG on mw1093 is CRITICAL: Connection refused by host
[15:57:41] PROBLEM - DPKG on mw1090 is CRITICAL: Connection refused by host
[15:57:41] PROBLEM - RAID on mw1158 is CRITICAL: Connection refused by host
[15:57:51] PROBLEM - DPKG on mw1086 is CRITICAL: Connection refused by host
[15:58:21] PROBLEM - Disk space on mw1012 is CRITICAL: Connection refused by host
[15:58:21] RECOVERY - Disk space on mw1032 is OK: DISK OK
[15:58:21] RECOVERY - Disk space on mw1029 is OK: DISK OK
[15:58:21] RECOVERY - RAID on ms1004 is OK: OK: State is Optimal, checked 2 logical device(s)
[15:58:31] PROBLEM - Disk space on mw1001 is CRITICAL: Connection refused by host
[15:58:31] RECOVERY - DPKG on mw1077 is OK: All packages OK
[15:58:31] RECOVERY - Disk space on mw1051 is OK: DISK OK
[15:58:41] RECOVERY - Disk space on mw1047 is OK: DISK OK
[15:59:01] PROBLEM - RAID on srv201 is CRITICAL: Connection refused by host
[15:59:11] RECOVERY - Disk space on mw1093 is OK: DISK OK
[15:59:21] PROBLEM - RAID on mw19 is CRITICAL: Connection refused by host
[15:59:21] RECOVERY - Disk space on mw1071 is OK: DISK OK
[15:59:41] PROBLEM - DPKG on mw74 is CRITICAL: Connection refused by host
[15:59:41] PROBLEM - Disk space on mw1075 is CRITICAL: Connection refused by host
[15:59:41] PROBLEM - RAID on es1001 is CRITICAL: Connection refused by host
[15:59:41] RECOVERY - Disk space on mw1057 is OK: DISK OK
[15:59:41] RECOVERY - Disk space on mw1090 is OK: DISK OK
[16:00:11] PROBLEM - DPKG on mw1147 is CRITICAL: Connection refused by host
[16:00:11] PROBLEM - DPKG on mw1158 is CRITICAL: Connection refused by host
[16:00:12] RECOVERY - DPKG on mw1159 is OK: All packages OK
[16:00:31] PROBLEM - Disk space on mw1086 is CRITICAL: Connection refused by host
[16:00:51] RECOVERY - Disk space on mw1069 is OK: DISK OK
[16:01:01] PROBLEM - DPKG on es1001 is CRITICAL: Connection refused by host
[16:01:01] PROBLEM - MySQL disk space on db44 is CRITICAL: Connection refused by host
[16:01:01] RECOVERY - Disk space on mw1077 is OK: DISK OK
[16:01:11] PROBLEM - DPKG on srv269 is CRITICAL: Connection refused by host
[16:01:11] PROBLEM - Disk space on srv256 is CRITICAL: Connection refused by host
[16:01:11] PROBLEM - DPKG on aluminium is CRITICAL: Connection refused by host
[16:01:11] PROBLEM - RAID on aluminium is CRITICAL: Connection refused by host
[16:01:31] PROBLEM - Disk space on mw74 is CRITICAL: Connection refused by host
[16:01:41] PROBLEM - RAID on srv272 is CRITICAL: Connection refused by host
[16:01:41] PROBLEM - RAID on mw1012 is CRITICAL: Connection refused by host
[16:01:42] PROBLEM - RAID on mw1001 is CRITICAL: Connection refused by host
[16:01:42] RECOVERY - DPKG on ms1004 is OK: All packages OK
[16:01:42] RECOVERY - RAID on mw1032 is OK: OK: no RAID installed
[16:01:42] RECOVERY - RAID on mw1029 is OK: OK: no RAID installed
[16:01:51] PROBLEM - DPKG on es1002 is CRITICAL: Connection refused by host
[16:01:51] PROBLEM - RAID on mw1064 is CRITICAL: Connection refused by host
[16:01:51] PROBLEM - RAID on mw1086 is CRITICAL: Connection refused by host
[16:01:51] PROBLEM - RAID on mw1075 is CRITICAL: Connection refused by host
[16:01:51] RECOVERY - RAID on mw1071 is OK: OK: no RAID installed
[16:01:51] RECOVERY - RAID on mw1069 is OK: OK: no RAID installed
[16:01:51] RECOVERY - RAID on mw1077 is OK: OK: no RAID installed
[16:01:52] RECOVERY - RAID on mw1090 is OK: OK: no RAID installed
[16:01:52] RECOVERY - RAID on mw1093 is OK: OK: no RAID installed
[16:02:01] PROBLEM - Disk space on mw1134 is CRITICAL: Connection refused by host
[16:02:01] PROBLEM - Disk space on mw1148 is CRITICAL: Connection refused by host
[16:02:01] PROBLEM - Disk space on mw1147 is CRITICAL: Connection refused by host
[16:02:11] RECOVERY - Disk space on mw1159 is OK: DISK OK
[16:02:21] PROBLEM - Disk space on mw1158 is CRITICAL: Connection refused by host
[16:02:41] PROBLEM - RAID on srv202 is CRITICAL: Connection refused by host
[16:02:42] PROBLEM - Disk space on srv201 is CRITICAL: Connection refused by host
[16:02:51] PROBLEM - Disk space on srv236 is CRITICAL: Connection refused by host
[16:02:51] PROBLEM - RAID on srv256 is CRITICAL: Connection refused by host
[16:03:01] PROBLEM - RAID on srv270 is CRITICAL: Connection refused by host
[16:03:02] PROBLEM - RAID on srv275 is CRITICAL: Connection refused by host
[16:03:02] PROBLEM - DPKG on srv272 is CRITICAL: Connection refused by host
[16:03:11] PROBLEM - RAID on bast1001 is CRITICAL: Connection refused by host
[16:03:12] PROBLEM - Disk space on es1001 is CRITICAL: Connection refused by host
[16:03:31] PROBLEM - Disk space on es1002 is CRITICAL: Connection refused by host
[16:03:31] PROBLEM - RAID on es1002 is CRITICAL: Connection refused by host
[16:04:12] PROBLEM - Disk space on srv288 is CRITICAL: Connection refused by host
[16:04:51] PROBLEM - DPKG on srv270 is CRITICAL: Connection refused by host
[16:04:51] PROBLEM - Disk space on srv272 is CRITICAL: Connection refused by host
[16:04:56] !log starting nagios-nrpe-server on ALL via dsh to speed up nagios recovery [16:04:57] Logged the message, Master [16:05:11] PROBLEM - DPKG on srv275 is CRITICAL: Connection refused by host [16:05:31] RECOVERY - DPKG on db44 is OK: All packages OK [16:05:38] Change abandoned: Hashar; "wrong change :b" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1866 [16:05:41] RECOVERY - DPKG on mw1029 is OK: All packages OK [16:05:41] RECOVERY - DPKG on mw1042 is OK: All packages OK [16:05:41] RECOVERY - DPKG on mw1044 is OK: All packages OK [16:05:51] PROBLEM - DPKG on mw1064 is CRITICAL: Connection refused by host [16:06:51] PROBLEM - Disk space on srv270 is CRITICAL: Connection refused by host [16:06:51] RECOVERY - Disk space on srv275 is OK: DISK OK [16:07:01] PROBLEM - RAID on srv288 is CRITICAL: Connection refused by host [16:07:01] PROBLEM - DPKG on bast1001 is CRITICAL: Connection refused by host [16:07:11] RECOVERY - DPKG on mw1071 is OK: All packages OK [16:07:11] RECOVERY - DPKG on mw1093 is OK: All packages OK [16:07:21] PROBLEM - MySQL disk space on es1002 is CRITICAL: Connection refused by host [16:07:31] RECOVERY - DPKG on mw1090 is OK: All packages OK [16:07:31] RECOVERY - RAID on mw1158 is OK: OK: no RAID installed [16:07:41] RECOVERY - Disk space on db44 is OK: DISK OK [16:07:41] RECOVERY - DPKG on mw1086 is OK: All packages OK [16:07:51] PROBLEM - RAID on mw1147 is CRITICAL: Connection refused by host [16:07:51] PROBLEM - DPKG on mw1075 is CRITICAL: Connection refused by host [16:07:51] PROBLEM - DPKG on mw1012 is CRITICAL: Connection refused by host [16:07:51] PROBLEM - DPKG on mw1001 is CRITICAL: Connection refused by host [16:07:51] RECOVERY - RAID on mw1159 is OK: OK: no RAID installed [16:08:01] PROBLEM - RAID on mw1134 is CRITICAL: Connection refused by host [16:08:21] RECOVERY - Disk space on mw1001 is OK: DISK OK [16:08:31] RECOVERY - Disk space on mw1044 is OK: DISK OK [16:08:41] RECOVERY - Disk space on mw1042 is OK: 
DISK OK [16:08:51] RECOVERY - RAID on srv201 is OK: OK: no RAID installed [16:09:01] RECOVERY - RAID on mw19 is OK: OK: no RAID installed [16:09:21] PROBLEM - Disk space on bast1001 is CRITICAL: Connection refused by host [16:09:22] RECOVERY - RAID on srv269 is OK: OK: no RAID installed [16:09:22] RECOVERY - DPKG on srv288 is OK: All packages OK [16:09:32] RECOVERY - RAID on es1001 is OK: OK: State is Optimal, checked 2 logical device(s) [16:09:42] RECOVERY - Disk space on mw1064 is OK: DISK OK [16:10:01] RECOVERY - DPKG on mw74 is OK: All packages OK [16:10:01] RECOVERY - DPKG on mw1134 is OK: All packages OK [16:10:01] RECOVERY - DPKG on mw1148 is OK: All packages OK [16:10:01] RECOVERY - DPKG on mw1147 is OK: All packages OK [16:10:01] RECOVERY - DPKG on mw1158 is OK: All packages OK [16:10:21] RECOVERY - Disk space on mw1086 is OK: DISK OK [16:10:51] RECOVERY - MySQL disk space on db44 is OK: DISK OK [16:10:51] RECOVERY - DPKG on es1001 is OK: All packages OK [16:11:01] RECOVERY - DPKG on srv269 is OK: All packages OK [16:11:01] RECOVERY - Disk space on srv256 is OK: DISK OK [16:11:01] RECOVERY - DPKG on aluminium is OK: All packages OK [16:11:01] RECOVERY - RAID on aluminium is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0 [16:11:21] RECOVERY - Disk space on mw74 is OK: DISK OK [16:11:31] RECOVERY - RAID on srv272 is OK: OK: no RAID installed [16:11:31] RECOVERY - RAID on mw1012 is OK: OK: no RAID installed [16:11:31] RECOVERY - RAID on mw1001 is OK: OK: no RAID installed [16:11:41] RECOVERY - MySQL disk space on es1001 is OK: DISK OK [16:11:41] RECOVERY - RAID on mw1064 is OK: OK: no RAID installed [16:11:41] RECOVERY - DPKG on es1002 is OK: All packages OK [16:11:41] RECOVERY - RAID on mw1075 is OK: OK: no RAID installed [16:11:41] RECOVERY - RAID on mw1086 is OK: OK: no RAID installed [16:11:51] RECOVERY - Disk space on mw1134 is OK: DISK OK [16:11:51] RECOVERY - Disk space on mw1148 is OK: DISK OK [16:11:51] RECOVERY - Disk space on mw1147 is OK: DISK 
OK [16:12:11] RECOVERY - Disk space on mw1158 is OK: DISK OK [16:12:31] RECOVERY - Disk space on srv201 is OK: DISK OK [16:12:31] RECOVERY - RAID on srv202 is OK: OK: no RAID installed [16:12:41] RECOVERY - Disk space on srv236 is OK: DISK OK [16:12:41] RECOVERY - RAID on srv256 is OK: OK: no RAID installed [16:12:51] RECOVERY - RAID on srv270 is OK: OK: no RAID installed [16:12:51] RECOVERY - RAID on srv275 is OK: OK: no RAID installed [16:12:51] RECOVERY - DPKG on srv272 is OK: All packages OK [16:13:01] RECOVERY - RAID on bast1001 is OK: OK: no RAID installed [16:13:01] RECOVERY - Disk space on es1001 is OK: DISK OK [16:13:21] RECOVERY - Disk space on es1002 is OK: DISK OK [16:13:21] RECOVERY - RAID on es1002 is OK: OK: State is Optimal, checked 2 logical device(s) [16:14:01] RECOVERY - Disk space on srv288 is OK: DISK OK [16:14:41] RECOVERY - DPKG on srv270 is OK: All packages OK [16:14:41] RECOVERY - Disk space on srv272 is OK: DISK OK [16:15:01] RECOVERY - DPKG on srv275 is OK: All packages OK [16:15:41] RECOVERY - DPKG on mw1064 is OK: All packages OK [16:16:41] RECOVERY - Disk space on srv270 is OK: DISK OK [16:16:51] RECOVERY - RAID on srv288 is OK: OK: no RAID installed [16:16:51] RECOVERY - DPKG on bast1001 is OK: All packages OK [16:17:03] !log after a config change to nrpe_local.cfg and puppet applying the change, the service was not restarted but for some reason all nagios-nrpe-server processes caught SIGTERM. manually applying the same config change does not cause problems. 
that caused a Nagios outage until nrpe servers were started again (via dsh) [16:17:05] Logged the message, Master [16:17:11] RECOVERY - MySQL disk space on es1002 is OK: DISK OK [16:17:41] RECOVERY - DPKG on mw1012 is OK: All packages OK [16:17:41] RECOVERY - DPKG on mw1001 is OK: All packages OK [16:18:34] RECOVERY - RAID on mw1134 is OK: OK: no RAID installed [16:18:34] RECOVERY - RAID on mw1147 is OK: OK: no RAID installed [16:19:04] RECOVERY - Disk space on mw1012 is OK: DISK OK [16:20:14] RECOVERY - Disk space on mw1075 is OK: DISK OK [16:21:04] RECOVERY - Disk space on bast1001 is OK: DISK OK [16:25:55] New patchset: Hashar; "testswarm: update fetcher to r108075" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1867 [16:26:12] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1867 [16:26:24] RECOVERY - DPKG on mw1075 is OK: All packages OK [16:27:05] mutante: can you possibly merge https://gerrit.wikimedia.org/r/#change,1867 ? Change made by Timo & reviewed by me in CodeReview [16:28:17] New patchset: Jgreen; "adding root@grosley's key to logmover authorized_keys" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1868 [16:28:31] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1868 [16:28:36] hashar: looking.. [16:28:49] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1868 [16:28:50] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1868 [16:31:02] hashar: of course i dont know much about testswarm, but yeah, if you're here to check a puppet run after merge.. [16:31:34] mutante: hopefully it will just work :b [16:31:53] New patchset: Jgreen; "removing stale root@grosley ssh key" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1869 [16:31:58] hashar: which host is this again? 
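The recovery !logged above (restarting nagios-nrpe-server on every host via dsh) amounts to fanning one command out over a host list. A rough, hypothetical Python equivalent of that pattern is sketched below; the service name comes from the log, while the ssh invocation and the injectable-runner design are assumptions, not how dsh itself works:

```python
import subprocess

# Service name taken from the !log entry; the restart command itself is assumed.
SERVICE_RESTART = "sudo service nagios-nrpe-server restart"

def fan_out(hosts, command, run=None):
    """Run `command` on every host, dsh-style; returns {host: exit_code}.

    `run` defaults to invoking ssh via subprocess; a fake runner can be
    injected for testing without touching any real hosts.
    """
    if run is None:
        run = lambda host, cmd: subprocess.call(["ssh", host, cmd])
    return {host: run(host, command) for host in hosts}
```

With a fake runner this can be exercised offline, e.g. `fan_out(["mw1001", "mw1002"], SERVICE_RESTART, run=lambda h, c: 0)` reports an exit code per host without opening any connections.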
[16:32:02] gallium [16:32:08] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1869 [16:32:33] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1869 [16:32:34] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1869 [16:33:13] New review: Dzahn; "Change made by Timo & reviewed by hashar in CodeReview." [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1867 [16:33:13] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1867 [16:33:15] !log adjusting all power strip humidity sensor 2 (floor level) to 12% humidity, as the center rack has the proper levels, floor levels always are low in humidity. [16:33:17] Logged the message, RobH [16:33:53] mutante: thx ) [16:34:20] yay [16:34:29] if that stops the flood of those emails I will be a happy camper [16:34:39] hashar: Caching catalog for gallium [16:35:08] hashar: and.. applied. please check on web :) [16:35:58] i need to thank ben about foxyproxy again =P [16:36:03] this shit is so much easier than how i used to do it. [16:36:27] mutante: thanks for the merge :) [16:36:35] mutante: now waiting for cronjob to kick in [16:38:28] New review: Dzahn; "careful, you need to make sure a key is defined as absent to make sure it's gone. just removing it h..." [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1869 [16:52:29] New patchset: Pyoungmeister; "commenting out sodium preseed for manual install" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1870 [16:52:43] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1870 [16:53:54] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1870 [16:53:54] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1870 [16:55:53] New patchset: Hashar; "testswarm: minor fix following change r1867" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1871 [16:57:38] New review: Hashar; "SVN changes:" [operations/puppet] (production) C: 0; - https://gerrit.wikimedia.org/r/1871 [16:58:38] New patchset: Jgreen; "offhost backups crons, adjustments" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1872 [16:58:52] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1872 [16:58:53] mutante: if you are still around, can you also merge https://gerrit.wikimedia.org/r/#change,1867 on gallium please? [16:59:22] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 0; - https://gerrit.wikimedia.org/r/1872 [16:59:26] mutante: one is an unhandled exception, the other is just a cosmetic change [16:59:39] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1872 [16:59:39] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1872 [17:00:13] !log stop sodium to do manual reinstall [17:00:17] Logged the message, and now dispatching a T1000 to your position to terminate you. [17:01:14] New review: Demon; "I'm wondering if we should make an integration/testswarm repo (like we did with integration/jenkins)..." [operations/puppet] (production) C: 1; - https://gerrit.wikimedia.org/r/1871 [17:02:56] ^demon: can puppet fetch changes from another repo? [17:03:04] <^demon> No :( [17:03:14] even by using git submodules ? :-b [17:03:24] <^demon> Oh, hrm. Perhaps? 
[17:03:27] <^demon> Dunno about that. [17:03:33] <^demon> If so that'd be super-useful. [17:03:42] right now our procedure is to cherry pick changes into the main repo [17:04:36] ^demon: I have updated the testswarm script with some above gerrit changes [17:04:52] <^demon> LeslieCarr: How would you go about doing that from a puppet manifest? Examples? [17:04:52] ^demon: also added a new jenkins job with a PostgreSQL backend [17:04:58] <^demon> I saw, great. [17:05:11] ^demon: that should be fine hopefully. [17:05:14] oh was just thinking gerrit changes [17:05:24] need to go grab my daughter, will be back in roughly 4 hours hopefully [17:06:37] <^demon> LeslieCarr: Being able to import a specific commit hash from a second repo (presumably only ones we run) would be really useful. [17:07:01] <^demon> You could deploy MW using puppet as long as you can specify the commit hash :) [17:07:27] New review: Dzahn; "catching exceptions is a good thing and cosmetic fix, sure" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1871 [17:07:28] Change merged: Dzahn; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1871 [17:07:39] hashar, now it shows up like Running MediaWiki trunk r108726"> for me [17:07:39] mutante: you are my new hero :-) [17:07:55] Nemo_bis: yeah :) Thanks for reporting this [17:08:03] Nemo_bis: the change is in the deployment queue ( https://gerrit.wikimedia.org/r/1871 ) [17:08:09] :) [17:08:22] hashar: yw. go get daughter:) [17:08:34] oh yeah my daughter [17:08:40] hehe [17:08:42] thx for the merge! [17:10:08] hashar: it ran on gallium. didnt break. run:) [17:11:55] * hashar checks logs [17:13:14] Nemo_bis: fixed! thanks mutante for deploying this so fast !! 
[17:13:15] http://integration.mediawiki.org/testswarm/user/MediaWiki/ [17:13:16] ;) [17:13:28] 3 2 1 spriiiint [17:15:07] hope his daughter wasnt waiting outside in the rain:p [17:16:22] PROBLEM - mailman on sodium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:29:14] mark: I know there's gitweb integration in gerrit, but is there any 'front page' for the repositories (as opposed to the changes)? Something like what you normally see in github etc. [17:29:31] i'm not aware of it, but perhaps Ryan knows [17:29:43] I think that the gitweb in gerrit is a bit ugly and hidden indeed [17:30:02] PROBLEM - spamassassin on sodium is CRITICAL: Connection refused by host [17:30:02] PROBLEM - HTTPS on sodium is CRITICAL: Connection refused [17:31:27] hmm that does suck indeed [17:31:47] this is the best I can find: https://gerrit.wikimedia.org/r/#admin,projects [17:32:32] PROBLEM - HTTP on sodium is CRITICAL: Connection refused [17:32:37] but then the link to gitweb isn't obvious [17:32:43] PROBLEM - DPKG on sodium is CRITICAL: Connection refused by host [17:32:52] PROBLEM - RAID on sodium is CRITICAL: Connection refused by host [17:56:51] RECOVERY - HTTP on sodium is OK: HTTP OK HTTP/1.1 200 OK - 452 bytes in 0.054 seconds [18:02:28] New patchset: Mark Bergsma; "Convert spamassassin's local.cf to a template" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1873 [18:02:43] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1873 [18:04:46] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1873 [18:04:46] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1873 [18:08:36] New patchset: Mark Bergsma; "include network::constants" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1874 [18:08:50] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1874 [18:08:57] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1874 [18:08:58] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1874 [18:14:41] RECOVERY - mailman on sodium is OK: PROCS OK: 10 processes with args mailman [18:16:21] RECOVERY - spamassassin on sodium is OK: PROCS OK: 4 processes with args spamd [18:20:31] RECOVERY - RAID on sodium is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0 [18:20:41] RECOVERY - DPKG on sodium is OK: All packages OK [18:21:04] New patchset: Mark Bergsma; "require lighttpd to be installed before mailman" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1875 [18:21:19] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1875 [18:22:03] New review: Mark Bergsma; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1875 [18:22:04] Change merged: Mark Bergsma; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1875 [18:25:06] New review: Demon; "Shun the nonbeliever!" [test/mediawiki/core] (master); V: 0 C: -1; - https://gerrit.wikimedia.org/r/1841 [19:12:10] hi binasher! [19:12:43] hey [19:12:45] i sent a request to RT for access to locke & emery, is that something you can arrange? [19:13:04] New patchset: Asher; "remove duplicate file definition" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1876 [19:13:21] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1876 [19:14:08] drdee: i'll make sure its done today [19:14:13] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1876 [19:14:13] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1876 [19:15:46] RECOVERY - Puppet freshness on db1018 is OK: puppet ran at Thu Jan 12 19:15:21 UTC 2012 [19:16:05] binasher: super kewl! [19:16:47] RECOVERY - Puppet freshness on db1011 is OK: puppet ran at Thu Jan 12 19:16:39 UTC 2012 [19:16:47] RECOVERY - Puppet freshness on db1040 is OK: puppet ran at Thu Jan 12 19:16:43 UTC 2012 [19:18:16] RECOVERY - Puppet freshness on db1019 is OK: puppet ran at Thu Jan 12 19:17:53 UTC 2012 [19:19:16] PROBLEM - Puppet freshness on db22 is CRITICAL: Puppet has not run in the last 10 hours [19:19:46] RECOVERY - Puppet freshness on db1020 is OK: puppet ran at Thu Jan 12 19:19:42 UTC 2012 [19:20:16] RECOVERY - Puppet freshness on db1029 is OK: puppet ran at Thu Jan 12 19:19:56 UTC 2012 [19:20:16] RECOVERY - Puppet freshness on db1034 is OK: puppet ran at Thu Jan 12 19:20:00 UTC 2012 [19:22:46] RECOVERY - Puppet freshness on db1042 is OK: puppet ran at Thu Jan 12 19:22:21 UTC 2012 [19:23:17] RECOVERY - Puppet freshness on db1027 is OK: puppet ran at Thu Jan 12 19:22:52 UTC 2012 [19:23:17] RECOVERY - Puppet freshness on db1010 is OK: puppet ran at Thu Jan 12 19:23:09 UTC 2012 [19:23:46] RECOVERY - Puppet freshness on db1021 is OK: puppet ran at Thu Jan 12 19:23:18 UTC 2012 [19:23:46] RECOVERY - Puppet freshness on db1002 is OK: puppet ran at Thu Jan 12 19:23:34 UTC 2012 [19:24:17] RECOVERY - Puppet freshness on db1015 is OK: puppet ran at Thu Jan 12 19:24:07 UTC 2012 [19:24:46] RECOVERY - Puppet freshness on db1012 is OK: puppet ran at Thu Jan 12 19:24:36 UTC 2012 [19:25:06] New patchset: Asher; "fix password scope" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1877 
[19:25:16] RECOVERY - Puppet freshness on db1047 is OK: puppet ran at Thu Jan 12 19:25:09 UTC 2012 [19:25:21] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1877 [19:25:51] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1877 [19:25:52] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1877 [19:26:22] RECOVERY - Puppet freshness on db1046 is OK: puppet ran at Thu Jan 12 19:25:52 UTC 2012 [19:27:22] RECOVERY - Puppet freshness on db1016 is OK: puppet ran at Thu Jan 12 19:27:19 UTC 2012 [19:27:22] RECOVERY - Puppet freshness on db1028 is OK: puppet ran at Thu Jan 12 19:27:20 UTC 2012 [19:27:52] RECOVERY - Puppet freshness on db1001 is OK: puppet ran at Thu Jan 12 19:27:39 UTC 2012 [19:28:22] RECOVERY - Puppet freshness on db1041 is OK: puppet ran at Thu Jan 12 19:28:06 UTC 2012 [19:28:52] RECOVERY - Puppet freshness on db1033 is OK: puppet ran at Thu Jan 12 19:28:23 UTC 2012 [19:28:52] RECOVERY - Puppet freshness on db1009 is OK: puppet ran at Thu Jan 12 19:28:24 UTC 2012 [19:29:52] RECOVERY - Puppet freshness on db1026 is OK: puppet ran at Thu Jan 12 19:29:36 UTC 2012 [19:29:52] RECOVERY - Puppet freshness on db1031 is OK: puppet ran at Thu Jan 12 19:29:52 UTC 2012 [19:30:52] RECOVERY - Puppet freshness on db1048 is OK: puppet ran at Thu Jan 12 19:30:23 UTC 2012 [19:31:43] New patchset: Asher; "and include the pw class" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1878 [19:31:52] RECOVERY - Puppet freshness on db1013 is OK: puppet ran at Thu Jan 12 19:31:44 UTC 2012 [19:31:58] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1878 [19:32:52] RECOVERY - Puppet freshness on db1006 is OK: puppet ran at Thu Jan 12 19:32:24 UTC 2012 [19:33:22] RECOVERY - Puppet freshness on db1038 is OK: puppet ran at Thu Jan 12 19:33:11 UTC 2012 [19:33:50] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1878 [19:33:51] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1878 [19:34:52] RECOVERY - Puppet freshness on db1043 is OK: puppet ran at Thu Jan 12 19:34:33 UTC 2012 [19:36:22] RECOVERY - Puppet freshness on db1003 is OK: puppet ran at Thu Jan 12 19:35:56 UTC 2012 [19:36:52] RECOVERY - Puppet freshness on db1025 is OK: puppet ran at Thu Jan 12 19:36:33 UTC 2012 [19:36:53] RECOVERY - Puppet freshness on db1044 is OK: puppet ran at Thu Jan 12 19:36:43 UTC 2012 [19:37:52] RECOVERY - Puppet freshness on db1004 is OK: puppet ran at Thu Jan 12 19:37:28 UTC 2012 [19:37:52] RECOVERY - Puppet freshness on db1008 is OK: puppet ran at Thu Jan 12 19:37:36 UTC 2012 [19:41:22] RECOVERY - Puppet freshness on db1022 is OK: puppet ran at Thu Jan 12 19:41:03 UTC 2012 [19:41:22] RECOVERY - Puppet freshness on db1005 is OK: puppet ran at Thu Jan 12 19:41:10 UTC 2012 [19:41:22] RECOVERY - Puppet freshness on db1045 is OK: puppet ran at Thu Jan 12 19:41:12 UTC 2012 [19:41:22] RECOVERY - Puppet freshness on db1017 is OK: puppet ran at Thu Jan 12 19:41:21 UTC 2012 [19:42:22] RECOVERY - Puppet freshness on db1039 is OK: puppet ran at Thu Jan 12 19:42:00 UTC 2012 [19:42:22] RECOVERY - Puppet freshness on db1014 is OK: puppet ran at Thu Jan 12 19:42:13 UTC 2012 [19:42:23] RECOVERY - Puppet freshness on db1030 is OK: puppet ran at Thu Jan 12 19:42:18 UTC 2012 [19:44:52] RECOVERY - Puppet freshness on db1024 is OK: puppet ran at Thu Jan 12 19:44:33 UTC 2012 [19:45:22] RECOVERY - Puppet freshness on db1035 is OK: puppet ran at Thu Jan 12 19:44:56 UTC 2012 
[19:45:23] RECOVERY - Puppet freshness on db1007 is OK: puppet ran at Thu Jan 12 19:45:05 UTC 2012 [19:51:08] New patchset: Pyoungmeister; "giving diedrik access to various boxes a la rt 2256" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1879 [19:51:59] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1879 [19:52:40] New patchset: Asher; "username switch for these is -l, not -u" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1880 [19:53:46] Change abandoned: Pyoungmeister; "wrong branch" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1879 [19:54:56] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1880 [19:54:56] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1880 [19:55:42] New patchset: Pyoungmeister; "adding shell access for diedrik rt 2256" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1881 [19:56:01] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1881 [19:56:18] New review: Pyoungmeister; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1881 [19:56:18] Change merged: Pyoungmeister; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1881 [19:57:27] New patchset: Ryan Lane; "test" [operations/software] (master) - https://gerrit.wikimedia.org/r/1882 [19:57:53] robh: I want to power down srv178-189...please let me know if there are any issues. [19:58:09] those are the ones causing the overage in that rack right? [19:58:51] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1882 [19:58:51] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/1882 [20:01:10] hi apergos! [20:01:35] hello! 
[20:01:44] question, question [20:01:57] let's see if I have any answers! [20:01:57] cmjohnson1: those boxes are no longer in apache pool, and external storage has also fully migrated off of them [20:02:12] thanks to binasher, we have a very simple GLAM filter running on Locke [20:02:20] cool [20:02:23] yes [20:02:24] ! [20:02:40] i would like to make it available at downloads.wikimedia.org [20:02:53] cmjohnson1: they are all ok for you to pull the network on, and wipe [20:02:58] it does not contain privacy information, just timestamp, URL, referral [20:03:04] just pull network first [20:03:17] robh: cool thanks... got it! [20:03:21] how can we fix this? [20:03:22] cmjohnson1: then when they are wiped, unrack and box up, we have decommissions we can send them to, thanks =] [20:03:39] lemme take a quick look [20:04:01] apergos: /var/log/squid/glam_nara.log [20:04:05] okay...can you get a ticket to me when you get the opportunity...thx [20:04:15] sorry it's on emery [20:04:31] yeah I was having trouble finding that on locke :-D [20:04:50] robh: temp leads will take some time... i have to trace black cables on a black rack ...I am going to label as I go along for future ref [20:05:08] they should be labeled [20:05:16] not sure why they arent [20:05:18] if they arent [20:05:29] cmjohnson1: you dont need to trace [20:05:33] so here is the trick [20:05:33] k [20:05:40] you pull up the mgmt http interface for that power strip [20:05:45] yep...an unplug [20:05:54] well, may need to trace some [20:05:58] but it makes it a lot easier [20:06:15] so yea, feel free to unplug that stuff, just admin log when you are poking in a rack on that stuff [20:06:17] i thought of that as well...I wonder if I can bring up http interface on ipad [20:06:25] its not flash, should work [20:06:30] we need to get you wifi. [20:06:35] i will take care of it today. 
[20:06:44] i have a mifi with unlimited 4g...so not too terribly concerned [20:07:00] !log reassigning ports on asw-b-sdtpa [20:07:02] Logged the message, Mistress of the network gear. [20:07:12] well, the mifi wont get you mgmt interfaces [20:07:30] so makes this more painful, i just need to get it approved [20:08:00] i have a really long cable....i will push the laptop around on the cart =] [20:08:02] LeslieCarr: wanna do me a favor? [20:08:03] https://rt.wikimedia.org/Ticket/Display.html?id=1582 [20:08:16] if you can confirm this is good enough for us to use in pmtpa and sdtpa, i will buy a couple. [20:08:17] I think it's ok [20:08:28] there's nothing weird that can really get into the api.php params I guess [20:08:28] lemme look.. [20:08:36] drdee: [20:08:40] cmjohnson1: yea but that sucks long term, having wifi in dc is so much easier on you =] [20:09:16] cmjohnson1: also, pull the model and info off the crash cart that eq provides [20:09:23] RobH: out of stock [20:09:26] the tall one [20:09:34] LeslieCarr: bleeeeeehhhhh old [20:09:35] I'm not as sure about the referrers [20:09:42] LeslieCarr: care to suggest one off newegg.com? [20:09:44] just cause there's stuff like [20:09:49] one that you wouldnt hate deploying twice [20:09:50] okay...that would be great! I have looked for those [20:10:05] from an edit on some external site [20:10:11] i dunno if that's cool to publish [20:11:03] like people's wordpress blogs and such [20:11:12] mmmm [20:11:49] I know they're only viewing it but then I guess we're revealing info... maybe we could toss referrers that are from external sites? [20:12:19] yes, and just write 'external'? [20:12:23] New patchset: Bhartshorne; "another test" [operations/software] (master) - https://gerrit.wikimedia.org/r/1883 [20:12:23] yep [20:12:24] New review: gerrit2; "Lint check passed." 
[operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/1883 [20:12:36] that doesn't mean you can't keep it locally for analysis [20:12:44] just to publish them I think we'd want to toss that info [20:12:47] sure [20:13:00] but the analysis will be done by GLAM people, not me [20:13:06] ohhhh [20:13:10] mmm [20:13:36] I guess we'd want them to have a contract or [20:13:45] meh whatever the toolserver people have that can access sensitive data [20:13:48] one or the other [20:14:08] which toolserver ppl? [20:14:28] i think they've hidden more and more from toolserver users (with DB views) [20:14:30] any toolserver user that can get to pieces of the db that most users can't [20:14:38] idk first hand though [20:14:40] I don't know which class of user that is any more [20:14:43] !log granted the "process" priv to nagios@localhost on all production db clusters [20:14:44] Logged the message, Master [20:14:52] but if we replace referral with internal/external then they are happy and we are not disclosing any sensitive information [20:14:54] or what process they go through to get that access.. it's been years since I've had an active account [20:15:01] sure, that's find [20:15:03] *fine [20:15:07] binasher: ^^ need to tell DaB about that? [20:15:07] let's do that [20:15:09] ok [20:15:27] you have access to bayes right? [20:15:30] nope [20:15:31] New patchset: Bhartshorne; "removing test file" [operations/software] (master) - https://gerrit.wikimedia.org/r/1884 [20:15:32] New review: gerrit2; "Lint check passed." [operations/software] (master); V: 1 - https://gerrit.wikimedia.org/r/1884 [20:15:35] jeremyb: shouldn't need to, it was a grant that didn't include the "identified by" portion [20:15:42] hmm what hosts do you have access to? 
[20:15:47] none [20:15:52] huh [20:15:58] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1883 [20:15:58] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/1883 [20:16:02] binasher: idk even what that priv is. just got me thinking... [20:16:05] but i put an RT request for Locke & Emery today, Asher is going to help me with that [20:16:06] and these are generated on emery [20:16:14] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1884 [20:16:15] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/1884 [20:16:22] binasher: is `show processlist` ? [20:16:33] ok well we can set up a little job that copies them to download I guess, once you have something that sanitizes them and a location they live [20:16:44] (copies from emery) [20:17:16] that should take care of it, we'll wind up putting them in blah blah /other over there [20:17:26] some subdirectory with a nice name [20:17:27] awesome! [20:17:30] glam? [20:17:36] sounds great :-D [20:17:43] shall i write the cleaning process? [20:17:46] comes with glitter? [20:17:46] sure [20:17:56] comes with david bowie :-P [20:18:30] and david ferriero! [20:18:32] jeremyb: yup, that's what it grants.. without that, a user can only see their own threads [20:18:34] when you're ready to go drop an rt ticket and assign it to me (I hope you can do that) [20:18:40] binasher: and kill? [20:19:03] yup [20:19:21] k. 
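The cleaning process discussed above (keep internal referrers, collapse everything else to 'external' before publishing the GLAM logs) could be sketched roughly as below. The internal-domain whitelist, the field order of the squid log lines, and the function names are all assumptions for illustration, not the actual filter running on emery:

```python
from urllib.parse import urlparse

# Hypothetical whitelist of referrer domains treated as "internal";
# the real criteria used for the published GLAM logs are not stated in
# the channel.
INTERNAL_SUFFIXES = (".wikimedia.org", ".wikipedia.org")
INTERNAL_HOSTS = ("wikimedia.org", "wikipedia.org")

def sanitize_referrer(referrer):
    """Keep internal referrers; collapse anything else to 'external'."""
    host = urlparse(referrer).hostname or ""
    if host in INTERNAL_HOSTS or host.endswith(INTERNAL_SUFFIXES):
        return referrer
    return "external"

def sanitize_line(line):
    # Assumed layout: timestamp, URL, referrer, whitespace-separated,
    # matching the "just timestamp, URL, referral" description above.
    ts, url, referrer = line.split()
    return " ".join((ts, url, sanitize_referrer(referrer)))
```

The unsanitized logs can still be kept locally for analysis; only the copy shipped to the downloads subdirectory would pass through a filter like this.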
hopefully nagios doesn't start doing that ;-P [20:19:23] er, no [20:19:48] super is required to kill threads owned by other users [20:20:00] nagios can kill things belonging to nagios [20:20:50] sure, that sounds better :) [20:20:53] so I'm running vde_switch (very cool program) and qemu with vde compiled in, and it works but I can't figure out why, because I didn't give a socket path name to qemu and yet it apparently found the vde switch socket... but I can't see it in lsof that it has it open [20:20:57] stumped! [20:21:17] apergos: lsof -p ? [20:21:29] I looked at all fds by both processes [20:21:35] apergos: also lsof /path/to/socket [20:21:49] PROBLEM - Host srv187 is DOWN: PING CRITICAL - Packet loss = 100% [20:21:57] I looked at everything holding open the particular path or containing the dir to the socket [20:21:59] PROBLEM - Host srv189 is DOWN: PING CRITICAL - Packet loss = 100% [20:21:59] PROBLEM - Host srv188 is DOWN: PING CRITICAL - Packet loss = 100% [20:22:04] it ain't there, not from qemu [20:22:19] I mean I went to /proc/nnn/fd [20:22:23] right, it's just not there [20:22:39] http://virtualsquare.org/virtualsquare.png is rather broken [20:22:42] and yet I can telnet to the dang thing :-D [20:23:02] nice [20:23:22] also, > Last Update:Sun May 27 16:57:15 CEST 2007 [20:23:52] well the wiki was updated jan 3 [20:24:32] New patchset: Bhartshorne; "initial import of geturls" [operations/software] (master) - https://gerrit.wikimedia.org/r/1885 [20:25:14] i see that now.
otoh, i see "MediaWiki: 1.9.3" [20:25:20] I'm running 2.3.2 out of svn, pretty current [20:25:23] ahahahaha [20:25:30] oh it hurts my soul :-D [20:25:59] see I can see what it has open, just that how qemu is actually hooked up to it is a mystery :-D [20:26:13] New review: Bhartshorne; "(no comment)" [operations/software] (master); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1885 [20:26:13] Change merged: Bhartshorne; [operations/software] (master) - https://gerrit.wikimedia.org/r/1885 [20:26:22] fire and forget [20:26:39] they apparently have a mailing list [20:26:43] !log shutting down srv178-189 for decommissioning [20:26:44] Logged the message, Master [20:26:50] Ryan_Lane: https://gerrit.wikimedia.org/r/gitweb?p=operations/software.git;a=tree;f=geturls;h=18b5d0f534be6ac86a03564a2b2869a237a29468;hb=HEAD [20:27:48] robh: nagios is showing srv187-189 down...i thought they were out already removed [20:27:54] cmjohnson1: LeslieCarr just found two APs for tampa [20:27:57] ordering them now [20:28:05] robh: cool [20:28:38] guess I might have to read the code real quick [20:28:43] tomorrow. need to make [20:28:47] cmjohnson1: you asked me about up to srv181 i thought [20:28:52] lunch *and* dinner? oh nooooes [20:28:59] ahh, 189 [20:29:03] my bad, lemme check [20:29:10] drdee: are you now set up with all of the access that you need? [20:29:13] can I close the ticket? [20:29:16] i still think yer fine though, its just not in decomissioning.pp in puppet is all [20:29:18] checking [20:29:43] cmjohnson1: did you touch srv187-srv189 yet? [20:29:46] cmjohnson1: if not, leave them. 
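The lsof mystery above has a plausible explanation worth noting: the *connecting* end of a Unix domain socket is unnamed, so tools that report the socket path (`lsof /path/to/socket`, or a readlink on `/proc/<pid>/fd/*`) only show the path for the endpoint that bind()ed it, i.e. vde_switch itself — qemu's side appears as an anonymous `socket:[inode]`. A small Linux-only demonstration, not specific to qemu/vde:

```python
import os
import socket
import tempfile

# Bind a listener to a filesystem path, then connect a client to it.
path = os.path.join(tempfile.mkdtemp(), "ctl")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(path)
server.listen(1)

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(path)

# The client's fd carries no path: /proc shows only "socket:[inode]",
# which is why scanning the client's fds never turns up the socket file.
link = os.readlink(f"/proc/self/fd/{client.fileno()}")
print(link)
```

The server's bound path does show up (via the kernel's unix socket table), which matches what apergos saw: the switch holds the path open, while the qemu end is invisible to a path-based search.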
[20:29:56] I pulled network on them [20:30:01] plug em back in ;] [20:30:05] they are still online [20:30:10] okay [20:30:12] it wont hurt anything, they just depooled is all [20:30:16] will come back in no worries [20:30:22] okay [20:30:33] they are one of 40 spi cluster servers [20:30:35] api even [20:30:49] cmjohnson1: so those are still in use, even at a lower weight, so we may as well leave them for the moment [20:30:55] pull the rest though and lets see how we are on power [20:31:11] (these will go eventually but do not need to go today, they are still doing a job well enough for the moment) [20:31:43] maplebed: what's wrong with ab? [20:34:30] cmjohnson1: So before I complete this newegg order [20:34:36] is there anything you need down there that i dunno about? [20:35:14] cmjohnson1: I see some mgmt switch is needed [20:35:19] RECOVERY - Host srv189 is UP: PING OK - Packet loss = 0%, RTA = 0.24 ms [20:35:21] robh: lemme see [20:35:29] RECOVERY - Host srv188 is UP: PING OK - Packet loss = 0%, RTA = 0.39 ms [20:35:39] yes....i need a new switch [20:35:48] got that on this order as well [20:35:53] just a mgmt switch [20:35:59] yes...just mgmt [20:36:02] and the two APs, one for sdtpa and one for pmtpa [20:36:17] when they come in leslie will be working with you to get them setup [20:36:24] I think I am good [20:36:25] you good for blank cds and such? [20:36:34] yes..plenty of those. [20:36:39] cool [20:36:39] I need a couple new thumb drives [20:36:54] I have debian on one and memtest on another [20:37:26] but that is not that big of a deal. thx for checking [20:37:31] couple cheap 5gb good enough? [20:37:39] RECOVERY - Host srv187 is UP: PING OK - Packet loss = 0%, RTA = 0.44 ms [20:37:40] perfect [20:39:01] meh, 8g is 15 bucks [20:39:09] and they are the nicer mushkin, which arent dog slow [20:39:18] i dont buy the shitty slow ones, they are annoying. [20:40:32] robh: do you want that crash cart info now?
[20:40:41] sure =] [20:40:52] i am tired of going to sign one out in eqiad and them being all gone [20:40:55] ordering my own. [20:41:05] i like the tall one they have there, will fit well down aisles [20:44:55] robh: http://www.globalindustrial.com/g/office/computer-furniture/workstations/mobile-computer-workstation [20:45:29] thats pretty cheap [20:45:45] and i know that one is not going to snap in half either, which is nice. [20:46:17] robh: !2261 [20:46:27] i forgot how to get the rt to show up [20:46:43] it has industrial in the name, it can't snap in half! [20:46:43] !rt2261 [20:46:49] !rt 2261 [20:46:49] https://rt.wikimedia.org/Ticket/Display.html?id=2261 [20:46:50] !rt 2261 [20:46:50] https://rt.wikimedia.org/Ticket/Display.html?id=2261 [20:46:58] ahh..thats it [20:47:01] !wm-bot stop sucking [20:47:01] http://meta.wikimedia.org/wiki/WM-Bot [20:47:19] @regsearch rt [20:47:19] Results (found 4): unicorn, socks-proxy, stucked, rt, [20:47:22] regarding !rt2261 where is that going [20:48:01] ? [20:48:09] what do you mean? is it set to go through at that time? [20:48:46] or the fact the ticket has no details on where its moving, so and so? [20:48:54] I see it is being relocated but I don't see a ticket regarding where it's moving to [20:49:06] yea, i didnt know this was scheduled for this time [20:49:14] but hey, good enough, i will drop the relevant tickets [20:50:39] hrmm [20:50:59] i think i am going to make this easy on you and move from pmtpa c1 to pmtpa d1 [20:51:20] hrmm need to see if its redundant power [20:51:22] or single [20:51:51] New patchset: Asher; "install percona-toolkit on hardy db's" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1886 [20:52:03] holy crap this is old drac.... [20:52:05] really old. [20:52:06] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1886 [20:52:15] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1886 [20:52:16] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1886 [20:53:56] yeah it is ...the dell 1950 apache srv's have the same DRAC [20:54:11] well, this is even older [20:54:19] a revision behind those, these were first dells i ordered for wmf [20:54:20] heh [20:54:28] !log installing percona-toolkit on few remaining hardy dbs [20:54:30] Logged the message, Master [20:54:31] ahhh...memories [20:54:52] RobH: locke, aka db6 [20:54:59] RECOVERY - Puppet freshness on ms1002 is OK: puppet ran at Thu Jan 12 20:54:29 UTC 2012 [20:55:04] yep [20:56:15] mark: do you have any issue with it going in d1-pmtpa [20:56:23] plenty of power left in that rack, dual feeds [20:56:29] quick downtime too since its close [20:56:48] i dont think there is an issue, but since yer here ;] [20:57:28] I don't [20:57:35] and indeed, downtime should be short [20:57:41] chris should setup rails first and prewire [20:57:46] so the system can move and bootup fast [20:57:50] hrmm [20:57:56] rails can be taken from one of the other db7, 8 [20:57:59] cmjohnson1: do you have a spare set of 1950 rails? [20:58:02] those are to be decommissioned [20:58:04] it's a 2950 [20:58:09] that works. [20:58:12] db6-10 are in that rack [20:58:16] well, db10 no longer is [20:58:20] db9 is still in use [20:58:24] but db7, db8 are decommissioned [20:58:26] just take those rails [20:59:38] I will have pre-wired....am i understanding that db7 and 8 are decommissioned so I can pull one of them out and set rails.
[20:59:46] yes [20:59:55] New patchset: Asher; "install percona nagios scripts on all dbs" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1887 [20:59:56] okay [21:00:11] cmjohnson1: https://rt.wikimedia.org/Ticket/Display.html?id=2264 [21:00:12] New patchset: Jgreen; "adding khorn shell access to storage3" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1888 [21:00:19] it has the info on using rails, where to put it, prewire, etc... [21:00:26] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1888 [21:00:30] links to the ticket on notification to users and to network ticket [21:00:34] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1887 [21:00:34] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1887 [21:00:49] New review: Jgreen; "(no comment)" [operations/puppet] (production); V: 1 C: 2; - https://gerrit.wikimedia.org/r/1888 [21:00:50] Change merged: Jgreen; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1888 [21:01:50] robh: thanks [21:06:09] ordered my crash cart =] [21:11:09] PROBLEM - Host mobile1 is DOWN: PING CRITICAL - Packet loss = 100% [21:15:43] New patchset: Ryan Lane; "Decommissioning mobile1" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1889 [21:15:58] New review: gerrit2; "Lint check passed."
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1889 [21:16:47] New review: Ryan Lane; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1889 [21:16:47] Change merged: Ryan Lane; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1889 [21:22:52] LeslieCarr: updated the ticket for the labs switch, its still in route to juniper it seems [21:23:24] okay cool [21:23:26] thanks [21:27:11] j is seriously slow on that [21:27:22] perhaps we should get better support [21:27:27] one spare may not always be enough ;) [21:30:15] robh: mobile1 is down is anyone working on it? [21:30:37] doubtful, pretty sure its offline cuz we dont use them for anything right now [21:30:56] okay...anything in sdtpa I want to make sure is not a problem =] [21:30:58] Ryan_Lane renamed mobile1, maybe that's related? [21:31:11] I just renamed it [21:31:20] chris should rename it too [21:31:22] relabel [21:31:22] hm. it failed to pxe [21:31:40] good idea mark [21:31:45] did you update dns, brewster dhcp? [21:31:51] crap. dhcp [21:31:52] I hate renaming servers [21:31:54] +1 [21:31:56] I do too [21:33:23] ryan_lane what is the new name? racktables updated? [21:33:34] cmjohnson1: virt0 [21:33:44] hm. I don't see mobile1 in dhcp conf [21:33:48] thx [21:35:18] christ. how do I get the MAC? [21:36:04] ah. 
see it on boot [21:36:31] Ryan_Lane: if you rename a server, need to drop a ticket in the relevant datacenter queue to relabel [21:36:36] also have to update the mgmt dns [21:36:38] and racktables [21:37:18] you also need to wire $50 to my bank account [21:37:21] it's not in the dhcp config at all :( [21:37:32] that seems unlikely [21:37:40] it's in the 57600 I think [21:37:45] not 115200 [21:37:46] mark: us americans can't wire so easy [21:37:59] good [21:38:04] that will teach you not to rename servers then [21:38:08] apparently I need to take mobile2, not mobile1 [21:38:14] alternatively you can send pralines [21:38:29] something is wrong with eth0 on mobile1, says binasher [21:38:31] :( [21:38:43] mark: I checked all files. the mac isn't in any of them [21:38:44] chris and rob will want to know ;) [21:38:47] weird [21:39:02] if it's in warranty, let's get it fixed [21:39:07] if it's out of warranty, let's get it decommissioned [21:39:07] I'm just gonna take mobile2 [21:39:11] ok [21:39:16] can you handle the switch? ;) [21:39:17] i'm going off [21:39:19] heh [21:39:20] sure [21:39:21] it was occasional packetloss, might just need recabling [21:39:32] oh [21:39:35] in tampa, that's very possible ;) [21:39:56] I'm going to keep it decommissioned [21:40:13] ? 
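The failure mode Ryan_Lane hit at [21:38:43] — the host's MAC turning out to be in none of the DHCP config files — is the kind of thing a quick scan can confirm before a PXE attempt. A rough sketch (a regex pass over ISC dhcpd-style config text, not a full parser; brewster's actual file layout isn't shown in the log):

```python
import re

# Matches "host <name> { ... hardware ethernet <mac> ..." blocks.
HOST_RE = re.compile(
    r"host\s+(\S+)\s*\{[^}]*?hardware\s+ethernet\s+([0-9A-Fa-f:]{17})",
    re.S,
)

def find_host_macs(conf_text):
    """Map host names to their 'hardware ethernet' MACs in an ISC
    dhcpd-style config. A regex scan, not a real dhcpd.conf parser."""
    return {name: mac.lower() for name, mac in HOST_RE.findall(conf_text)}
```

Feeding it each config file in turn answers "is this host (or this MAC) in DHCP at all" — which here it wasn't, hence the switch to mobile2.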
[21:40:18] if the ethernet is bad drop a ticket =p [21:40:19] in puppet [21:40:22] just did [21:40:23] hey now ...tampa is getting somewhat tolerable now [21:40:28] ahh, ok, cuz i think its under warranty [21:40:34] i know chris [21:40:41] but I doubt you touched mobile1 before ;) [21:40:44] :) [21:41:03] no, I haven't but it is an older Dell 1950 [21:41:14] i really do not like those servers [21:41:33] they were nice [21:41:37] just getting a bit old now [21:41:45] meh [21:41:49] it happens to the best of us [21:41:54] my trusty old 1650 in my basement [21:42:08] I don't want to throw it away ;( [21:43:24] I had a sparc classic once [21:43:31] but you know what, for all things there is a season [21:43:36] and a time for the dumpster :-P [21:44:02] you know how much money I spent on that thing [21:44:05] back when I was a student ;p [21:48:03] jeremyb: sorry for the delayed reply. ab only takes a single URL. abmulti can take a file but it's limited to 20,000 lines. I've got 20 million or so. [21:48:35] maplebed: aha [21:48:47] jeremyb: lastly, ab always goes as fast as it can, but I'm hitting what is ultimately a production backend, so I want to make sure that I can limit the speed. [21:49:28] oh wait - there's more. I want to be able to stop and restart the process; ab doesn't let me start in the middle of a file. [21:49:57] maplebed: i've used siege in the past with large numbers of urls (playing back an accesslog basically) but i don't know if it has a url limit off the top of my head [21:50:35] I don't think I looked at siege [21:50:42] anyway, it's written now. [21:50:44] :P [21:50:59] PROBLEM - RAID on ms1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
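The requirements maplebed lists at [21:48:03]–[21:49:28] — millions of URLs (past abmulti's 20,000-line limit), a bounded request rate against a production backend, and the ability to stop and restart mid-file — fit in a few lines. This is a toy sketch, not the tool maplebed actually wrote; the fetch step is stubbed out:

```python
import time

def replay_urls(urls, start_at=0, rate_per_sec=10.0, fetch=print):
    """Replay urls[start_at:] at no more than rate_per_sec requests/sec.
    Returns the next index, so a stopped run can resume where it left off."""
    interval = 1.0 / rate_per_sec
    index = start_at - 1          # so an empty slice resumes at start_at
    for index, url in enumerate(urls[start_at:], start=start_at):
        fetch(url)                # stand-in: swap in a real HTTP GET here
        time.sleep(interval)      # crude pacing; a token bucket is smoother
    return index + 1
```

For 20 million URLs you would stream the file and periodically persist the returned index to disk rather than hold the list in memory, but the shape — offset in, offset out, sleep between requests — is the same.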
[21:51:27] maplebed: if you ever want to look in the future, http://www.joedog.org/index/siege-home [21:51:48] it also supports cookie files, posts, and actions as multiple different logged in users within a single set of urls [21:52:01] not that those features matter for swift testing [22:04:36] anyone need anything done here b4 i leave? [22:06:44] PROBLEM - Host mobile2 is DOWN: PING CRITICAL - Packet loss = 100% [22:23:47] New patchset: Lcarr; "putting ganglia1001 in puppet" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1890 [22:24:02] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1890 [22:24:21] New review: Lcarr; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1890 [22:24:22] Change merged: Lcarr; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1890 [23:01:39] New patchset: Asher; "mysql conf reorg, define clusters and masters for all prod clusters s1-7" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1891 [23:01:54] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1891 [23:05:45] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1891 [23:05:46] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1891 [23:09:39] New patchset: Asher; "Revert "mysql conf reorg, define clusters and masters for all prod clusters s1-7" - var scope issue" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1892 [23:09:52] New review: gerrit2; "Lint check passed." 
[operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1892 [23:11:46] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1892 [23:11:47] Change merged: Asher; [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1892 [23:12:57] * binasher just had a "it would be good to use labs" moment [23:24:58] New patchset: Asher; "mysql conf reorg, define clusters and masters for all prod clusters" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1893 [23:25:18] New review: gerrit2; "Lint check passed." [operations/puppet] (production); V: 1 - https://gerrit.wikimedia.org/r/1893 [23:26:12] maplebed: Ryan_Lane: could one of you kindly review ^^^ [23:26:41] should I be scared? [23:27:22] oomph. I'm kneedeep in something else - can I review later? [23:28:04] I see where you're going, but to really review I'll need to settle into it for a bit. [23:32:03] maplebed: no problem [23:39:08] bit of background - eqiad is now using chained replication and i'm setting up pt-heartbeat for lag monitoring of slaves of slaves, and to log the bin and relay log positions every host is at, regardless of tier, to make it easier to promote slaves in nasty failure cases or across colos. pt-heartbeat requires knowing the serverid of the master you want to measure lag from when two levels deep. so.. hosts will get a file marking what cluster they're [23:39:08] in, and serverid determined in a python script that does gethostbyname($cluster-master) etc.. [23:39:29] tldr - that stuff will become more important, and need to be easily editable in one place [23:52:54] New patchset: Asher; "enable and force ssl for graphite" [operations/puppet] (production) - https://gerrit.wikimedia.org/r/1894 [23:53:32] New review: Asher; "(no comment)" [operations/puppet] (production); V: 0 C: 2; - https://gerrit.wikimedia.org/r/1894
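binasher's scheme at [23:39:08] — each host carries a file naming its cluster, and a Python script resolves `$cluster-master` via gethostbyname to find the master whose serverid pt-heartbeat should measure lag against — can be sketched like this. Only the `<cluster>-master` naming convention and the use of gethostbyname come from the log; the function names and the marker-file path are hypothetical:

```python
import socket

def cluster_master_ip(cluster, resolve=socket.gethostbyname):
    """Resolve the conventional '<cluster>-master' DNS name described in
    the log; `resolve` is injectable so this runs without real DNS."""
    return resolve(f"{cluster}-master")

def read_cluster(path="/etc/mysql-cluster-name"):
    """Read the per-host cluster marker file (the path is a guess; the
    log only says hosts get a file marking their cluster)."""
    with open(path) as f:
        return f.read().strip()
```

From the master's address, the real script would then look up that host's server_id (e.g. for pt-heartbeat's master-server-id requirement when a slave is two replication tiers below the master).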