[00:56:58] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 280 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[01:01:58] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 280 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[01:28:28] PROBLEM - puppet last run on cp3008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:29:58] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 83500.467367 Seconds
[01:30:18] PROBLEM - Postgres Replication Lag on maps2002 is CRITICAL: CRITICAL - Rep Delay is: 83544.138244 Seconds
[01:30:18] PROBLEM - Postgres Replication Lag on maps2004 is CRITICAL: CRITICAL - Rep Delay is: 83545.24789 Seconds
[01:30:28] PROBLEM - Postgres Replication Lag on maps2003 is CRITICAL: CRITICAL - Rep Delay is: 83551.295048 Seconds
[01:33:18] RECOVERY - Postgres Replication Lag on maps2002 is OK: OK - Rep Delay is: 0.0 Seconds
[01:33:58] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 83746.675654 Seconds
[01:33:59] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 83746.703798 Seconds
[01:38:18] PROBLEM - Postgres Replication Lag on maps2002 is CRITICAL: CRITICAL - Rep Delay is: 84024.410796 Seconds
[01:53:08] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:53:18] RECOVERY - Postgres Replication Lag on maps2002 is OK: OK - Rep Delay is: 6.235622 Seconds
[01:53:18] RECOVERY - Postgres Replication Lag on maps2004 is OK: OK - Rep Delay is: 7.167741 Seconds
[01:53:28] RECOVERY - Postgres Replication Lag on maps2003 is OK: OK - Rep Delay is: 13.144717 Seconds
[01:53:58] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 51.681208 Seconds
[01:53:59] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 57.262289 Seconds
[01:53:59] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 57.290283 Seconds
[01:56:28] RECOVERY - puppet last run on cp3008 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[02:21:08] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[02:22:18] PROBLEM - puppet last run on ms-be1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:28:11] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.18) (duration: 07m 56s)
[02:28:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:46:18] PROBLEM - puppet last run on achernar is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:51:18] RECOVERY - puppet last run on ms-be1009 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[02:53:54] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.19) (duration: 07m 36s)
[02:54:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:59:29] !log l10nupdate@tin ResourceLoader cache refresh completed at Sun Apr 9 02:59:29 UTC 2017 (duration 5m 35s)
[02:59:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:04:18] PROBLEM - dhclient process on thumbor1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
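The "Rep Delay" figures in the Postgres Replication Lag alerts above are seconds of replication delay reported by an Icinga check run against each maps replica. A minimal sketch of how such a number can be obtained on a PostgreSQL standby is below; the connection details and thresholds are assumptions for illustration, not the actual plugin configuration used here.

```python
# Sketch: report replication delay on a PostgreSQL standby (assumed approach,
# not the actual Icinga plugin). Requires psycopg2 and local access to the replica.
import sys
import psycopg2

WARN, CRIT = 1800, 3600  # thresholds in seconds (assumed values)

conn = psycopg2.connect(host="localhost", dbname="postgres")  # hypothetical DSN
cur = conn.cursor()
# Seconds since the last WAL record was replayed on this standby.
cur.execute("SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))")
delay = float(cur.fetchone()[0] or 0.0)

if delay >= CRIT:
    print("CRITICAL - Rep Delay is: %f Seconds" % delay)
    sys.exit(2)
elif delay >= WARN:
    print("WARNING - Rep Delay is: %f Seconds" % delay)
    sys.exit(1)
print("OK - Rep Delay is: %f Seconds" % delay)
sys.exit(0)
```

Note that a delay measured this way also grows when the primary writes nothing for a while, which is one reason a busy autovacuum on the master can show up as apparent lag on every replica.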
[03:05:09] RECOVERY - dhclient process on thumbor1002 is OK: PROCS OK: 0 processes with command name dhclient
[03:11:08] PROBLEM - puppet last run on mw1182 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:15:18] RECOVERY - puppet last run on achernar is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[03:16:08] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 3653.493023 Seconds
[03:17:08] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[03:18:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 3769.507173 Seconds
[03:20:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[03:38:08] RECOVERY - puppet last run on mw1182 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[03:49:28] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 20 probes of 276 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[03:50:58] PROBLEM - puppet last run on stat1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:53:58] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 280 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[03:54:28] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw is OK: OK - failed 18 probes of 276 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[03:58:58] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 280 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[04:10:48] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=652.00 Read Requests/Sec=408.40 Write Requests/Sec=0.50 KBytes Read/Sec=40392.80 KBytes_Written/Sec=17.60
[04:11:58] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 7004.463698 Seconds
[04:13:59] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[04:14:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 7129.555326 Seconds
[04:15:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[04:16:08] PROBLEM - puppet last run on mc1028 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
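The RIPE Atlas ping checks above flip to CRITICAL as soon as the number of failed probes exceeds the "alerts on" value, so 20 failures out of 280 (alerts on 19) is CRITICAL and 19 is OK again. A rough sketch of that threshold logic, assuming the per-probe results for the measurement have already been fetched into a list of loss fractions (the function and input format are illustrative, not the real plugin):

```python
# Sketch of the "failed N probes of M (alerts on K)" logic.
# Input format is an assumption: one packet-loss fraction per probe.
def atlas_ping_status(loss_per_probe, alert_on=19):
    failed = sum(1 for loss in loss_per_probe if loss >= 1.0)  # 100% loss = failed probe
    total = len(loss_per_probe)
    state = "CRITICAL" if failed > alert_on else "OK"
    return "%s - failed %d probes of %d (alerts on %d)" % (state, failed, total, alert_on)

print(atlas_ping_status([0.0] * 260 + [1.0] * 20))  # -> CRITICAL - failed 20 probes of 280 (alerts on 19)
```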
[04:18:48] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=45.00 Read Requests/Sec=0.60 Write Requests/Sec=0.40 KBytes Read/Sec=4.40 KBytes_Written/Sec=4.40
[04:19:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 7429.572563 Seconds
[04:20:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[04:20:58] RECOVERY - puppet last run on stat1002 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[04:21:08] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 7553.54276 Seconds
[04:23:08] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[04:30:58] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 8144.765729 Seconds
[04:31:59] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[04:37:08] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 8513.47083 Seconds
[04:38:08] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[04:43:08] RECOVERY - puppet last run on mc1028 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[04:55:59] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 9645.182039 Seconds
[04:56:58] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[05:09:28] PROBLEM - puppet last run on cp3041 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:29:08] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 11634.053206 Seconds
[05:31:08] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[05:37:28] RECOVERY - puppet last run on cp3041 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[05:39:08] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 12233.783922 Seconds
[05:40:08] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[05:45:18] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 12596.885176 Seconds
[05:46:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[05:55:18] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 13196.795254 Seconds
[05:57:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[06:03:08] PROBLEM - puppet last run on mw1216 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[06:05:08] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 13793.719078 Seconds
[06:07:09] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[06:25:08] PROBLEM - puppet last run on mc1030 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
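The recurring "puppet last run" alerts fire when the agent's most recent catalog run failed or is too old, and they clear on the next clean run. A hedged sketch of that kind of check, reading the agent's last-run summary file, is below; the path, thresholds, and output wording are assumptions for illustration and differ in detail from the real check_puppetrun plugin.

```python
# Sketch: check Puppet agent health from its last-run summary.
# Path and thresholds are assumptions, not the production configuration.
import sys
import time
import yaml

SUMMARY = "/var/lib/puppet/state/last_run_summary.yaml"  # typical agent state path
MAX_AGE = 2 * 3600  # seconds; assumed freshness budget

with open(SUMMARY) as f:
    summary = yaml.safe_load(f)

age = time.time() - summary["time"]["last_run"]
failures = summary.get("events", {}).get("failure", 0)

if failures or age > MAX_AGE:
    print("CRITICAL: %d resource failures, last run %d seconds ago" % (failures, age))
    sys.exit(2)
print("OK: last run %d seconds ago with %d failures" % (age, failures))
sys.exit(0)
```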
[06:31:08] RECOVERY - puppet last run on mw1216 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[06:40:58] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 281 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[06:45:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 16189.706154 Seconds
[06:45:58] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 281 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[06:47:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[06:54:08] RECOVERY - puppet last run on mc1030 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[06:57:58] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 281 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[07:02:58] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 281 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[07:03:58] PROBLEM - Nginx local proxy to apache on mw1265 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.009 second response time
[07:04:08] PROBLEM - HHVM rendering on mw1265 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 1308 bytes in 0.001 second response time
[07:04:58] RECOVERY - Nginx local proxy to apache on mw1265 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 613 bytes in 0.029 second response time
[07:05:08] RECOVERY - HHVM rendering on mw1265 is OK: HTTP OK: HTTP/1.1 200 OK - 75853 bytes in 0.102 second response time
[07:17:09] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 18109.740741 Seconds
[07:18:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[07:25:08] PROBLEM - puppet last run on wtp1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:30:08] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:34:18] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 19136.738844 Seconds
[07:35:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[07:39:58] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo is CRITICAL: CRITICAL - failed 20 probes of 281 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[07:44:58] RECOVERY - IPv6 ping to ulsfo on ripe-atlas-ulsfo is OK: OK - failed 19 probes of 281 (alerts on 19) - https://atlas.ripe.net/measurements/1791309/#!map
[07:48:08] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 19973.696106 Seconds
[07:49:08] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[07:53:08] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[08:28:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 22369.582751 Seconds
[08:29:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[08:30:18] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 22496.811046 Seconds
[08:32:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[08:33:08] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 22673.702944 Seconds
[08:34:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 22729.620497 Seconds
[08:34:08] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[08:38:08] PROBLEM - puppet last run on mc1019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
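The "Check HHVM threads for leakage" alert on mw1169 trips when HHVM has more than twice as many threads running or queued as Apache has busy workers. A simplified sketch of that comparison is below, assuming the HHVM thread count is read from /proc and the busy-worker count from Apache's machine-readable mod_status page; the real plugin may gather both figures differently.

```python
# Sketch: flag HHVM "thread leakage" when HHVM has more than double the
# threads that Apache has busy workers. Data sources are assumptions.
import subprocess
import urllib.request

def hhvm_threads():
    pid = subprocess.check_output(["pidof", "hhvm"]).split()[0].decode()
    with open("/proc/%s/status" % pid) as f:
        for line in f:
            if line.startswith("Threads:"):
                return int(line.split()[1])
    return 0

def apache_busy_workers(url="http://localhost/server-status?auto"):
    body = urllib.request.urlopen(url).read().decode()
    for line in body.splitlines():
        if line.startswith("BusyWorkers:"):
            return int(line.split()[1])
    return 0

threads, busy = hhvm_threads(), apache_busy_workers()
if threads > 2 * busy:
    print("CRITICAL: HHVM has more than double threads running or queued than apache has busy workers")
else:
    print("OK: %d HHVM threads vs %d busy apache workers" % (threads, busy))
```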
[08:40:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[08:43:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 23269.616324 Seconds
[08:44:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[08:46:18] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 23456.713255 Seconds
[08:47:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[08:56:18] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 24056.657684 Seconds
[08:56:18] (03PS1) 10Urbanecm: Initial configuration for wbwikimedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347214 (https://phabricator.wikimedia.org/T162510)
[08:58:09] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 24173.798402 Seconds
[08:59:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[08:59:28] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[09:06:18] RECOVERY - puppet last run on mc1019 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[09:13:18] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 25076.873039 Seconds
[09:14:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[09:20:09] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 25492.583293 Seconds
[09:22:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[09:39:28] PROBLEM - HHVM jobrunner on mw1259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:40:09] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 26692.962834 Seconds
[09:40:18] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 26697.071645 Seconds
[09:40:28] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[09:41:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[09:41:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[09:41:28] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 26766.757289 Seconds
[09:42:28] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[09:42:28] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[09:46:18] RECOVERY - HHVM jobrunner on mw1259 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time
[09:46:51] what's happening :P
[09:47:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 27112.889844 Seconds
[09:47:28] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 27126.638808 Seconds
[09:48:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[09:49:28] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[09:50:43] gehel: FYI all those delays ^^^ seems to match with my theory, auto-vacuum is running on the master (maps1001)
[09:52:38] volans: yeah, I have a patch that might help get better alerting. Coming up on Monday...
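The theory floated above, that the replication-lag flapping tracks an autovacuum running on the maps1001 master, can be sanity-checked by looking for autovacuum workers in pg_stat_activity on the master. A minimal sketch is below; the hostname and database name are assumptions for illustration, and this is separate from the alerting patch gehel mentions.

```python
# Sketch: list autovacuum workers on the master to corroborate the
# "replication lag follows autovacuum" theory. Connection details are hypothetical.
import psycopg2

conn = psycopg2.connect(host="maps1001.eqiad.wmnet", dbname="gis")  # assumed host/db
cur = conn.cursor()
cur.execute("""
    SELECT pid, now() - xact_start AS running_for, query
      FROM pg_stat_activity
     WHERE query LIKE 'autovacuum:%'
""")
for pid, running_for, query in cur.fetchall():
    print(pid, running_for, query)
```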
[09:54:28] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 27546.739515 Seconds
[09:56:28] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[09:59:28] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 27846.662603 Seconds
[10:01:28] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[10:07:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 28312.930035 Seconds
[10:08:09] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[10:18:28] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 28986.647251 Seconds
[10:20:28] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[10:25:09] PROBLEM - puppet last run on wtp1020 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[10:36:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 30052.975401 Seconds
[10:37:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[10:39:28] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 30246.74998 Seconds
[10:41:28] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[10:42:13] (03PS1) 10DatGuy: Initial configuration for dtywiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347217 (https://phabricator.wikimedia.org/T161529)
[10:49:06] 06Operations, 10Wikimedia-Site-requests, 13Patch-For-Review: Create Wikipedia Doteli - https://phabricator.wikimedia.org/T161529#3166608 (10DatGuy) 05Open>03stalled Blocked for logo. Waiting to hear what "The Free Encyclopedia" is in Doteli.
[10:53:09] RECOVERY - puppet last run on wtp1020 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[11:02:28] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 31626.657022 Seconds
[11:04:28] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[11:24:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 32933.12555 Seconds
[11:28:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[11:50:28] PROBLEM - Check HHVM threads for leakage on mw1169 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[11:52:37] 06Operations, 10Icinga: Update icinga to 2.x - https://phabricator.wikimedia.org/T162542#3166631 (10Paladox)
[11:56:18] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 34856.899012 Seconds
[11:57:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[11:57:28] PROBLEM - puppet last run on ms-be1033 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[12:08:28] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 35586.998671 Seconds
[12:11:28] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 0.0 Seconds
[12:12:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 35813.112297 Seconds
[12:13:08] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[12:26:18] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[12:26:28] RECOVERY - puppet last run on ms-be1033 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures
[12:28:18] PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[12:33:18] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[12:33:18] RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[12:41:18] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 37556.765675 Seconds
[12:43:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[12:48:18] PROBLEM - Postgres Replication Lag on maps1003 is CRITICAL: CRITICAL - Rep Delay is: 37976.681533 Seconds
[12:49:08] PROBLEM - Postgres Replication Lag on maps1004 is CRITICAL: CRITICAL - Rep Delay is: 38033.462542 Seconds
[12:50:09] RECOVERY - Postgres Replication Lag on maps1004 is OK: OK - Rep Delay is: 0.0 Seconds
[12:50:18] RECOVERY - Postgres Replication Lag on maps1003 is OK: OK - Rep Delay is: 0.0 Seconds
[13:04:07] 06Operations, 10Icinga: Update icinga to 2.x - https://phabricator.wikimedia.org/T162542#3166673 (10faidon) 05Open>03declined
[13:21:46] 06Operations, 10Icinga: Update icinga to 2.x - https://phabricator.wikimedia.org/T162542#3166674 (10Paladox) Not sure why you closed it as declined.
[13:32:38] PROBLEM - puppet last run on mw1301 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:50:28] RECOVERY - Check HHVM threads for leakage on mw1169 is OK: OK
[13:56:59] 06Operations, 10Phabricator, 07LDAP: Create a LDAP user for account Seanchen (Sean Chen) - https://phabricator.wikimedia.org/T162544#3166696 (10Paladox)
[14:01:38] RECOVERY - puppet last run on mw1301 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[14:19:18] 06Operations, 10Traffic: Select location for Asia Cache DC - https://phabricator.wikimedia.org/T156029#2962020 (10H-stt) >>! In T156029#3053235, @BBlack wrote: > >>>! In T156029#3053179, @Gnom1 wrote: >> The goal is to //have Wikipedia's servers run on renewable energy//. It's as simple as that. > > I don't...
[14:28:48] PROBLEM - puppet last run on cp3007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:44:38] PROBLEM - puppet last run on logstash1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
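The "HTTP 5xx reqs/min" checks above report what fraction of recent Graphite datapoints sit above a critical value, here 11.11% of points above 1000 requests/min. A rough sketch of that kind of percent-above-threshold check against Graphite's render API is below; the metric target, window, and thresholds are placeholders rather than the real configuration.

```python
# Sketch: percentage-of-datapoints-above-threshold check against Graphite's
# render API. Metric target, window, and thresholds are placeholders.
import json
import urllib.request

URL = "http://graphite1001/render?target=reqstats.5xx&from=-10min&format=json"  # hypothetical target
WARN_PCT, CRIT_PCT, CRIT_VALUE = 5.0, 10.0, 1000.0

series = json.load(urllib.request.urlopen(URL))[0]["datapoints"]  # [[value, timestamp], ...]
values = [v for v, _ts in series if v is not None]
above = sum(1 for v in values if v > CRIT_VALUE)
pct = 100.0 * above / len(values) if values else 0.0

if pct >= CRIT_PCT:
    print("CRITICAL: %.2f%% of data above the critical threshold [%.1f]" % (pct, CRIT_VALUE))
elif pct >= WARN_PCT:
    print("WARNING: %.2f%% of data above the threshold" % pct)
else:
    print("OK: Less than %.2f%% above the threshold" % WARN_PCT)
```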
[14:53:48] PROBLEM - Check health of redis instance on 6479 on rdb2005 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6479
[14:54:48] RECOVERY - Check health of redis instance on 6479 on rdb2005 is OK: OK: REDIS 2.8.17 on 127.0.0.1:6479 has 1 databases (db0) with 3063343 keys, up 16 days 22 hours - replication_delay is 0
[14:56:48] RECOVERY - puppet last run on cp3007 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[15:12:38] RECOVERY - puppet last run on logstash1001 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[15:18:11] 06Operations, 10DNS, 10Traffic, 10Wikimedia-Language-setup: nan and minnan subdomain redirects are a mess - https://phabricator.wikimedia.org/T86915#3166755 (10Liuxinyu970226) After re-clarification of @stevenj81, https://incubator.wikimedia.org/wiki/Wt/nan now shows "This project uses a different ISO code...
[16:22:28] PROBLEM - puppet last run on analytics1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:40:28] PROBLEM - puppet last run on elastic1044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:50:28] RECOVERY - puppet last run on analytics1044 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures
[17:01:18] PROBLEM - puppet last run on db1091 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:09:28] RECOVERY - puppet last run on elastic1044 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[17:29:18] RECOVERY - puppet last run on db1091 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[18:23:38] PROBLEM - puppet last run on mx1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:40:42] 06Operations, 10DNS, 10Traffic, 10Wikimedia-Language-setup: nan and minnan subdomain redirects are a mess - https://phabricator.wikimedia.org/T86915#3166940 (10StevenJ81) I didn't really intend what I wrote as a resolution; I intended it as a stopgap until such time as this issue is completely resolved. I...
[18:51:38] RECOVERY - puppet last run on mx1001 is OK: OK: Puppet is currently enabled, last run 18 seconds ago with 0 failures
[18:54:18] PROBLEM - puppet last run on aqs1007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:03:18] PROBLEM - puppet last run on rdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
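The rdb2005 alert above is simply a failed PING against the Redis instance on port 6479, and the recovery message echoes fields from a plain INFO read (version, key count, uptime, replication delay). A minimal sketch of such a health probe using redis-py, with the exact output format as an assumption:

```python
# Sketch: Redis health probe roughly matching the check output above.
# Host/port come from the alert; output wording is an approximation.
import sys
import redis

r = redis.StrictRedis(host="127.0.0.1", port=6479, socket_timeout=5)
try:
    r.ping()
except redis.RedisError:
    print("CRITICAL ERROR - Redis Library - can not ping 127.0.0.1 on port 6479")
    sys.exit(2)

info = r.info()
print("OK: REDIS %s on 127.0.0.1:6479, up %s seconds - replication_delay is %s" % (
    info["redis_version"], info["uptime_in_seconds"], info.get("master_last_io_seconds_ago", 0)))
sys.exit(0)
```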
[19:23:18] RECOVERY - puppet last run on aqs1007 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[19:29:27] (03PS1) 10Hoo man: Change dumpwikidatattl to allow producing other flavors [puppet] - 10https://gerrit.wikimedia.org/r/347234 (https://phabricator.wikimedia.org/T155103)
[19:30:47] (03CR) 10jerkins-bot: [V: 04-1] Change dumpwikidatattl to allow producing other flavors [puppet] - 10https://gerrit.wikimedia.org/r/347234 (https://phabricator.wikimedia.org/T155103) (owner: 10Hoo man)
[19:31:18] RECOVERY - puppet last run on rdb1004 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[19:32:42] (03PS2) 10Hoo man: Change dumpwikidatattl to allow producing other flavors [puppet] - 10https://gerrit.wikimedia.org/r/347234 (https://phabricator.wikimedia.org/T155103)
[19:34:02] (03CR) 10jerkins-bot: [V: 04-1] Change dumpwikidatattl to allow producing other flavors [puppet] - 10https://gerrit.wikimedia.org/r/347234 (https://phabricator.wikimedia.org/T155103) (owner: 10Hoo man)
[19:39:18] 06Operations, 10Icinga: Update icinga to 2.x - https://phabricator.wikimedia.org/T162542#3166994 (10Paladox) 05declined>03Open Re opening as no reason for decline.
[19:39:34] (03PS3) 10Hoo man: Change dumpwikidatattl to allow producing other flavors [puppet] - 10https://gerrit.wikimedia.org/r/347234 (https://phabricator.wikimedia.org/T155103)
[20:40:18] PROBLEM - puppet last run on cp1046 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:48:02] !ops
[21:04:02] (03PS1) 10Phuedx: pagePreviews: Enable NavPopups gadget detection [mediawiki-config] - 10https://gerrit.wikimedia.org/r/347291 (https://phabricator.wikimedia.org/T160081)
[21:08:18] RECOVERY - puppet last run on cp1046 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[21:44:37] 06Operations, 10Traffic: certspotter: Error retrieving STH from log - https://phabricator.wikimedia.org/T159137#3057797 (10Volans) Since a couple of days both `einsteinium` and `tegmen` are spamming root@ every hour with certspotter errors, this time seems that the DigiCert service is responding 400 for the c...
[21:49:48] PROBLEM - puppet last run on ms-be3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[21:53:44] 06Operations, 10MediaWiki-extensions-PageAssessments: Cronspam from terbium - https://phabricator.wikimedia.org/T145360#3167121 (10Volans) **Since Feb. 19th** we're getting one email every day from terbium with an error for each wiki (~900 lines email) with: ``` The following extensions are required to be ins...
[22:47:58] RECOVERY - puppet last run on ms-be3002 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[23:03:43] 06Operations, 10MediaWiki-extensions-PageAssessments: Cronspam from terbium - https://phabricator.wikimedia.org/T145360#3167193 (10Peachey88) @Volans That would be {T159438} I believe
[23:15:28] PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:26:28] PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:32:56] 06Operations, 10Icinga: Update icinga to 2.x - https://phabricator.wikimedia.org/T162542#3167201 (10faidon) 05Open>03declined Because it will take ten times as long to explain why than what it took you to open this task. You casually talked about a complicated weeks- or months-long project for an "upgrade"...
[23:43:28] RECOVERY - puppet last run on kafka1001 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures
[23:54:28] RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[23:54:38] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: http status 500