[01:23:36] <icinga-wm>	 PROBLEM - puppet last run on mw1220 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:52:39] <icinga-wm>	 RECOVERY - puppet last run on mw1220 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[02:20:31] <logmsgbot>	 !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.1) (duration: 07m 17s)
[02:20:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:26:33] <logmsgbot>	 !log l10nupdate@tin ResourceLoader cache refresh completed at Sun May 14 02:26:33 UTC 2017 (duration 6m 2s)
[02:26:40] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:33:09] <icinga-wm>	 PROBLEM - puppet last run on mw2177 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/share/GeoIP/GeoIP2-City.mmdb.gz]
[04:01:09] <icinga-wm>	 RECOVERY - puppet last run on mw2177 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[04:08:19] <icinga-wm>	 PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=5768.60 Read Requests/Sec=1942.70 Write Requests/Sec=0.80 KBytes Read/Sec=33883.20 KBytes_Written/Sec=20.80
[04:17:19] <icinga-wm>	 RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=29.60 Read Requests/Sec=0.40 Write Requests/Sec=0.40 KBytes Read/Sec=2.80 KBytes_Written/Sec=13.20
[05:22:49] <icinga-wm>	 PROBLEM - nova-compute process on labvirt1009 is CRITICAL: PROCS CRITICAL: 2 processes with regex args ^/usr/bin/python /usr/bin/nova-compute
[05:23:49] <icinga-wm>	 RECOVERY - nova-compute process on labvirt1009 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute
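The nova-compute alert pair above implies a process-count check: two processes matching the regex were CRITICAL, one was OK. Below is a minimal sketch of that logic. It assumes the check expects exactly one matching process. The helper name and sample process tables are hypothetical, and this is not the actual Icinga/NRPE check implementation.

```python
import re

# Regex copied from the alert text. Expecting exactly one match is an
# assumption inferred from the OK (1 process) / CRITICAL (2 processes) pair.
NOVA_COMPUTE_RE = re.compile(r"^/usr/bin/python /usr/bin/nova-compute")


def check_nova_compute(cmdlines, expected=1):
    """Hypothetical helper: classify a list of process command lines."""
    count = sum(1 for cmd in cmdlines if NOVA_COMPUTE_RE.search(cmd))
    state = "OK" if count == expected else "CRITICAL"
    noun = "process" if count == 1 else "processes"
    return state, f"PROCS {state}: {count} {noun} with regex args {NOVA_COMPUTE_RE.pattern}"


# Hypothetical process tables for illustration.
one = ["/usr/bin/python /usr/bin/nova-compute", "/usr/sbin/sshd -D"]
two = one + ["/usr/bin/python /usr/bin/nova-compute"]
print(check_nova_compute(one)[1])  # PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/nova-compute
print(check_nova_compute(two)[1])  # PROCS CRITICAL: 2 processes with regex args ^/usr/bin/python /usr/bin/nova-compute
```

Zero matches also counts as CRITICAL here, which fits the assumption that exactly one nova-compute process is the healthy state.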
[07:33:49] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[08:22:49] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
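The HHVM leakage alert compares HHVM's running plus queued threads against Apache's busy workers. The sketch below assumes "more than double" means a strict `>` comparison against twice the busy-worker count. The function name and its inputs are hypothetical and do not reproduce the real check's code.

```python
def hhvm_thread_leak_state(hhvm_running, hhvm_queued, apache_busy_workers):
    """Hypothetical check: CRITICAL if HHVM threads exceed twice Apache's busy workers."""
    hhvm_threads = hhvm_running + hhvm_queued
    # "More than double" is read as a strict comparison (assumption).
    if hhvm_threads > 2 * apache_busy_workers:
        return "CRITICAL"
    return "OK"


print(hhvm_thread_leak_state(30, 5, 15))  # CRITICAL (35 > 30)
print(hhvm_thread_leak_state(30, 0, 15))  # OK (30 is exactly double)
```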
[09:13:19] <icinga-wm>	 PROBLEM - SSH on ms-be1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:14:09] <icinga-wm>	 RECOVERY - SSH on ms-be1019 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8 (protocol 2.0)
[09:25:49] <icinga-wm>	 RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[09:25:59] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1169 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:25:59] <icinga-wm>	 PROBLEM - HHVM jobrunner on mw1259 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[09:27:49] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1259 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.138 second response time
[09:29:50] <icinga-wm>	 RECOVERY - HHVM jobrunner on mw1169 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 4.271 second response time
[11:11:49] <icinga-wm>	 PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[11:48:39] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[11:48:39] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[11:56:39] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[11:58:39] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[12:03:02] <wikibugs>	 (03CR) 10Hashar: [C: 031] Jenkins: install jdk, not just jre [puppet] - 10https://gerrit.wikimedia.org/r/348961 (owner: 10Chad)
[12:03:39] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[12:04:39] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
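The graphite 5xx checks report the share of datapoints in a window that lie above a threshold. The values 11.11%, 22.22%, 33.33% and 44.44% in this log are multiples of 1/9, which is consistent with a nine-datapoint window. The 20.00% at 22:06:19 (Ulsfo) would fit a shorter window, e.g. one of five points. The minimal sketch below rests on several assumptions. It treats the 250 threshold from the recovery messages as the warning level. It applies a 1% cutoff to both states; the recovery text shows that cutoff for 250, but the critical cutoff is a guess. The function names and the WARNING wording are hypothetical.

```python
def percent_above(datapoints, threshold):
    """Share (0-100) of non-null datapoints strictly above threshold."""
    values = [v for v in datapoints if v is not None]
    if not values:
        return 0.0
    above = sum(1 for v in values if v > threshold)
    return 100.0 * above / len(values)


def graphite_5xx_state(datapoints, warning=250.0, critical=1000.0, cutoff=1.0):
    """Hypothetical reimplementation; thresholds from the alerts, cutoff assumed."""
    crit_pct = percent_above(datapoints, critical)
    if crit_pct >= cutoff:
        return "CRITICAL", f"{crit_pct:.2f}% of data above the critical threshold [{critical}]"
    warn_pct = percent_above(datapoints, warning)
    if warn_pct >= cutoff:
        return "WARNING", f"{warn_pct:.2f}% of data above the warning threshold [{warning}]"
    return "OK", f"Less than {cutoff:.2f}% above the threshold [{warning}]"


# One spike in a hypothetical nine-point window of 5xx reqs/min.
print(graphite_5xx_state([120.0] * 8 + [1500.0]))
print(graphite_5xx_state([120.0] * 9))
```

Under these assumptions, a single spike above 1000 reqs/min in a nine-point window is enough to page, which would explain the short PROBLEM/RECOVERY cycles seen throughout the day.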
[12:04:57] <wikibugs>	 (03CR) 10Hashar: [C: 031] "I guess that is to prepare the migration to role/profile/module scheme?   Should be a noop on contint1001 / contint2001 so feel free to de" [puppet] - 10https://gerrit.wikimedia.org/r/353357 (owner: 10Dzahn)
[12:11:49] <icinga-wm>	 RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[13:31:44] <wikibugs>	 (03Draft1) 10Paladox: Install openjdk jdk version instead of jre [debs/gerrit] - 10https://gerrit.wikimedia.org/r/353765
[13:31:46] <wikibugs>	 (03PS2) 10Paladox: Install openjdk jdk version instead of jre [debs/gerrit] - 10https://gerrit.wikimedia.org/r/353765
[13:33:27] <wikibugs>	 (03PS4) 10Paladox: Test: DO NOT MERGE [debs/gerrit] - 10https://gerrit.wikimedia.org/r/350440
[14:04:45] <wikibugs>	 (03Draft1) 10Paladox: Fix debian-rules-missing-recommended-target [debs/gerrit] - 10https://gerrit.wikimedia.org/r/353766
[14:04:47] <wikibugs>	 (03PS2) 10Paladox: Fix debian-rules-missing-recommended-target [debs/gerrit] - 10https://gerrit.wikimedia.org/r/353766
[14:09:17] <wikibugs>	 (03PS3) 10Paladox: Fix debian-rules-missing-recommended-target [debs/gerrit] - 10https://gerrit.wikimedia.org/r/353766
[14:11:57] <wikibugs>	 (03PS4) 10Paladox: Fix debian-rules-missing-recommended-target [debs/gerrit] - 10https://gerrit.wikimedia.org/r/353766
[14:20:47] <wikibugs>	 (03PS5) 10Paladox: Test: DO NOT MERGE [debs/gerrit] - 10https://gerrit.wikimedia.org/r/350440
[14:24:21] <wikibugs>	 (03PS5) 10Paladox: Fix debian-rules-missing-recommended-target [debs/gerrit] - 10https://gerrit.wikimedia.org/r/353766
[14:34:19] <wikibugs>	 (03PS6) 10Paladox: Fix debian-rules-missing-recommended-target [debs/gerrit] - 10https://gerrit.wikimedia.org/r/353766
[15:14:50] <wikibugs>	 06Operations, 06Commons, 10Datasets-Archiving, 10Datasets-General-or-Unknown, 07Community-Wishlist-Survey-2016: Back up of Commons files - https://phabricator.wikimedia.org/T160229#3261065 (10Hydriz)
[16:12:09] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[16:15:39] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[16:15:39] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[16:17:19] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[16:29:19] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[16:32:19] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[16:33:09] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[16:33:39] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[16:33:39] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[19:33:42] <yannf>	 hi, since yesterday, I have a JS loading issue on Commons, I restarted my PC twice, and it didn't change anything
[19:34:16] <yannf>	 I have to purge a page 2 times for the JS to load
[20:33:39] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[20:36:39] <icinga-wm>	 RECOVERY - MegaRAID on labstore1003 is OK: OK: optimal, 5 logical, 34 physical
[20:41:39] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:08:38] <sjoerddebruin>	 yannf: one moment, let me see your personal JS
[21:12:35] <sjoerddebruin>	 yannf: I think I can make some improvements, permission to edit?
[21:13:09] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[21:13:40] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[21:15:19] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[21:15:39] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[21:23:13] <yannf>	 sjoerddebruin, sure, please
[21:25:03] <sjoerddebruin>	 yannf: alright, done. I've changed all scripts to the preferred mw.loader.load and wrapped it in a "mw.loader.using" so they will load correctly.
[21:25:06] <sjoerddebruin>	 You'll probably experience less page shifting as well now.
[21:29:58] <yannf>	 thanks
[21:30:39] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[21:31:14] <sjoerddebruin>	 Let me know if this improves your situation. It did help for me. :)
[21:45:09] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[21:49:19] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:50:09] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:50:39] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[21:51:39] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[22:01:49] <icinga-wm>	 PROBLEM - puppet last run on dataset1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:05:09] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[22:05:39] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[22:05:39] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[22:06:19] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0]
[22:14:19] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[22:14:39] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[22:14:42] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[22:16:09] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[22:28:49] <icinga-wm>	 PROBLEM - puppet last run on gerrit2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:29:49] <icinga-wm>	 RECOVERY - puppet last run on dataset1001 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[22:46:49] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ms-be1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:46:49] <icinga-wm>	 PROBLEM - MD RAID on ms-be1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:46:59] <icinga-wm>	 PROBLEM - puppet last run on ms-be1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[22:47:39] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ms-be1019 is OK: OK ferm input default policy is set
[22:47:49] <icinga-wm>	 RECOVERY - MD RAID on ms-be1019 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[22:47:49] <icinga-wm>	 RECOVERY - puppet last run on ms-be1019 is OK: OK: Puppet is currently enabled, last run 18 minutes ago with 0 failures
[22:52:16] <thib>	 Request from (my ip) via cp1053 cp1053, Varnish XID 41240874
[22:52:17] <thib>	 Error: 503, Backend fetch failed at Sun, 14 May 2017 22:51:50 GMT
[22:53:19] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[22:54:14] <HaeB>	 https://status.wikimedia.org/ shows service disruptions too, so i guess/hope ops are aware
[22:54:43] <thib>	 ok
[22:55:09] <icinga-wm>	 PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[22:55:19] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[22:55:39] <icinga-wm>	 PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[22:56:39] <icinga-wm>	 PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0]
[22:57:19] <icinga-wm>	 PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[22:57:49] <icinga-wm>	 RECOVERY - puppet last run on gerrit2001 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[22:58:19] <icinga-wm>	 PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[23:00:19] <icinga-wm>	 PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[23:05:09] <icinga-wm>	 RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:05:19] <icinga-wm>	 RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:05:39] <icinga-wm>	 RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:07:19] <icinga-wm>	 RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:09:39] <icinga-wm>	 RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[23:42:19] <icinga-wm>	 PROBLEM - SSH on ms-be1019 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:42:49] <icinga-wm>	 PROBLEM - Check whether ferm is active by checking the default input chain on ms-be1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:42:59] <icinga-wm>	 PROBLEM - MD RAID on ms-be1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:42:59] <icinga-wm>	 PROBLEM - puppet last run on ms-be1019 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[23:43:09] <icinga-wm>	 RECOVERY - SSH on ms-be1019 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.8 (protocol 2.0)
[23:43:39] <icinga-wm>	 RECOVERY - Check whether ferm is active by checking the default input chain on ms-be1019 is OK: OK ferm input default policy is set
[23:43:49] <icinga-wm>	 RECOVERY - MD RAID on ms-be1019 is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0
[23:43:49] <icinga-wm>	 RECOVERY - puppet last run on ms-be1019 is OK: OK: Puppet is currently enabled, last run 14 minutes ago with 0 failures