[00:11:37] (PS1) Dzahn: mysql_wmf - autoload layout and lint fixes [puppet] - https://gerrit.wikimedia.org/r/170479
[00:22:50] (PS1) Dzahn: swift_new: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170480
[00:23:39] (PS1) John F. Lewis: base: autoload modules [puppet] - https://gerrit.wikimedia.org/r/170481
[00:27:05] (CR) John F. Lewis: "http://puppet-compiler.wmflabs.org/469/change/170481/html/ shows no difference :)" [puppet] - https://gerrit.wikimedia.org/r/170481 (owner: John F. Lewis)
[00:27:10] mutante ^
[00:27:37] JohnLewis: very nice
[00:28:36] mutante: let's hope no more are as complex :p
[00:38:03] (PS1) John F. Lewis: beta: linting and autoload modules [puppet] - https://gerrit.wikimedia.org/r/170484
[00:40:04] (PS1) John F. Lewis: bugzilla: linting fixes [puppet] - https://gerrit.wikimedia.org/r/170485
[00:40:23] (PS1) Dzahn: mysql - lint fixes [puppet] - https://gerrit.wikimedia.org/r/170486
[00:41:36] (PS1) Hoo man: eswikivoyage: Give sysops "abusefilter-modify-restricted" [mediawiki-config] - https://gerrit.wikimedia.org/r/170487 (https://bugzilla.wikimedia.org/62321)
[00:46:29] (PS1) John F. Lewis: ceph: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170488
[00:50:51] (PS1) John F. Lewis: contint: linting fixes and autoload module [puppet] - https://gerrit.wikimedia.org/r/170489
[00:52:02] PROBLEM - MySQL Replication Heartbeat on db1016 is CRITICAL: CRIT replication delay 308 seconds
[00:52:53] PROBLEM - MySQL Slave Delay on db1016 is CRITICAL: CRIT replication delay 358 seconds
[00:54:03] RECOVERY - MySQL Slave Delay on db1016 is OK: OK replication delay 0 seconds
[00:54:19] (PS1) John F. Lewis: cxserver: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170490
[00:54:38] RECOVERY - MySQL Replication Heartbeat on db1016 is OK: OK replication delay -0 seconds
[00:54:51] (PS1) Dzahn: mariadb - lint fixes [puppet/mariadb] - https://gerrit.wikimedia.org/r/170491
[00:58:31] (PS1) John F. Lewis: dataset: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170492
[00:59:51] (PS1) John F. Lewis: deployment: fix lint [puppet] - https://gerrit.wikimedia.org/r/170493
[01:01:22] (PS1) John F. Lewis: diamond: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170494
[01:05:00] (PS1) John F. Lewis: dynamicproxy: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170495
[01:08:48] (PS1) John F. Lewis: elasticsearch: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170496
[01:13:10] (PS1) John F. Lewis: extdist: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170497
[01:13:28] PROBLEM - Disk space on ocg1003 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=72%):
[01:14:10] (PS1) John F. Lewis: ferm: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170498
[01:14:41] that's enough filling the queue up, giving ops some work to do :p
[01:15:24] (PS1) Dzahn: openstack: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170499
[01:15:44] JohnLewis: Stopppp flooding my inbox :D
[01:15:45] yep, it is. thanks and see you after weekend then
[01:16:01] hoo: it is flooding my inbox x2 :(
[01:16:09] (CR) jenkins-bot: [V: -1] openstack: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170499 (owner: Dzahn)
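The eswikivoyage change above (Gerrit 170487) grants local sysops the AbuseFilter right needed to edit filters that use restricted actions. As a rough sketch only, not the actual wmf-config diff: in a standalone MediaWiki install the same grant is a one-line permission assignment, whereas the production change expresses it through the per-wiki overrides in InitialiseSettings.php.

```php
<?php
// Illustrative only -- not the actual diff in Gerrit 170487.
// Grant sysops the right to modify abuse filters that carry restricted
// actions (e.g. blocking), written as plain LocalSettings.php config:
$wgGroupPermissions['sysop']['abusefilter-modify-restricted'] = true;
```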
[01:16:13] Is it lint Friday? :D
[01:16:28] hoo: lint November :p
[01:16:48] (Abandoned) Hoo man: Remove all references to pmtpa from role::cache [puppet] - https://gerrit.wikimedia.org/r/164273 (owner: Hoo man)
[01:16:58] or, class month::november::lint :p
[01:17:31] (PS2) Dzahn: openstack: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170499
[01:18:06] Most of the things I touch in operations end up not abandoned :D (In reality it's probably way less than 50%)
[01:18:11] (CR) jenkins-bot: [V: -1] openstack: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170499 (owner: Dzahn)
[01:18:12] - not
[01:18:50] hoo: way more sounds more realistic to me :p
[01:19:58] or you could just not abandon it and let it sit there
[01:20:05] now, night. hoo: keep an eye on your inbox... if you can see it through all the patches I made
[01:20:14] good night
[01:20:56] (Restored) Dzahn: Remove all references to pmtpa from role::cache [puppet] - https://gerrit.wikimedia.org/r/164273 (owner: Hoo man)
[01:22:08] (PS3) Dzahn: openstack: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170499
[01:22:47] (CR) jenkins-bot: [V: -1] openstack: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170499 (owner: Dzahn)
[01:59:13] (CR) 20after4: "since we need to use preamble.php for other reasons, I don't see much reason to use mod_rpaf. For one thing, mod_rpaf doesn't seem to supp" [puppet] - https://gerrit.wikimedia.org/r/168509 (owner: 20after4)
[02:28:10] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: Puppet has 1 failures
[02:46:07] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[03:01:26] (CR) KartikMistry: [C: 1] cxserver: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170490 (owner: John F. Lewis)
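The mod_rpaf comment above (01:59:13, Gerrit 168509) argues that once a PHP preamble runs on every request anyway, restoring the real client address from X-Forwarded-For can be done there rather than in an Apache module. The sketch below is a hypothetical illustration of that idea under the assumption of a single trusted proxy; the proxy address and variable names are made up, and this is not the preamble.php from that change.

```php
<?php
// Hypothetical preamble-style shim doing mod_rpaf's job in PHP.
// Assumes exactly one trusted reverse proxy in front of Apache.
$trustedProxies = [ '192.0.2.10' ]; // placeholder address, not from the change

if (
    isset( $_SERVER['HTTP_X_FORWARDED_FOR'] ) &&
    in_array( $_SERVER['REMOTE_ADDR'], $trustedProxies, true )
) {
    // X-Forwarded-For is "client, proxy1, proxy2, ..."; the right-most
    // entry is the address the trusted proxy actually talked to.
    $hops = array_map( 'trim', explode( ',', $_SERVER['HTTP_X_FORWARDED_FOR'] ) );
    $_SERVER['REMOTE_ADDR'] = end( $hops );
}
```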
[03:18:37] PROBLEM - Disk space on analytics1003 is CRITICAL: DISK CRITICAL - free space: /srv/log 11255 MB (3% inode=99%):
[04:21:49] PROBLEM - Disk space on search1019 is CRITICAL: DISK CRITICAL - free space: /a 9742 MB (3% inode=99%):
[06:23:50] PROBLEM - Disk space on ocg1002 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=73%):
[06:27:22] RECOVERY - Disk space on ocg1003 is OK: DISK OK
[06:27:25] RECOVERY - Disk space on ocg1002 is OK: DISK OK
[06:30:32] PROBLEM - puppet last run on cp4014 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:41] RECOVERY - Disk space on analytics1003 is OK: DISK OK
[06:47:31] RECOVERY - puppet last run on cp4014 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:56:28] PROBLEM - CI: Puppet failure events on labmon1001 is CRITICAL: CRITICAL: integration.integration-slave1009.puppetagent.failed_events.value (33.33%)
[07:22:15] RECOVERY - CI: Puppet failure events on labmon1001 is OK: OK: All targets OK
[08:35:19] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 664
[08:40:11] PROBLEM - check_mysql on lutetium is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 963
[08:45:21] RECOVERY - check_mysql on lutetium is OK: Uptime: 1350762 Threads: 2 Questions: 15601453 Slow queries: 11547 Opens: 14165 Flush tables: 2 Open tables: 64 Queries per second avg: 11.550 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[10:16:27] PROBLEM - Apache HTTP on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:16:29] PROBLEM - HHVM rendering on mw1025 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:35:11] PROBLEM - HHVM rendering on mw1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:36:21] RECOVERY - HHVM rendering on mw1024 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 9.057 second response time
[10:44:00] PROBLEM - LVS HTTP IPv4 on hhvm-appservers.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:44:14] PROBLEM - Apache HTTP on mw1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:44:34] PROBLEM - HHVM rendering on mw1021 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:44:34] PROBLEM - Apache HTTP on mw1026 is CRITICAL: Connection timed out
[10:44:43] PROBLEM - HHVM rendering on mw1026 is CRITICAL: Connection timed out
[10:44:50] PROBLEM - Apache HTTP on mw1029 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:44:52] PROBLEM - Apache HTTP on mw1030 is CRITICAL: Connection timed out
[10:44:53] PROBLEM - HHVM rendering on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:44:53] PROBLEM - HHVM rendering on mw1029 is CRITICAL: Connection timed out
[10:44:53] PROBLEM - Apache HTTP on mw1023 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:45:13] PROBLEM - HHVM rendering on mw1031 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:45:14] PROBLEM - HHVM rendering on mw1027 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:45:15] PROBLEM - Apache HTTP on mw1027 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:45:23] PROBLEM - Apache HTTP on mw1053 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:45:25] PROBLEM - Apache HTTP on mw1020 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:45:36] PROBLEM - HHVM rendering on mw1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:45:45] PROBLEM - Apache HTTP on mw1031 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:46:06] PROBLEM - Apache HTTP on mw1163 is CRITICAL: Connection timed out
[10:46:14] PROBLEM - Apache HTTP on mw1018 is CRITICAL: Connection timed out
[10:46:14] PROBLEM - HHVM rendering on mw1163 is CRITICAL: Connection timed out
[10:46:14] PROBLEM - Apache HTTP on mw1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:46:24] PROBLEM - HHVM rendering on mw1053 is CRITICAL: Connection timed out
[10:46:35] PROBLEM - Apache HTTP on mw1022 is CRITICAL: Connection timed out
[10:46:54] PROBLEM - Apache HTTP on mw1017 is CRITICAL: Connection timed out
[10:47:04] PROBLEM - Apache HTTP on mw1019 is CRITICAL: Connection timed out
[10:47:14] PROBLEM - HHVM rendering on mw1022 is CRITICAL: Connection timed out
[10:47:19] PROBLEM - HHVM rendering on mw1019 is CRITICAL: Connection timed out
[10:47:24] PROBLEM - HHVM rendering on mw1017 is CRITICAL: Connection timed out
[10:47:24] PROBLEM - HHVM rendering on mw1030 is CRITICAL: Connection timed out
[10:47:34] PROBLEM - HHVM rendering on mw1018 is CRITICAL: Connection timed out
[10:48:04] RECOVERY - Apache HTTP on mw1031 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 1.451 second response time
[10:48:06] PROBLEM - Apache HTTP on mw1028 is CRITICAL: Connection timed out
[10:48:15] PROBLEM - HHVM rendering on mw1028 is CRITICAL: Connection timed out
[10:48:17] PROBLEM - HHVM rendering on mw1023 is CRITICAL: Connection timed out
[10:49:21] issue?
[10:49:24] Request: POST http://en.wikisource.org/w/index.php?title=Author:George_Webbe_Dasent&action=submit, from 10.128.0.117 via cp1055 cp1055 ([10.64.32.107]:3128), Varnish XID 1619492431
[10:49:25] Forwarded for: 103.254.5.215, 10.128.0.118, 10.128.0.118, 10.128.0.117
[10:49:25] Error: 503, Service Unavailable at Sat, 01 Nov 2014 10:48:31 GMT
[10:49:55] after lots of waiting, and three times getting same error
[10:50:04] PROBLEM - nutcracker port on mw1163 is CRITICAL: Connection refused
[10:50:47] now it gets there
[10:50:52] dog slow
[10:51:12] 10 minutes to get a simple page to save
[10:51:14] PROBLEM - Apache HTTP on mw1031 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:52:32] Is ottomata truly on duty? he isn't here
[10:52:55] RECOVERY - Apache HTTP on mw1024 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 1.132 second response time
[10:55:27] PROBLEM - HTTP error ratio anomaly detection on tungsten is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 1 below the confidence bounds
[10:56:17] PROBLEM - Apache HTTP on mw1024 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[10:59:15] <_joe_> shit
[10:59:16] <_joe_> yes
[10:59:37] <_joe_> sDrewth: on it
[11:00:15] RECOVERY - Apache HTTP on mw1053 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 1.247 second response time
[11:00:36] RECOVERY - HHVM rendering on mw1028 is OK: HTTP OK: HTTP/1.1 200 OK - 67488 bytes in 4.317 second response time
[11:00:46] RECOVERY - HHVM rendering on mw1026 is OK: HTTP OK: HTTP/1.1 200 OK - 67488 bytes in 5.071 second response time
[11:00:50] RECOVERY - Apache HTTP on mw1026 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 2.697 second response time
[11:00:58] RECOVERY - Apache HTTP on mw1023 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.060 second response time
[11:00:58] RECOVERY - Apache HTTP on mw1024 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.407 second response time
[11:00:58] RECOVERY - Apache HTTP on mw1028 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 7.759 second response time
[11:00:58] RECOVERY - HHVM rendering on mw1030 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 6.841 second response time
[11:00:58] PROBLEM - RAID on mw1018 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[11:00:58] RECOVERY - HHVM rendering on mw1023 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 0.234 second response time
[11:00:59] RECOVERY - Apache HTTP on mw1027 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.200 second response time
[11:01:05] RECOVERY - HHVM rendering on mw1163 is OK: HTTP OK: HTTP/1.1 200 OK - 65222 bytes in 2.889 second response time
[11:01:05] RECOVERY - HHVM rendering on mw1029 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 0.255 second response time
[11:01:05] RECOVERY - HHVM rendering on mw1020 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 0.252 second response time
[11:01:05] RECOVERY - HHVM rendering on mw1027 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 0.757 second response time
[11:01:05] RECOVERY - HHVM rendering on mw1022 is OK: HTTP OK: HTTP/1.1 200 OK - 67495 bytes in 0.237 second response time
[11:01:06] RECOVERY - Apache HTTP on mw1030 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.080 second response time
[11:01:06] RECOVERY - HHVM rendering on mw1031 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 0.228 second response time
[11:01:07] RECOVERY - HHVM rendering on mw1019 is OK: HTTP OK: HTTP/1.1 200 OK - 67494 bytes in 0.256 second response time
[11:01:15] RECOVERY - Apache HTTP on mw1020 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.095 second response time
[11:01:15] RECOVERY - HHVM rendering on mw1053 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 0.352 second response time
[11:01:24] RECOVERY - Apache HTTP on mw1021 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.059 second response time
[11:01:46] RECOVERY - Apache HTTP on mw1022 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.115 second response time
[11:01:46] RECOVERY - HHVM rendering on mw1024 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 0.161 second response time
[11:01:47] RECOVERY - LVS HTTP IPv4 on hhvm-appservers.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.1 200 OK - 65222 bytes in 0.211 second response time
[11:01:52] RECOVERY - HHVM rendering on mw1025 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 0.181 second response time
[11:01:53] RECOVERY - Apache HTTP on mw1025 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.071 second response time
[11:01:56] RECOVERY - Apache HTTP on mw1018 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.110 second response time
[11:01:56] RECOVERY - HHVM rendering on mw1021 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 0.171 second response time
[11:02:18] RECOVERY - RAID on mw1018 is OK: OK: no RAID installed
[11:02:18] RECOVERY - Apache HTTP on mw1031 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.043 second response time
[11:02:18] RECOVERY - HHVM rendering on mw1017 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 0.294 second response time
[11:02:18] RECOVERY - Apache HTTP on mw1019 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.060 second response time
[11:02:19] RECOVERY - Apache HTTP on mw1163 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.076 second response time
[11:02:19] RECOVERY - Apache HTTP on mw1029 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.054 second response time
[11:02:19] RECOVERY - Apache HTTP on mw1017 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.058 second response time
[11:02:26] PROBLEM - HTTP 5xx req/min on tungsten is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0]
[11:02:27] RECOVERY - HHVM rendering on mw1018 is OK: HTTP OK: HTTP/1.1 200 OK - 67487 bytes in 0.232 second response time
[11:04:28] (PS1) Giuseppe Lavagetto: re-set hhvm percentage of anons to 10% until we have a stable fix for memory exhaustion. [mediawiki-config] - https://gerrit.wikimedia.org/r/170515
[11:04:32] <_joe_> sDrewth: is it ok now?
[11:04:44] (CR) Giuseppe Lavagetto: [C: 2] re-set hhvm percentage of anons to 10% until we have a stable fix for memory exhaustion. [mediawiki-config] - https://gerrit.wikimedia.org/r/170515 (owner: Giuseppe Lavagetto)
[11:05:06] <_joe_> meh, wrong commit message
[11:05:07] yes _joe_ better
[11:05:21] th
[11:05:52] x
[11:05:58] appreciate your work :-)
[11:06:10] <_joe_> you should not
[11:06:27] well, stuff yah then! :-P
[11:06:31] <_joe_> I'm responsible for this, as I said I was ok moving to 15% of the traffic to hhvm :P
[11:06:50] <_joe_> and I should've known better.
[11:06:54] * sDrewth puts _joe_ in a big box, and ships him to Mozambique
[11:07:20] via Virgin Intergalactic
[11:07:35] !log oblivian Synchronized wmf-config/CommonSettings.php: re-set hhvm to 5% of users (duration: 00m 05s)
[11:07:43] <_joe_> errr
[11:07:43] Logged the message, Master
[11:07:49] <_joe_> "_
[11:10:45] hhvm has been hitting a few hurdles :-/
[11:11:36] <_joe_> sDrewth: no it's us making irrational decisions (me specifically). I think we're actually very near to the point where it's ready
[11:11:49] <_joe_> we just need one patch; which may be around on monday
[11:12:49] nice
[11:13:00] shit, something has logged me out
[11:13:25] weird, then logged me back in again
[11:14:01] got an IP edit warning, then the pop-up to say reload page, you have been logged in
[11:14:56] <_joe_> oh I have no idea about that.
[11:15:14] +1
[11:16:50] RECOVERY - HTTP 5xx req/min on tungsten is OK: OK: Less than 1.00% above the threshold [250.0]
[12:10:10] freaking hell. temporary session data loss is a repetitive and annoying thing
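For context on the rollback above (Gerrit 170515 at 11:04, then synced down to 5% at 11:07): the change adjusts a configuration knob that controls what share of anonymous traffic is served by HHVM. The sketch below shows one generic way such a percentage knob can be applied with stable per-client bucketing; the variable name $wmgHHVMAnonPercentage and the crc32 bucketing are assumptions for illustration, not the logic actually used in CommonSettings.php.

```php
<?php
// Illustration only: a deterministic percentage bucket for anonymous requests.
// $wmgHHVMAnonPercentage and the hashing scheme below are hypothetical.
$wmgHHVMAnonPercentage = 5; // share of anonymous requests to route to HHVM

$clientId = isset( $_SERVER['REMOTE_ADDR'] ) ? $_SERVER['REMOTE_ADDR'] : '';
// Hash something stable per client so a visitor stays in the same bucket
// across requests instead of flapping between backends.
$bucket = abs( crc32( $clientId ) ) % 100;
$useHHVM = ( $bucket < $wmgHHVMAnonPercentage );

// $useHHVM would then drive whatever marker (cookie, header) the caching
// layer inspects when choosing between Zend and HHVM application servers.
```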
[12:21:29] (CR) PiRSquared17: "Does this need consensus?" [mediawiki-config] - https://gerrit.wikimedia.org/r/170311 (owner: Hoo man)
[12:28:11] RECOVERY - HTTP error ratio anomaly detection on tungsten is OK: OK: No anomaly detected
[13:06:52] PROBLEM - Disk space on mw1163 is CRITICAL: DISK CRITICAL - free space: /tmp 0 MB (0% inode=98%):
[13:08:00] RECOVERY - Disk space on mw1163 is OK: DISK OK
[13:44:50] (CR) Hashar: [C: 1] ferm: lint fixes [puppet] - https://gerrit.wikimedia.org/r/170498 (owner: John F. Lewis)
[13:49:43] (CR) Hashar: [C: 1] "Confirmed the file renaming just changed arrow alignments with:" [puppet] - https://gerrit.wikimedia.org/r/170489 (owner: John F. Lewis)
[14:22:52] PROBLEM - HHVM rendering on mw1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:26:43] PROBLEM - Apache HTTP on mw1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[14:27:02] RECOVERY - HHVM rendering on mw1017 is OK: HTTP OK: HTTP/1.1 200 OK - 67976 bytes in 4.622 second response time
[14:27:32] RECOVERY - Apache HTTP on mw1017 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.133 second response time
[14:49:21] PROBLEM - puppet last run on mw1213 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:03:14] PROBLEM - Apache HTTP on mw1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:03:16] PROBLEM - HHVM rendering on mw1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:03:55] RECOVERY - Apache HTTP on mw1017 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 1.592 second response time
[15:04:15] RECOVERY - HHVM rendering on mw1017 is OK: HTTP OK: HTTP/1.1 200 OK - 67976 bytes in 2.532 second response time
[15:07:03] RECOVERY - puppet last run on mw1213 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures
[15:09:03] PROBLEM - puppet last run on amslvs1 is CRITICAL: CRITICAL: puppet fail
[15:21:44] PROBLEM - Apache HTTP on mw1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:22:03] PROBLEM - HHVM rendering on mw1017 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[15:26:55] RECOVERY - Apache HTTP on mw1017 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 3.587 second response time
[15:27:13] RECOVERY - puppet last run on amslvs1 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[15:28:24] RECOVERY - HHVM rendering on mw1017 is OK: HTTP OK: HTTP/1.1 200 OK - 67976 bytes in 7.159 second response time
[15:47:45] (PS1) Dereckson: Set timezone on cs.wikipedia and cs.wikinews to Europe/Prague [mediawiki-config] - https://gerrit.wikimedia.org/r/170517 (https://bugzilla.wikimedia.org/71902)
[16:56:04] PROBLEM - puppet last run on amssq33 is CRITICAL: CRITICAL: puppet fail
[17:15:23] RECOVERY - puppet last run on amssq33 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[17:28:54] PROBLEM - Disk space on ocg1001 is CRITICAL: DISK CRITICAL - free space: / 350 MB (3% inode=72%):
[18:09:55] PROBLEM - Host search-pool3.svc.eqiad.wmnet is DOWN: CRITICAL - Plugin timed out after 15 seconds
[18:11:03] RECOVERY - Host search-pool3.svc.eqiad.wmnet is UP: PING OK - Packet loss = 0%, RTA = 2.42 ms
[18:11:07] manybubbles, is that you?
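Dereckson's patch above (15:47:45, Gerrit 170517) sets the site timezone for cs.wikipedia and cs.wikinews to Europe/Prague. As an illustrative sketch only, the equivalent settings in a standalone MediaWiki LocalSettings.php are shown below; the production change sets the per-wiki value in wmf-config instead, and the derived-offset lines are the common idiom rather than a quote from the patch.

```php
<?php
// Illustrative LocalSettings.php equivalent of the per-wiki change.
$wgLocaltimezone = 'Europe/Prague';

// Derive the default timezone offset (in minutes) from the zone above,
// so timestamps and the default "time correction" preference match it.
date_default_timezone_set( $wgLocaltimezone );
$wgLocalTZoffset = (int)( date( 'Z' ) / 60 );
```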
[18:51:55] (PS1) Dereckson: Improving comments for wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/170526
[19:18:43] (PS1) Dereckson: Removed $wgMaxArticleSize [mediawiki-config] - https://gerrit.wikimedia.org/r/170527
[19:30:16] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333
[19:51:00] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[22:33:09] PROBLEM - Disk space on analytics1003 is CRITICAL: DISK CRITICAL - free space: /srv/log 11260 MB (3% inode=99%):
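On the $wgMaxArticleSize removal above (19:18:43, Gerrit 170527): the setting caps the page size MediaWiki accepts on save, in kilobytes, and dropping an explicit assignment from the config simply leaves the core default in effect. A minimal sketch of what the setting looks like:

```php
<?php
// $wgMaxArticleSize: maximum page size accepted on save, in kilobytes.
// Removing an explicit assignment (as in Gerrit 170527) falls back to
// MediaWiki's core default, which is 2048 kB.
$wgMaxArticleSize = 2048;
```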