[00:35:13] <wikibugs>	 (PS1) Tim Starling: Do a 301 redirect for wiki requests to URLs starting with /? [puppet] - https://gerrit.wikimedia.org/r/411522
[00:36:13] <wikibugs>	 (CR) Tim Starling: "Untested" [puppet] - https://gerrit.wikimedia.org/r/411522 (owner: Tim Starling)
[01:02:01] <wikibugs>	 (PS1) Andrew Bogott: labweb horizon: switch config files to version 'ocata' [puppet] - https://gerrit.wikimedia.org/r/411545
[01:02:09] <wikibugs>	 (PS2) Andrew Bogott: labweb horizon: switch config files to version 'ocata' [puppet] - https://gerrit.wikimedia.org/r/411545
[01:02:42] <wikibugs>	 (CR) Andrew Bogott: [C: 2] labweb horizon: switch config files to version 'ocata' [puppet] - https://gerrit.wikimedia.org/r/411545 (owner: Andrew Bogott)
[01:16:22] <wikibugs>	 (PS1) Andrew Bogott: labweb horizon: share memcached among labwebs [puppet] - https://gerrit.wikimedia.org/r/411546 (https://phabricator.wikimedia.org/T187506)
[01:17:02] <wikibugs>	 (CR) jerkins-bot: [V: -1] labweb horizon: share memcached among labwebs [puppet] - https://gerrit.wikimedia.org/r/411546 (https://phabricator.wikimedia.org/T187506) (owner: Andrew Bogott)
[01:23:42] <wikibugs>	 (PS2) Andrew Bogott: labweb horizon: share memcached among labwebs [puppet] - https://gerrit.wikimedia.org/r/411546 (https://phabricator.wikimedia.org/T187506)
[01:24:47] <wikibugs>	 (CR) jerkins-bot: [V: -1] labweb horizon: share memcached among labwebs [puppet] - https://gerrit.wikimedia.org/r/411546 (https://phabricator.wikimedia.org/T187506) (owner: Andrew Bogott)
[01:25:38] <wikibugs>	 (PS3) Andrew Bogott: labweb horizon: share memcached among labwebs [puppet] - https://gerrit.wikimedia.org/r/411546 (https://phabricator.wikimedia.org/T187506)
[01:25:47] <wikibugs>	 (CR) jerkins-bot: [V: -1] labweb horizon: share memcached among labwebs [puppet] - https://gerrit.wikimedia.org/r/411546 (https://phabricator.wikimedia.org/T187506) (owner: Andrew Bogott)
[01:40:01] <wikibugs>	 (PS4) Andrew Bogott: labweb horizon: share memcached among labwebs [puppet] - https://gerrit.wikimedia.org/r/411546 (https://phabricator.wikimedia.org/T187506)
[02:15:36] <icinga-wm>	 PROBLEM - Disk space on stat1005 is CRITICAL: Return code of 255 is out of bounds
[02:17:46] <icinga-wm>	 PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds
[02:17:55] <icinga-wm>	 PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds
[02:18:06] <icinga-wm>	 PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds
[02:18:06] <icinga-wm>	 PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds
[02:18:25] <icinga-wm>	 PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds
[02:18:26] <icinga-wm>	 PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds
[02:18:45] <icinga-wm>	 PROBLEM - Check the NTP synchronisation status of timesyncd on stat1005 is CRITICAL: Return code of 255 is out of bounds
[02:21:10] <wikibugs>	 (PS1) Krinkle: webperf: Add some commments to navtiming test cases [puppet] - https://gerrit.wikimedia.org/r/411547
[02:25:25] <icinga-wm>	 RECOVERY - Check systemd state on stat1005 is OK: OK - running: The system is fully operational
[02:25:35] <icinga-wm>	 RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient
[02:25:55] <icinga-wm>	 RECOVERY - configured eth on stat1005 is OK: OK - interfaces up
[02:26:15] <icinga-wm>	 RECOVERY - DPKG on stat1005 is OK: All packages OK
[02:26:15] <icinga-wm>	 RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0
[02:26:26] <icinga-wm>	 PROBLEM - puppet last run on lvs4007 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:27:46] <icinga-wm>	 RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[02:48:45] <icinga-wm>	 RECOVERY - Check the NTP synchronisation status of timesyncd on stat1005 is OK: OK: synced at Sat 2018-02-17 02:48:38 UTC.
[02:51:26] <icinga-wm>	 RECOVERY - puppet last run on lvs4007 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures
[03:12:40] <logmsgbot>	 !log demon@tin Pruned MediaWiki: 1.31.0-wmf.17 (duration: 04m 32s)
[03:12:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:15:20] <logmsgbot>	 !log demon@tin Pruned MediaWiki: 1.31.0-wmf.20 [keeping static files] (duration: 01m 17s)
[03:15:33] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:26:15] <icinga-wm>	 PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 857.55 seconds
[03:58:19] <wikibugs>	 (CR) Legoktm: [C: 1] Remove redundant wgTemplateSandboxEditNamespaces addition [mediawiki-config] - https://gerrit.wikimedia.org/r/363531 (owner: Legoktm)
[04:04:16] <icinga-wm>	 RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 164.11 seconds
[10:29:56] <wikibugs>	 Operations: Remove dpatrick from security@ - https://phabricator.wikimedia.org/T187615#3980810 (Reedy)
[10:47:06] <wikibugs>	 Operations: Remove dpatrick from security@ - https://phabricator.wikimedia.org/T187615#3980852 (Reedy)
[12:45:58] <wikibugs>	 (CR) Hashar: [C: 1] "Chad wrote:" [puppet] - https://gerrit.wikimedia.org/r/411211 (owner: Muehlenhoff)
[14:58:26] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1228 is CRITICAL: CRITICAL - load average: 35.27, 33.92, 32.23
[16:30:04] <wikibugs>	 Puppet, cloud-services-team (Kanban): role::puppet::self referenced in puppet_ssldir.rb - https://phabricator.wikimedia.org/T187622#3981036 (Andrew)
[16:37:43] <wikibugs>	 Operations, ops-eqiad, DBA, hardware-requests, Patch-For-Review: Decommission db1043 - https://phabricator.wikimedia.org/T187542#3981046 (jcrespo) The usage of the tafs is ok.  Note the substitution host is already online and in production, and the old hosts set as spare. What we wanted to to...
[16:42:13] <wikibugs>	 (PS1) Andrew Bogott: remove role::toollabs::puppetmaster and toollabs::puppetmaster [puppet] - https://gerrit.wikimedia.org/r/411614 (https://phabricator.wikimedia.org/T182810)
[16:42:15] <wikibugs>	 (PS1) Andrew Bogott: remove role::puppet::self [puppet] - https://gerrit.wikimedia.org/r/411615 (https://phabricator.wikimedia.org/T182810)
[16:42:17] <wikibugs>	 (PS1) Andrew Bogott: remove 'puppet' module [puppet] - https://gerrit.wikimedia.org/r/411616 (https://phabricator.wikimedia.org/T182810)
[16:57:40] <jynus>	 enwiki query traffic increased by 20% very quickly a few hours ago; that is an increase of >20K queries per second in a few hours, and 10x the number of writes
[17:00:46] <icinga-wm>	 RECOVERY - High CPU load on API appserver on mw1228 is OK: OK - load average: 8.18, 13.93, 23.30
[17:03:21] <jynus>	 it started around 14:54
[17:31:55] <wikibugs>	 Operations, Ops-Access-Requests: Requesting shell access and access to groups 'analytics-privatedata-users' and 'researchers' for katielin (katie) - https://phabricator.wikimedia.org/T187623#3981063 (katielin)
[17:33:38] <twentyafterfour>	 !log restarting apache on phab1001 to clear deadlocked workers. refs T182832
[17:33:53] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:37:17] <wikibugs>	 Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3981076 (mmodell) @elukey, @dzahn:  do you think that...
[17:37:54] <wikibugs>	 Operations, Phabricator, Release-Engineering-Team, User-Elukey: Phabricator down due to "Failed to `proc_open()`: proc_open() expects parameter 2 to be array" - https://phabricator.wikimedia.org/T186620#3981078 (mmodell)
[17:37:59] <wikibugs>	 Operations, Phabricator, Patch-For-Review, Release-Engineering-Team (Kanban), User-Elukey: Apache on phab1001 is gradually leaking worker processes which are stuck in "Gracefully finishing" state - https://phabricator.wikimedia.org/T182832#3981080 (mmodell)
[17:42:25] <icinga-wm>	 PROBLEM - Disk space on rhenium is CRITICAL: DISK CRITICAL - free space: / 1720 MB (3% inode=96%)
[17:59:11] <wikibugs>	 Operations, Cloud-VPS, cloud-services-team, hardware-requests: eqiad: (4) systems for CirrusSearch Elasticssearch replica service - https://phabricator.wikimedia.org/T187627#3981122 (bd808)
[18:01:01] <wikibugs>	 Operations, Cloud-VPS, cloud-services-team, hardware-requests: eqiad: (4) systems for CirrusSearch Elasticssearch replica service - https://phabricator.wikimedia.org/T187627#3981122 (bd808) @robh you may be able to find an email thread titled "FY17/18: Putting a live copy of CirrusSearch data in...
[19:16:25] <icinga-wm>	 PROBLEM - HHVM rendering on mw2128 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:17:15] <icinga-wm>	 RECOVERY - HHVM rendering on mw2128 is OK: HTTP OK: HTTP/1.1 200 OK - 74327 bytes in 0.254 second response time
[19:42:25] <icinga-wm>	 PROBLEM - Nginx local proxy to apache on mw2123 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[19:43:16] <icinga-wm>	 RECOVERY - Nginx local proxy to apache on mw2123 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 617 bytes in 0.184 second response time
[22:11:55] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1226 is CRITICAL: CRITICAL - load average: 33.83, 33.42, 32.03
[22:30:05] <icinga-wm>	 PROBLEM - puppet last run on stat1005 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 4 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[cdh::hadoop::directory /user/spark]
[22:44:26] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1345 is CRITICAL: CRITICAL - load average: 51.74, 49.96, 48.19
[23:01:35] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1345 is CRITICAL: CRITICAL - load average: 52.54, 48.63, 48.20
[23:32:20] <wikibugs>	 Operations, DBA, MediaWiki-General-or-Unknown, MW-1.31-release-notes (WMF-deploy-2018-02-20 (1.31.0-wmf.22)), and 2 others: Regularly purge expired temporary userrights from DB tables - https://phabricator.wikimedia.org/T176754#3981254 (MarcoAurelio) My advice is to do this off-SWAT. Talk to @gre...
[23:32:36] <icinga-wm>	 PROBLEM - High CPU load on API appserver on mw1345 is CRITICAL: CRITICAL - load average: 49.77, 48.03, 48.04
[23:37:26] <icinga-wm>	 PROBLEM - Apache HTTP on mw2130 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[23:38:16] <icinga-wm>	 RECOVERY - Apache HTTP on mw2130 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.121 second response time