[00:03:44] (PS3) Dzahn: delete *.email.donate.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/278353 (https://phabricator.wikimedia.org/T130414)
[00:04:15] (CR) jenkins-bot: [V: -1] delete *.email.donate.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/278353 (https://phabricator.wikimedia.org/T130414) (owner: Dzahn)
[00:04:24] (PS1) Krinkle: speed-tests: Add samples for MobileFrontend lazy-load images [mediawiki-config] - https://gerrit.wikimedia.org/r/280140 (https://phabricator.wikimedia.org/T124390)
[00:06:23] (PS4) Dzahn: delete *.email.donate.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/278353 (https://phabricator.wikimedia.org/T130414)
[00:07:39] (PS2) Krinkle: speed-tests: Add samples for MobileFrontend lazy-load images [mediawiki-config] - https://gerrit.wikimedia.org/r/280140 (https://phabricator.wikimedia.org/T124390)
[00:09:48] RECOVERY - puppet last run on ms-be3004 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[00:11:46] (PS3) Krinkle: speed-tests: Add samples for MobileFrontend lazy-load images [mediawiki-config] - https://gerrit.wikimedia.org/r/280140 (https://phabricator.wikimedia.org/T124390)
[00:11:59] (CR) Krinkle: [C: 2] speed-tests: Add samples for MobileFrontend lazy-load images [mediawiki-config] - https://gerrit.wikimedia.org/r/280140 (https://phabricator.wikimedia.org/T124390) (owner: Krinkle)
[00:12:24] !log mw2090 - reinstalled, re-signed puppet/salt
[00:12:25] (Merged) jenkins-bot: speed-tests: Add samples for MobileFrontend lazy-load images [mediawiki-config] - https://gerrit.wikimedia.org/r/280140 (https://phabricator.wikimedia.org/T124390) (owner: Krinkle)
[00:12:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:15:18] !log krinkle@tin Synchronized docroot/wikipedia.org/speed-tests/: (no message) (duration: 00m 32s)
[00:15:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:34:57] PROBLEM - HHVM processes on mw2090 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:37:57] PROBLEM - Check size of conntrack table on mw2090 is CRITICAL: CHECK_NRPE: Error - Could not complete SSL handshake.
[00:50:18] (CR) GWicke: [C: 1] Bump s-maxage for purged endpoints. [puppet] - https://gerrit.wikimedia.org/r/280091 (owner: Ppchelko)
[01:00:08] !log restbase start update to 3ea08751a8
[01:00:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:02:57] RECOVERY - Check size of conntrack table on mw2090 is OK: OK: nf_conntrack is 0 % full
[01:03:26] RECOVERY - HHVM processes on mw2090 is OK: PROCS OK: 6 processes with command name hhvm
[01:07:00] !log restbase finish update to 3ea08751a8
[01:07:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:07:08] PROBLEM - HHVM rendering on mw2090 is CRITICAL: Connection refused
[01:07:57] PROBLEM - Apache HTTP on mw2090 is CRITICAL: Connection refused
[01:11:52] (CR) Tim Landscheidt: create-dbuser change how user grant files are created (1 comment) [puppet] - https://gerrit.wikimedia.org/r/279995 (owner: Rush)
[01:17:56] RECOVERY - HHVM rendering on mw2090 is OK: HTTP OK: HTTP/1.1 200 OK - 67969 bytes in 8.992 second response time
[01:18:27] RECOVERY - Apache HTTP on mw2090 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 614 bytes in 0.131 second response time
[01:19:57] PROBLEM - puppet last run on mw2067 is CRITICAL: CRITICAL: puppet fail
[01:22:47] mw2090 has just been installed. taking that
[01:23:01] 2067.. dunno yet
[01:45:27] (PS2) Krinkle: Bump s-maxage for purged endpoints. [puppet] - https://gerrit.wikimedia.org/r/280091 (owner: Ppchelko)
[01:45:45] (PS1) Dzahn: dsh: add wasat, add comments [puppet] - https://gerrit.wikimedia.org/r/280144 (https://phabricator.wikimedia.org/T129930)
[01:46:03] Operations: Security audit for tftp on Carbon - https://phabricator.wikimedia.org/T122210#1898315 (Dzahn) This seems very related to T130350.
[01:46:57] RECOVERY - puppet last run on mw2067 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[01:49:01] (PS2) Dzahn: dsh: add wasat, add comments [puppet] - https://gerrit.wikimedia.org/r/280144 (https://phabricator.wikimedia.org/T129930)
[01:50:00] (CR) Dzahn: [C: 2] dsh: add wasat, add comments [puppet] - https://gerrit.wikimedia.org/r/280144 (https://phabricator.wikimedia.org/T129930) (owner: Dzahn)
[01:59:37] PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 25.00% of data above the critical threshold [100000000.0]
[02:02:35] (PS1) Dzahn: re-activate mw2090 in conftool [puppet] - https://gerrit.wikimedia.org/r/280145
[02:08:36] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 55.17% of data above the critical threshold [5000000.0]
[02:22:47] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[02:27:53] !log mwdeploy@tin sync-l10n completed (1.27.0-wmf.18) (duration: 11m 34s)
[02:27:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:28:17] RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0]
[02:30:51] Operations: Security audit for tftp on Carbon - https://phabricator.wikimedia.org/T122210#2156932 (Krenair) I can't see that ticket. Vendor communications?
[02:36:39] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Mar 29 02:36:39 UTC 2016 (duration 8m 46s)
[02:36:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:40:48] RECOVERY - mediawiki-installation DSH group on wasat is OK: OK
[02:43:30] (PS1) BBlack: CP cookie: update for HTTP/2 [puppet] - https://gerrit.wikimedia.org/r/280147 (https://phabricator.wikimedia.org/T118892)
[02:43:32] (PS1) BBlack: tlsproxy: use do_spdy to control http2-vs-spdy [puppet] - https://gerrit.wikimedia.org/r/280148 (https://phabricator.wikimedia.org/T96848)
[02:43:34] (PS1) BBlack: configure cp1008 for http2 [puppet] - https://gerrit.wikimedia.org/r/280149 (https://phabricator.wikimedia.org/T96848)
[02:45:09] (CR) BBlack: [C: 2] CP cookie: update for HTTP/2 [puppet] - https://gerrit.wikimedia.org/r/280147 (https://phabricator.wikimedia.org/T118892) (owner: BBlack)
[02:45:37] !log mwscript deleteEqualMessages.php --wiki zhwikinews
[02:45:39] !log mwscript deleteEqualMessages.php --wiki zhwikisource
[02:45:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:45:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:49:39] (PS2) BBlack: configure cp1008 for http2 [puppet] - https://gerrit.wikimedia.org/r/280149 (https://phabricator.wikimedia.org/T96848)
[02:49:41] (PS2) BBlack: tlsproxy: use do_spdy to control http2-vs-spdy [puppet] - https://gerrit.wikimedia.org/r/280148 (https://phabricator.wikimedia.org/T96848)
[02:50:56] Operations, Performance-Team, Traffic, Patch-For-Review: Update CP cookie VCL once HTTP/2 support lands - https://phabricator.wikimedia.org/T118892#2156942 (BBlack) Open>Resolved a:BBlack
[02:50:59] Operations, Performance-Team, Traffic, Patch-For-Review: Support HTTP/2 - https://phabricator.wikimedia.org/T96848#2156944 (BBlack)
[02:51:28] (CR) BBlack: [C: 2] tlsproxy: use do_spdy to control http2-vs-spdy [puppet] - https://gerrit.wikimedia.org/r/280148 (https://phabricator.wikimedia.org/T96848) (owner: BBlack)
[02:54:01] (CR) BBlack: [C: 2] configure cp1008 for http2 [puppet] - https://gerrit.wikimedia.org/r/280149 (https://phabricator.wikimedia.org/T96848) (owner: BBlack)
[03:17:11] Operations: Security audit for tftp on Carbon - https://phabricator.wikimedia.org/T122210#2156979 (Dzahn) No, it's set to "Security - Other confidential issue "
[03:29:13] Operations, ops-eqiad, Traffic: investigate radon crash - https://phabricator.wikimedia.org/T131053#2157006 (BBlack) Still up, turning traffic back on for now...
[03:29:26] !log cr1/2-eqiad: re-activating static routes for ns0/ns1 (ipv4/ipv6) pointing to radon -
[03:29:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:53:57] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call
[04:02:01] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 0.055 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.155.135
[04:15:11] PROBLEM - Host mw2027 is DOWN: PING CRITICAL - Packet loss = 100%
[04:16:41] RECOVERY - Host mw2027 is UP: PING OK - Packet loss = 0%, RTA = 38.05 ms
[05:23:00] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call
[05:28:20] PROBLEM - Check size of conntrack table on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:28:29] PROBLEM - RAID on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:28:30] PROBLEM - dhclient process on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:28:31] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:28:39] PROBLEM - restbase endpoints health on restbase-test2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:28:49] PROBLEM - configured eth on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:28:49] PROBLEM - restbase endpoints health on restbase2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:28:59] PROBLEM - graphoid endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:28:59] PROBLEM - restbase endpoints health on restbase2005 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:10] PROBLEM - restbase endpoints health on restbase-test2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:10] PROBLEM - restbase endpoints health on restbase2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:20] PROBLEM - puppet last run on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:29] PROBLEM - salt-minion processes on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:31] PROBLEM - restbase endpoints health on restbase2006 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:31] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:40] PROBLEM - DPKG on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:29:40] PROBLEM - Disk space on alsafi is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:30:00] PROBLEM - graphoid endpoints health on scb2002 is CRITICAL: /{domain}/v1/{format}/{title}/{revid}/{id} (retrieve PNG from mediawiki.org) is CRITICAL: Test retrieve PNG from mediawiki.org returned the unexpected status 400 (expecting: 200)
[05:30:09] PROBLEM - restbase endpoints health on restbase2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:30:09] PROBLEM - restbase endpoints health on restbase-test2003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:32:48] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 0.089 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.155.135
[06:11:46] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[06:13:26] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[06:29:36] PROBLEM - puppet last run on cp3017 is CRITICAL: CRITICAL: puppet fail
[06:29:37] PROBLEM - puppet last run on scb1001 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:46] PROBLEM - puppet last run on snapshot1007 is CRITICAL: CRITICAL: puppet fail
[06:29:47] PROBLEM - puppet last run on mw2081 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:30:36] PROBLEM - puppet last run on cp4010 is CRITICAL: CRITICAL: puppet fail
[06:30:46] PROBLEM - puppet last run on mw1251 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:15] PROBLEM - puppet last run on lvs2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:36] PROBLEM - puppet last run on db1056 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:55] PROBLEM - puppet last run on mw2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:32:25] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 3 failures
[06:32:46] PROBLEM - puppet last run on mw2045 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:15] PROBLEM - NTP on alsafi is CRITICAL: NTP CRITICAL: No response from NTP server
[06:37:56] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[06:41:17] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[06:49:16] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:53:46] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call
[06:53:55] akosiaris: dah, possible to delete pushed tag for apertium?
[06:55:55] RECOVERY - puppet last run on scb1001 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[06:55:57] RECOVERY - puppet last run on db1056 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:56:05] RECOVERY - puppet last run on mw2081 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[06:56:46] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[06:56:47] RECOVERY - puppet last run on cp4010 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[06:56:56] RECOVERY - puppet last run on mw1251 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[06:57:07] RECOVERY - puppet last run on mw2045 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures
[06:57:26] RECOVERY - puppet last run on lvs2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:36] RECOVERY - puppet last run on cp3017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:45] RECOVERY - puppet last run on snapshot1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:06] RECOVERY - puppet last run on mw2018 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:00:02] (CR) Gehel: [C: 1] Remove unused import in labs [puppet] - https://gerrit.wikimedia.org/r/279896 (owner: Ladsgroup)
[07:01:53] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 0.073 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.155.135
[07:07:12] PROBLEM - BGP status on cr1-eqord is CRITICAL: Missing argument in sprintf at /usr/lib/nagios/plugins/check_bgp line 182.
[07:10:53] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[07:12:41] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[07:15:52] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[07:21:22] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[07:24:41] RECOVERY - BGP status on cr1-eqord is OK: OK: host 208.80.154.198, sessions up: 23, down: 0, shutdown: 2
[07:24:51] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[07:28:31] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/2/0: down - Transit: NTT (service ID 253066) {#11376} [10Gbps]
[07:33:22] mobrovac: who can help in https://phabricator.wikimedia.org/T131145 for cxserver?
[07:33:52] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[07:37:01] PROBLEM - BGP status on cr1-eqord is CRITICAL: Missing argument in sprintf at /usr/lib/nagios/plugins/check_bgp line 182.
[07:37:02] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[07:38:52] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[07:44:01] RECOVERY - BGP status on cr1-eqord is OK: OK: host 208.80.154.198, sessions up: 25, down: 0, shutdown: 0
[07:45:51] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[07:49:22] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[07:50:41] PROBLEM - puppet last run on db2029 is CRITICAL: CRITICAL: puppet fail
[07:54:40] (CR) KartikMistry: "Please try again?" [debs/contenttranslation/lttoolbox] - https://gerrit.wikimedia.org/r/269115 (https://phabricator.wikimedia.org/T124137) (owner: KartikMistry)
[07:58:11] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[08:09:45] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[08:10:12] Operations, ops-eqiad: upgrade package_builder machine with SSD - https://phabricator.wikimedia.org/T130759#2157239 (MoritzMuehlenhoff) Yeah, using one of the unused SSDs would be great, 300 GB SSDs are perfectly fine, even the Linux or OpenJDK builds are only < 1 GB (plus 10 at build time)
[08:19:26] RECOVERY - puppet last run on db2029 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[08:21:56] PROBLEM - puppet last run on mx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:22:07] PROBLEM - salt-minion processes on mx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:22:26] PROBLEM - spamassassin on mx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:22:26] PROBLEM - DPKG on mx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:22:27] PROBLEM - configured eth on mx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:22:46] PROBLEM - RAID on mx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:22:56] PROBLEM - Check size of conntrack table on mx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:23:06] PROBLEM - Disk space on mx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:23:16] PROBLEM - dhclient process on mx1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[08:27:36] PROBLEM - Exim SMTP on mx1001 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[08:28:19] (CR) Jcrespo: [C: -1] Refactor of the CAs certificate genearation (1 comment) [puppet/mariadb] - https://gerrit.wikimedia.org/r/279694 (https://phabricator.wikimedia.org/T111654) (owner: Volans)
[08:28:48] (CR) Ema: [C: 2 V: 2] Install VMODs on Varnish 4 instances [puppet] - https://gerrit.wikimedia.org/r/279617 (https://phabricator.wikimedia.org/T124281) (owner: Ema)
[08:33:05] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[08:35:50] (CR) Volans: Refactor of the CAs certificate genearation (1 comment) [puppet/mariadb] - https://gerrit.wikimedia.org/r/279694 (https://phabricator.wikimedia.org/T111654) (owner: Volans)
[08:38:26] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[08:39:05] (CR) Jcrespo: "onlyif" (1 comment) [puppet/mariadb] - https://gerrit.wikimedia.org/r/279694 (https://phabricator.wikimedia.org/T111654) (owner: Volans)
[08:45:27] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[08:45:29] !log upgraded chromium on osmium
[08:45:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:48:55] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[08:54:06] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[08:55:56] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[09:00:06] PROBLEM - NTP on mx1001 is CRITICAL: NTP CRITICAL: No response from NTP server
[09:02:34] so we've lost mx1001 and alsafi?
[09:04:46] (CR) Jcrespo: "Did you check hiera correctness for enable/disabled?" [puppet] - https://gerrit.wikimedia.org/r/280089 (https://phabricator.wikimedia.org/T129930) (owner: Dzahn)
[09:05:02] !log hard-resetting mx1001, I/O-stuck (qemu bug?)
[09:05:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:05:25] RECOVERY - DPKG on mx1001 is OK: All packages OK
[09:05:26] RECOVERY - configured eth on mx1001 is OK: OK - interfaces up
[09:05:37] RECOVERY - RAID on mx1001 is OK: OK: no RAID installed
[09:05:55] RECOVERY - Check size of conntrack table on mx1001 is OK: OK: nf_conntrack is 0 % full
[09:05:56] RECOVERY - Disk space on mx1001 is OK: DISK OK
[09:06:07] RECOVERY - dhclient process on mx1001 is OK: PROCS OK: 0 processes with command name dhclient
[09:06:36] RECOVERY - puppet last run on mx1001 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[09:06:36] PROBLEM - SSH on alsafi is CRITICAL: Server answer
[09:06:45] RECOVERY - salt-minion processes on mx1001 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:06:46] RECOVERY - Exim SMTP on mx1001 is OK: OK - Certificate will expire on 09/22/2016 18:01.
[09:07:05] RECOVERY - spamassassin on mx1001 is OK: PROCS OK: 3 processes with args spamd
[09:08:05] !log hard-resetting alsafi, I/O-stuck (qemu bug?)
[09:08:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:08:25] RECOVERY - SSH on alsafi is OK: SSH OK - OpenSSH_6.7p1 Debian-5+deb8u1 (protocol 2.0)
[09:08:35] RECOVERY - RAID on alsafi is OK: OK: no RAID installed
[09:08:35] RECOVERY - graphoid endpoints health on scb2001 is OK: All endpoints are healthy
[09:08:37] RECOVERY - Check size of conntrack table on alsafi is OK: OK: nf_conntrack is 0 % full
[09:08:45] RECOVERY - restbase endpoints health on restbase-test2002 is OK: All endpoints are healthy
[09:08:47] RECOVERY - restbase endpoints health on restbase2001 is OK: All endpoints are healthy
[09:08:55] RECOVERY - configured eth on alsafi is OK: OK - interfaces up
[09:08:55] RECOVERY - restbase endpoints health on restbase2006 is OK: All endpoints are healthy
[09:09:15] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy
[09:09:15] RECOVERY - dhclient process on alsafi is OK: PROCS OK: 0 processes with command name dhclient
[09:09:15] RECOVERY - graphoid endpoints health on scb2002 is OK: All endpoints are healthy
[09:09:26] RECOVERY - restbase endpoints health on restbase2002 is OK: All endpoints are healthy
[09:09:26] RECOVERY - DPKG on alsafi is OK: All packages OK
[09:09:46] RECOVERY - salt-minion processes on alsafi is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[09:09:55] RECOVERY - Disk space on alsafi is OK: DISK OK
[09:09:56] RECOVERY - restbase endpoints health on restbase-test2003 is OK: All endpoints are healthy
[09:09:56] RECOVERY - restbase endpoints health on restbase2003 is OK: All endpoints are healthy
[09:09:56] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy
[09:10:06] RECOVERY - restbase endpoints health on restbase2005 is OK: All endpoints are healthy
[09:10:06] RECOVERY - restbase endpoints health on restbase-test2001 is OK: All endpoints are healthy
[09:10:50] (PS1) Ema: APT pinning for varnishkafka [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/280162 (https://phabricator.wikimedia.org/T122880)
[09:11:15] PROBLEM - puppet last run on alsafi is CRITICAL: CRITICAL: Puppet has 1 failures
[09:16:10] (PS1) Ema: Varnishkafka APT pinning moved to submodule [puppet] - https://gerrit.wikimedia.org/r/280163 (https://phabricator.wikimedia.org/T122880)
[09:22:44] (PS2) Volans: Refactor of the CAs certificate genearation [puppet/mariadb] - https://gerrit.wikimedia.org/r/279694 (https://phabricator.wikimedia.org/T111654)
[09:24:27] RECOVERY - NTP on mx1001 is OK: NTP OK: Offset -0.002329468727 secs
[09:27:35] RECOVERY - NTP on alsafi is OK: NTP OK: Offset -0.003122806549 secs
[09:32:27] PROBLEM - Kafka Broker Replica Max Lag on kafka1022 is CRITICAL: CRITICAL: 65.52% of data above the critical threshold [5000000.0]
[09:34:05] PROBLEM - DPKG on labmon1001 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[09:35:47] RECOVERY - DPKG on labmon1001 is OK: All packages OK
[09:45:33] (CR) Jonas Kress (WMDE): "What is blocking this? Can we push this to a point where we can merge it?" [puppet] - https://gerrit.wikimedia.org/r/274864 (https://phabricator.wikimedia.org/T126730) (owner: Smalyshev)
[09:49:59] (CR) Alexandros Kosiaris: "dpkg-source: error: syntax error in lttoolbox/debian/control at line 60: duplicate field Provides found" [debs/contenttranslation/lttoolbox] - https://gerrit.wikimedia.org/r/269115 (https://phabricator.wikimedia.org/T124137) (owner: KartikMistry)
[09:50:10] !log powercycle ms-be2008, got stuck again while diagnosing failed /dev/sdl
[09:50:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:54:26] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call
[09:57:26] RECOVERY - Kafka Broker Replica Max Lag on kafka1022 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[10:00:19] (PS1) DCausse: Bump CirrusSearchRequestSet rev to latest [puppet] - https://gerrit.wikimedia.org/r/280167
[10:01:45] (Restored) ArielGlenn: tox integration to run flake8 [dumps] - https://gerrit.wikimedia.org/r/242494 (https://phabricator.wikimedia.org/T55354) (owner: Hashar)
[10:02:31] (CR) Alexandros Kosiaris: [C: -1] "Various inline comments (as well as in the commit message)" (9 comments) [puppet] - https://gerrit.wikimedia.org/r/278555 (owner: Sabya)
[10:02:46] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 0.044 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.155.135
[10:09:45] (PS4) Reedy: Remove old throttle rules. Swap array() -> [] [mediawiki-config] - https://gerrit.wikimedia.org/r/278454
[10:09:50] (CR) Reedy: [C: 2] Remove old throttle rules. Swap array() -> [] [mediawiki-config] - https://gerrit.wikimedia.org/r/278454 (owner: Reedy)
[10:09:56] Operations: Security audit for tftp on Carbon - https://phabricator.wikimedia.org/T122210#2157277 (Krenair) Nope. Look at the visibility policy, it doesn't match the security option. When you go to view it, it says 'Members of the project "acl*operations-team" can take this action.'
[10:10:20] (Merged) jenkins-bot: Remove old throttle rules. Swap array() -> [] [mediawiki-config] - https://gerrit.wikimedia.org/r/278454 (owner: Reedy)
[10:10:58] Operations, ops-codfw: ms-be2008.codfw.wmnet: slot=1 dev=sdl failed - https://phabricator.wikimedia.org/T131147#2157278 (fgiunchedi)
[10:11:32] !log reedy@tin Synchronized wmf-config/throttle.php: Remove old throttle rules (duration: 00m 44s)
[10:11:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:11:54] (PS1) Hashar: tox integration to run flake8 [dumps] (ariel) - https://gerrit.wikimedia.org/r/280169 (https://phabricator.wikimedia.org/T55354)
[10:12:47] Operations: Allocate 2 analytics machines to experiment with a jupyterhub notebook service - https://phabricator.wikimedia.org/T130760#2145652 (mark) For non-production / just testing usage, this is approved.
[10:13:05] (PS1) Reedy: Remove upload7 references [mediawiki-config] - https://gerrit.wikimedia.org/r/280170 (https://phabricator.wikimedia.org/T129586)
[10:13:14] (CR) ArielGlenn: [C: 2] tox integration to run flake8 [dumps] - https://gerrit.wikimedia.org/r/242494 (https://phabricator.wikimedia.org/T55354) (owner: Hashar)
[10:13:59] (CR) ArielGlenn: [C: 2] tox integration to run flake8 [dumps] (ariel) - https://gerrit.wikimedia.org/r/280169 (https://phabricator.wikimedia.org/T55354) (owner: Hashar)
[10:16:41] (PS2) Ema: Move Varnishkafka APT pinning to role definition [puppet] - https://gerrit.wikimedia.org/r/280163 (https://phabricator.wikimedia.org/T122880)
[10:18:31] (Abandoned) Ema: APT pinning for varnishkafka [puppet/varnishkafka] - https://gerrit.wikimedia.org/r/280162 (https://phabricator.wikimedia.org/T122880) (owner: Ema)
[10:24:15] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 120, down: 1, dormant: 0, excluded: 0, unused: 0; xe-5/0/2: down - Core: cr2-ulsfo:xe-1/3/0 (Zayo, OGYX/124337//ZYO, 38.8ms) {#11541} [10Gbps wave]
[10:24:57] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 75, down: 1, dormant: 0, excluded: 0, unused: 0; xe-1/3/0: down - Core: cr1-codfw:xe-5/0/2 (Zayo, OGYX/124337//ZYO, 38.8ms) {#?} [10Gbps wave]
[10:25:04] (PS1) Hashar: Fix a few flake8 errors [dumps] - https://gerrit.wikimedia.org/r/280172
[10:33:18] (CR) Muehlenhoff: [C: 1] "Very nice! It doesn't seem lax to me, BTW. One possible enhancement would be SystemCallFilter, but this requires quite some effort to get " [puppet] - https://gerrit.wikimedia.org/r/279952 (owner: BBlack)
[10:34:24] (CR) ArielGlenn: "I think there's more stuff to remove in InitialiseSettings, i.e. wikimania2005wiki and others, if I am not misreading things." [mediawiki-config] - https://gerrit.wikimedia.org/r/280170 (https://phabricator.wikimedia.org/T129586) (owner: Reedy)
[10:42:00] (CR) Reedy: "reedy@ubuntu64-web-esxi:~/git/operations/mediawiki-config$ grep upload7 wmf-config/*" [mediawiki-config] - https://gerrit.wikimedia.org/r/280170 (https://phabricator.wikimedia.org/T129586) (owner: Reedy)
[10:42:56] (PS3) Ema: Move Varnishkafka APT pinning to role definition [puppet] - https://gerrit.wikimedia.org/r/280163 (https://phabricator.wikimedia.org/T122880)
[10:47:31] (CR) ArielGlenn: "So it was my eyes then. The labs ones can stay, (do they have swift yet?)" [mediawiki-config] - https://gerrit.wikimedia.org/r/280170 (https://phabricator.wikimedia.org/T129586) (owner: Reedy)
[11:01:13] (CR) Hashar: [C: -1] "That is still being used on beta cluster which serves upload material out of a NFS share :-(" [mediawiki-config] - https://gerrit.wikimedia.org/r/280170 (https://phabricator.wikimedia.org/T129586) (owner: Reedy)
[11:01:46] (CR) Reedy: "There are already overrides for labs in place as per the grep above?" [mediawiki-config] - https://gerrit.wikimedia.org/r/280170 (https://phabricator.wikimedia.org/T129586) (owner: Reedy)
[11:03:10] Operations, Beta-Cluster-Infrastructure, Patch-For-Review: /mnt/upload7 does not exist anywhere, yet it is referenced in multiple places in wmf-config - https://phabricator.wikimedia.org/T129586#2157338 (hashar) `/mnt/upload7` is still being used on beta cluster which serves upload material out of a N...
[11:05:28] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 122, down: 0, dormant: 0, excluded: 0, unused: 0
[11:06:17] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0
[11:10:41] (PS2) Faidon Liambotis: nagios: rewrite check_bgp from scratch [puppet] - https://gerrit.wikimedia.org/r/279971
[11:12:34] Operations, Analytics-Kanban, Patch-For-Review: nf_conntrack warnings for kafka hosts - https://phabricator.wikimedia.org/T131028#2157352 (MoritzMuehlenhoff) Connection rates are back to normal, around 115k on all kafka1* hosts.
[11:13:14] (CR) Faidon Liambotis: "Re-rewrote it between PS1 and PS2 :)" [puppet] - https://gerrit.wikimedia.org/r/279971 (owner: Faidon Liambotis)
[11:13:16] (PS3) Faidon Liambotis: nagios: rewrite check_bgp from scratch [puppet] - https://gerrit.wikimedia.org/r/279971
[11:13:34] (CR) Faidon Liambotis: [C: 2 V: 2] "Seems to work; post-merge reviews welcome." [puppet] - https://gerrit.wikimedia.org/r/279971 (owner: Faidon Liambotis)
[11:26:38] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: puppet fail
[11:28:28] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[11:28:45] (PS1) ArielGlenn: dumps: clean up directory layout so sample files, tools are clearly marked [dumps] (ariel) - https://gerrit.wikimedia.org/r/280181
[11:32:25] (CR) ArielGlenn: [C: 2 V: 2] dumps: clean up directory layout so sample files, tools are clearly marked [dumps] (ariel) - https://gerrit.wikimedia.org/r/280181 (owner: ArielGlenn)
[11:33:56] PROBLEM - BGP status on cr2-esams is CRITICAL: BGP CRITICAL - The requested table is empty or does not exist
[11:34:43] (ignore that)
[11:37:08] (PS1) ArielGlenn: flake8 dumps production scripts [dumps] (ariel) - https://gerrit.wikimedia.org/r/280188
[11:40:34] (CR) ArielGlenn: [C: 2 V: 2] flake8 dumps production scripts [dumps] (ariel) - https://gerrit.wikimedia.org/r/280188 (owner: ArielGlenn)
[11:42:14] (PS1) ArielGlenn: update tox.ini to exclude all dirs but prod dump scripts [dumps] (ariel) - https://gerrit.wikimedia.org/r/280190
[11:51:02] !log Jenkins / Zuul lagging out trying to catch up with a huge number of changes I have sent
[11:51:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:51:10] they will catch up just fine
[11:51:14] but there is some delay in processing...
[11:53:31] (03CR) 10Hashar: "check experimental" [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280190 (owner: 10ArielGlenn) [12:09:34] (03CR) 10ArielGlenn: [C: 032 V: 032] update tox.ini to exclude all dirs but prod dump scripts [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280190 (owner: 10ArielGlenn) [12:15:32] 6Operations, 10hardware-requests: new labstore hardware for eqiad - https://phabricator.wikimedia.org/T126089#2157381 (10faidon) [12:15:36] 6Operations, 10ops-eqiad, 6DC-Ops: testing: r430 server / h800 controller / md1200 shelf - https://phabricator.wikimedia.org/T127490#2157378 (10faidon) 5Resolved>3Open Re-opening, as that server is still allocated, running and in-puppet — please keep this or another task open to cleanup (and cleanup! :))... [12:23:14] (03PS1) 10ArielGlenn: flake8 for dumps/tools directory [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280195 [12:26:03] (03PS2) 10ArielGlenn: flake8 for xmldumps-backup/tools directory [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280195 [12:27:05] (03CR) 10Hashar: "recheck" [dumps] - 10https://gerrit.wikimedia.org/r/280172 (owner: 10Hashar) [12:27:45] 6Operations, 10Continuous-Integration-Config, 10Dumps-Generation, 13Patch-For-Review, 7WorkType-Maintenance: operations/dumps repo should pass flake8 - https://phabricator.wikimedia.org/T114249#2157425 (10hashar) We have some basic tox/flake8 setup on the repository. CI now triggers a run of 'tox'. [12:30:28] 6Operations: Add monitoring metric for connection tracking table usage - https://phabricator.wikimedia.org/T131150#2157431 (10MoritzMuehlenhoff) [12:33:13] (03CR) 10ArielGlenn: [C: 032 V: 032] Fix a few flake8 errors [dumps] - 10https://gerrit.wikimedia.org/r/280172 (owner: 10Hashar) [12:33:17] (03PS17) 10Elukey: Remove loglines cache to mitigate a possible memory leak. 
[software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/276439 (https://phabricator.wikimedia.org/T124278) [12:34:23] (03CR) 10ArielGlenn: [C: 032 V: 032] flake8 for xmldumps-backup/tools directory [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280195 (owner: 10ArielGlenn) [12:35:44] (03PS1) 10Elukey: Varnish 4 API porting. [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/280198 (https://phabricator.wikimedia.org/T124278) [12:36:38] (03PS1) 10ArielGlenn: enable jenkins tox on xmldumps-backup/tools [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280199 [12:36:50] (03PS2) 10ArielGlenn: enable jenkins tox on xmldumps-backup/tools [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280199 [12:37:36] RECOVERY - DPKG on maps-test2002 is OK: All packages OK [12:37:42] (03CR) 10ArielGlenn: [C: 032 V: 032] enable jenkins tox on xmldumps-backup/tools [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280199 (owner: 10ArielGlenn) [12:44:15] RECOVERY - puppet last run on maps-test2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [12:44:25] (03PS18) 10Elukey: Remove loglines cache to mitigate a possible memory leak. [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/276439 (https://phabricator.wikimedia.org/T124278) [12:48:07] (03CR) 10ArielGlenn: [C: 031] "These should cover all salt needs. 
publish.runner from peers uses the same ports as everything else, which is the only other thing I woul" [puppet] - 10https://gerrit.wikimedia.org/r/276419 (owner: 10Muehlenhoff) [12:51:42] PROBLEM - MariaDB disk space on silver is CRITICAL: DISK CRITICAL - free space: / 0 MB (0% inode=57%) [12:52:22] looking [12:52:27] that is me [12:52:35] ah, ok [12:52:36] trying to make a copy of the backups, solved [12:52:43] the problem is real, I created it [12:52:52] ack [12:53:06] "I fixed it, too" [12:53:22] (03PS1) 10ArielGlenn: move dumps incremental sample files and docs into main dump subdirs [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280200 [12:53:32] RECOVERY - MariaDB disk space on silver is OK: DISK OK [12:54:07] whew [12:54:22] best problems: someone creates it, knows so, fixes it immediately [12:54:37] it should not have created a problem for users, / got filled up, not /a [12:54:50] but I will check the request log [12:55:24] silver is another of those old hosts that we created with a 7GB / [12:55:36] and we need to update to more modern sizes [12:57:35] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 68.97% of data above the critical threshold [5000000.0] [12:59:26] RECOVERY - DPKG on maps-test2003 is OK: All packages OK [12:59:30] yes we do [13:01:01] (03PS2) 10ArielGlenn: move dumps incremental sample files and docs into main dump subdirs [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280200 [13:02:15] totally my fault, though; I am not going to avoid that [13:02:27] (03PS1) 10Ema: Varnishkafka: use VSL query on Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/280202 (https://phabricator.wikimedia.org/T124278) [13:03:09] but as far as I can see, no visible errors [13:03:35] wikitech's mediawiki needs some love, though [13:03:46] RECOVERY - DPKG on maps-test2004 is OK: All packages OK [13:03:56] RECOVERY - DPKG on maps-test2001 is OK: All packages OK [13:08:45] !log unheld nodejs on maps* [13:08:49] Logged the
message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:14:42] !log restoring (but not provisioning) older backup from labswiki sql [13:14:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:14:56] 6Operations, 10Salt: on labcontrol1001, /var/cache/salt has too many files! - https://phabricator.wikimedia.org/T129224#2157530 (10ArielGlenn) p:5Normal>3High [13:15:13] 6Operations, 10Salt: Many minions fail to connect to salt master since 10:39 - https://phabricator.wikimedia.org/T129841#2157531 (10ArielGlenn) p:5Normal>3High [13:15:37] (03PS1) 10Gehel: Allow browser to cache /portal static files. [puppet] - 10https://gerrit.wikimedia.org/r/280204 (https://phabricator.wikimedia.org/T126280) [13:15:59] 6Operations: Port Ganglia aggregator setup to systemd - https://phabricator.wikimedia.org/T124197#2157535 (10MoritzMuehlenhoff) alsafi needed to be rebooted today and several of the aggregators failed to start (see "systemctl list-units | grep failed") [13:18:12] (03CR) 10Volans: [C: 032] "I've chat with Jcrespo on IRC, merging and testing it on the main repo with the puppet compiler." 
[puppet/mariadb] - 10https://gerrit.wikimedia.org/r/279694 (https://phabricator.wikimedia.org/T111654) (owner: 10Volans) [13:18:24] +1 [13:21:00] (03PS2) 10Volans: [WIP] DB: Expose Puppet SSL certs and generate CA cert [puppet] - 10https://gerrit.wikimedia.org/r/279596 (https://phabricator.wikimedia.org/T111654) [13:21:46] 6Operations, 10Dumps-Generation: determine hardware needs for dumps in eqiad and codfw - https://phabricator.wikimedia.org/T118154#2157553 (10ArielGlenn) 5Resolved>3Open a:5RobH>3ArielGlenn [13:24:36] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [13:25:49] (03PS2) 10Muehlenhoff: Add ferm rules for saltmaster [puppet] - 10https://gerrit.wikimedia.org/r/276419 [13:26:15] (03CR) 10Muehlenhoff: [C: 032 V: 032] Add ferm rules for saltmaster [puppet] - 10https://gerrit.wikimedia.org/r/276419 (owner: 10Muehlenhoff) [13:27:02] (03CR) 10ArielGlenn: [C: 032] move dumps incremental sample files and docs into main dump subdirs [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280200 (owner: 10ArielGlenn) [13:28:11] (03CR) 10Elukey: [C: 031] Varnishkafka: use VSL query on Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/280202 (https://phabricator.wikimedia.org/T124278) (owner: 10Ema) [13:29:55] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0] [13:32:41] (03PS1) 10Muehlenhoff: Enable base::firewall on additional maps slaves [puppet] - 10https://gerrit.wikimedia.org/r/280208 [13:32:43] (03PS1) 10Muehlenhoff: Enable base::firewall on maps master [puppet] - 10https://gerrit.wikimedia.org/r/280209 [13:33:35] (03PS1) 10ArielGlenn: flake8 for incremental dump scripts [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280210 [13:33:40] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 0.132 seconds response time. 
nagiostest.beta.wmflabs.org returns 208.80.155.135 [13:34:09] (03CR) 10DCausse: "safe to merge, hive table has been updated and refinery-camus is already deployed with the new schema." [puppet] - 10https://gerrit.wikimedia.org/r/280167 (owner: 10DCausse) [13:34:19] (03PS2) 10ArielGlenn: flake8 for incremental dump scripts [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280210 [13:35:16] (03CR) 10ArielGlenn: [C: 032 V: 032] flake8 for incremental dump scripts [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280210 (owner: 10ArielGlenn) [13:35:18] (03PS2) 10Ema: Varnishkafka: use VSL query on Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/280202 (https://phabricator.wikimedia.org/T124278) [13:35:27] (03CR) 10Ema: [C: 032 V: 032] Varnishkafka: use VSL query on Varnish 4 [puppet] - 10https://gerrit.wikimedia.org/r/280202 (https://phabricator.wikimedia.org/T124278) (owner: 10Ema) [13:37:51] (03PS1) 10ArielGlenn: move incremental dump scripts into main dumps directory [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280211 [13:38:26] (03CR) 10jenkins-bot: [V: 04-1] move incremental dump scripts into main dumps directory [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280211 (owner: 10ArielGlenn) [13:40:42] (03PS2) 10ArielGlenn: move incremental dump scripts into main dumps directory [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280211 [13:42:38] (03CR) 10ArielGlenn: [C: 032] move incremental dump scripts into main dumps directory [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280211 (owner: 10ArielGlenn) [13:45:19] 6Operations, 10hardware-requests, 5codfw-rollout, 3codfw-rollout-Jan-Mar-2016: MediaWiki maintenance host for codfw (terbium's equivalent) - https://phabricator.wikimedia.org/T126987#2157582 (10RobH) [13:46:08] _joe_: around? 
[13:47:01] kart_: he's on vacation [13:49:34] (03CR) 10Ottomata: [C: 031] Move Varnishkafka APT pinning to role definition [puppet] - 10https://gerrit.wikimedia.org/r/280163 (https://phabricator.wikimedia.org/T122880) (owner: 10Ema) [13:51:25] (03PS1) 10Muehlenhoff: Enable base::firewall on neodymium [puppet] - 10https://gerrit.wikimedia.org/r/280213 [13:52:54] (03PS4) 10Ema: Move Varnishkafka APT pinning to role definition [puppet] - 10https://gerrit.wikimedia.org/r/280163 (https://phabricator.wikimedia.org/T122880) [13:53:12] (03CR) 10Ema: [C: 032 V: 032] Move Varnishkafka APT pinning to role definition [puppet] - 10https://gerrit.wikimedia.org/r/280163 (https://phabricator.wikimedia.org/T122880) (owner: 10Ema) [13:53:26] !log repooled maps-test2002 (nodejs being put on hold prevented generation of ferm rules provided by service::node, this has been fixed) [13:53:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:56:58] moritzm: okay! [14:24:59] (03CR) 10Ottomata: [C: 031] "Don't have a lot of context but looks good to me!" [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/276439 (https://phabricator.wikimedia.org/T124278) (owner: 10Elukey) [14:29:23] (03PS2) 10Ottomata: Bump CirrusSearchRequestSet rev to latest [puppet] - 10https://gerrit.wikimedia.org/r/280167 (owner: 10DCausse) [14:29:31] (03CR) 10Ottomata: [C: 032 V: 032] Bump CirrusSearchRequestSet rev to latest [puppet] - 10https://gerrit.wikimedia.org/r/280167 (owner: 10DCausse) [14:29:45] (03CR) 10Ottomata: "Ok, thanks for checking that!" 
[puppet] - 10https://gerrit.wikimedia.org/r/280167 (owner: 10DCausse) [14:34:15] (03PS1) 10Dereckson: Use extension registration for ImageMap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280223 (https://phabricator.wikimedia.org/T119117) [14:34:37] 6Operations, 6Analytics-Kanban, 13Patch-For-Review: nf_conntrack warnings for kafka hosts - https://phabricator.wikimedia.org/T131028#2157677 (10elukey) ``` elukey@neodymium:~$ sudo salt kafka10*.eqiad.wmnet cmd.run 'sysctl net.netfilter.nf_conntrack_count' kafka1012.eqiad.wmnet: net.netfilter.nf_conntra... [14:36:26] 6Operations, 6Analytics-Kanban, 13Patch-For-Review: nf_conntrack warnings for kafka hosts - https://phabricator.wikimedia.org/T131028#2157678 (10Ottomata) Yeah, if it went back down, then it isn't related to the ApiAction logging in MW. That is still happening. [14:36:56] (03PS1) 10Dereckson: Use extension registration for SyntaxHighlight_GeSHi [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280224 (https://phabricator.wikimedia.org/T119117) [14:38:03] (03PS2) 10Dereckson: Use extension registration for ImageMap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280223 (https://phabricator.wikimedia.org/T119117) [14:39:36] (03PS1) 10Dereckson: Use extension registration for DoubleWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280225 (https://phabricator.wikimedia.org/T119117) [14:43:12] (03PS1) 10Dereckson: Use extension registration for Poem [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280226 (https://phabricator.wikimedia.org/T119117) [14:44:03] (03PS1) 10Dereckson: Use extension registration for UnicodeConverter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280227 (https://phabricator.wikimedia.org/T119117) [14:46:04] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet). 
[14:46:40] my restore failed [14:48:19] no, it didn't [14:48:29] strange, log said error [14:49:08] ah, it is an empty file [14:53:06] (03PS2) 10Muehlenhoff: Enable base::firewall on additional maps slaves [puppet] - 10https://gerrit.wikimedia.org/r/280208 [14:53:31] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable base::firewall on additional maps slaves [puppet] - 10https://gerrit.wikimedia.org/r/280208 (owner: 10Muehlenhoff) [14:54:09] !log depooled maps-test2003 (to apply ferm) [14:54:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:58:35] RECOVERY - puppet last run on maps-test2003 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [14:58:49] 6Operations, 10media-storage: Unable to delete, restore/undelete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2157724 (10fgiunchedi) leaving this open until we have a more permanent solution, codfw... [14:58:54] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [14:59:14] 6Operations, 10media-storage: Unable to delete, restore/undelete, move or upload new versions of files on several wikis ("inconsistent state within the internal storage backends") - https://phabricator.wikimedia.org/T128096#2157728 (10fgiunchedi) leaving this open until we have a more permanent solution, codfw... [15:00:05] anomie ostriches thcipriani marktraceur Krenair: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160329T1500). Please do the needful. [15:00:05] Dereckson: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [15:00:08] Hi. [15:00:47] Dereckson: hello. I can SWAT today. [15:01:02] those extension registration patches could be ... 
fun [15:01:04] 6Operations, 10ops-codfw: ms-be2008.codfw.wmnet: slot=1 dev=sdl failed - https://phabricator.wikimedia.org/T131147#2157744 (10Papaul) @fgiunchedi there is an open task to purchase some 2TB disks so waiting on that T130376 [15:01:23] 6Operations, 10ops-codfw: ms-be2008.codfw.wmnet: slot=1 dev=sdl failed - https://phabricator.wikimedia.org/T131147#2157747 (10Papaul) p:5Triage>3Normal [15:01:47] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279897 (https://phabricator.wikimedia.org/T131033) (owner: 10Dereckson) [15:02:14] !log repooled maps-test2003 and depooled maps-test2004 (to apply ferm) [15:02:15] That's why legoktm and I think it's less risky to do them one by one instead of a batch transition, so we can deploy and test them one by one too. [15:02:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:03:00] (03Merged) 10jenkins-bot: HD logo for da.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279897 (https://phabricator.wikimedia.org/T131033) (owner: 10Dereckson) [15:03:05] There is no $wmg → $wg migration for this lot, so there should be no config-specific surprises.
[15:04:13] RECOVERY - puppet last run on maps-test2004 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [15:04:26] right, it does look safer than other extension registration related config changes I've seen [15:06:08] !log repooled maps-test2004 [15:06:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:07:08] 6Operations, 10ops-codfw: ms-be2008.codfw.wmnet: slot=1 dev=sdl failed - https://phabricator.wikimedia.org/T131147#2157752 (10fgiunchedi) [15:07:32] !log thcipriani@tin Synchronized static/images/project-logos: SWAT: HD logo for da.wikipedia [[gerrit:279897]] (duration: 00m 38s) [15:07:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:08:18] 6Operations, 10EventBus, 6Services, 15User-mobrovac, 7service-deployment-requests: New Service Request - Change Propagation - https://phabricator.wikimedia.org/T128463#2157755 (10mobrovac) a:5akosiaris>3mobrovac [15:08:22] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: HD logo for da.wikipedia [[gerrit:279897]] (duration: 00m 27s) [15:08:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:08:26] ^ Dereckson check please [15:08:33] Works. [15:09:20] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280057 (https://phabricator.wikimedia.org/T131027) (owner: 10Dereckson) [15:10:03] (03Merged) 10jenkins-bot: Logo for ast.wiktionary [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280057 (https://phabricator.wikimedia.org/T131027) (owner: 10Dereckson) [15:12:00] !log thcipriani@tin Synchronized static/images/project-logos/astwiktionary.png: SWAT: Logo for ast.wiktionary [[gerrit:280057]] (duration: 00m 27s) [15:12:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:12:46] Not purged. 
[15:14:33] Dereckson: should be now, check please [15:14:52] 6Operations, 13Patch-For-Review, 7Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2157764 (10fgiunchedi) [15:14:53] Works. Strange, it seems it's okay on https://ast.wiktionary.org/static/images/project-logos/astwiktionary.png, but not ok on https://ast.wiktionary.org/w/static/images/project-logos/astwiktionary.png [15:17:52] (Feedback from da.wiki: "It's beautiful, thank you very much! Greetings") [15:18:08] :D [15:19:18] (03PS1) 10ArielGlenn: fixups for incremental dumps after the previous pylints and etc [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280232 [15:20:14] hmm, dunno why the purge didn't get w/static...tried purging w/static explicitly to no avail :\ [15:20:42] not a blocker, as the logo is picked from /static [15:21:48] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280223 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:22:30] (03Merged) 10jenkins-bot: Use extension registration for ImageMap [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280223 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:22:35] 6Operations, 10RESTBase-Cassandra: restbase1007 not assembling raid after reboot - https://phabricator.wikimedia.org/T130930#2157776 (10fgiunchedi) p:5Triage>3Normal a:3fgiunchedi [15:23:25] (03PS1) 10Volans: Resolve conflict on /etc/mysql/ssl directory [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/280234 (https://phabricator.wikimedia.org/T111654) [15:24:34] (03PS1) 10Elukey: Add TcpConnStates diamond collector to the kafka broker role. 
[puppet] - 10https://gerrit.wikimedia.org/r/280235 [15:26:25] !log thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for ImageMap [[gerrit:280223]] (duration: 00m 31s) [15:26:27] ^ Dereckson check please [15:26:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:26:40] (03CR) 10ArielGlenn: [C: 032] fixups for incremental dumps after the previous pylints and etc [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280232 (owner: 10ArielGlenn) [15:27:12] ImageMap still working. [15:27:14] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280224 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:27:52] (03Merged) 10jenkins-bot: Use extension registration for SyntaxHighlight_GeSHi [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280224 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:29:39] !log thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for SyntaxHighlight_GeSHi [[gerrit:280224]] (duration: 00m 28s) [15:29:42] ^ Dereckson check please [15:29:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:30:00] GeShi still working. [15:30:46] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280225 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:31:23] (03Merged) 10jenkins-bot: Use extension registration for DoubleWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280225 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:33:04] !log thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for DoubleWiki [[gerrit:280225]] (duration: 00m 28s) [15:33:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:33:10] ^ Dereckson check please [15:33:13] Nope. [15:33:50] kart_: yup. 
git tag -d v1.0 ; git push origin :refs/tags/v1.0 [15:33:55] or so the docs say [15:33:57] Dereckson: ? [15:34:02] Erm [15:34:03] fr.wikisource broken [15:34:17] Were you doing maintenance today folks? [15:34:23] reverting. [15:34:38] Getting a lot of content encoding errors (en.wikisource) [15:35:08] ShakespeareFan00: yeah. it's being reverted [15:35:21] What were the planned changes? [15:35:31] Is there a log entry somewhere? [15:35:41] ShakespeareFan00: to switch DoubleWiki to the new extension registration system [15:35:47] ShakespeareFan00: apparently https://gerrit.wikimedia.org/r/280225 . see above [15:36:00] !log thcipriani@tin Synchronized wmf-config: SWAT: Revert Use extension registration for DoubleWiki (duration: 00m 27s) [15:36:01] (03PS3) 10Ppchelko: Bump s-maxage for purged endpoints. [puppet] - 10https://gerrit.wikimedia.org/r/280091 [15:36:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:36:07] ^ Should be reverted now [15:36:09] PROBLEM - puppet last run on restbase2004 is CRITICAL: CRITICAL: Puppet last ran 3 days ago [15:36:18] Works again. [15:36:43] Thanks [15:37:15] (03PS8) 10Andrew Bogott: labs dnsrecursor IP aliasing: work on all projects, not just some arbitrary ones [puppet] - 10https://gerrit.wikimedia.org/r/268921 (owner: 10Alex Monk) [15:37:25] I've opened https://phabricator.wikimedia.org/T131159 to fix the extension. [15:37:44] MatmaRex: could you add a stacktrace from Logstash?
[15:37:51] (03PS1) 10Thcipriani: Revert "Use extension registration for DoubleWiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280238 [15:38:24] (03CR) 10jenkins-bot: [V: 04-1] labs dnsrecursor IP aliasing: work on all projects, not just some arbitrary ones [puppet] - 10https://gerrit.wikimedia.org/r/268921 (owner: 10Alex Monk) [15:38:37] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280238 (owner: 10Thcipriani) [15:38:58] (03PS1) 10ArielGlenn: remove dead media rsync script from puppet/files [puppet] - 10https://gerrit.wikimedia.org/r/280239 [15:39:10] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0] [15:39:10] Dereckson: uhh, maybe. [15:39:19] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [1000.0] [15:39:29] PROBLEM - Eqiad HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [15:39:45] Dereckson: what was the error? 
there doesn't seem to be anything interesting, or i don't know where to look [15:39:46] (03Merged) 10jenkins-bot: Revert "Use extension registration for DoubleWiki" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280238 (owner: 10Thcipriani) [15:39:48] (03CR) 10Andrew Bogott: "Puppet compiler says:" [puppet] - 10https://gerrit.wikimedia.org/r/268921 (owner: 10Alex Monk) [15:39:50] (03CR) 10Ottomata: "Cool, let's add this to the main::broker class too" [puppet] - 10https://gerrit.wikimedia.org/r/280235 (owner: 10Elukey) [15:39:55] (03CR) 10Andrew Bogott: [C: 04-1] labs dnsrecursor IP aliasing: work on all projects, not just some arbitrary ones [puppet] - 10https://gerrit.wikimedia.org/r/268921 (owner: 10Alex Monk) [15:39:59] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [15:40:35] MatmaRex: a general 50x ("The server experiences difficulties") [15:41:09] (03CR) 10Gehel: [C: 031] "LGTM. I love when changes remove more code than they add..." [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/280234 (https://phabricator.wikimedia.org/T111654) (owner: 10Volans) [15:41:39] (03CR) 10ArielGlenn: [C: 032] remove dead media rsync script from puppet/files [puppet] - 10https://gerrit.wikimedia.org/r/280239 (owner: 10ArielGlenn) [15:42:17] thcipriani: to ease reverts, for the next migration we should rebase against master before merging, so it's a fast-forward (--ff) and we can use git revert instead of git revert -m 2 (oh no, it was -m 1) [15:42:23] Dereckson: there's nothing in logstash, or i don't know how to use it to find it, sorry. [15:43:02] ^ that's what I'm seeing, too. Nothing I could see blew up in logstash :\ [15:43:18] Thank you for looking. [15:43:45] (03PS2) 10Thcipriani: Use extension registration for Poem [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280226 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:43:56] (03CR) 10Jcrespo: "require?"
(031 comment) [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/280234 (https://phabricator.wikimedia.org/T111654) (owner: 10Volans) [15:44:41] Dereckson: ready to continue? [15:44:44] Yes. [15:44:49] (03PS9) 10Alex Monk: labs dnsrecursor IP aliasing: work on all projects, not just some arbitrary ones [puppet] - 10https://gerrit.wikimedia.org/r/268921 [15:45:09] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280226 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:45:18] (03PS1) 10ArielGlenn: move rsyncmedia python script to unused dir [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280240 [15:45:44] (03Merged) 10jenkins-bot: Use extension registration for Poem [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280226 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:45:47] (03PS2) 10Elukey: Add TcpConnStates diamond collector to the kafka broker role. [puppet] - 10https://gerrit.wikimedia.org/r/280235 [15:46:08] (03CR) 10jenkins-bot: [V: 04-1] labs dnsrecursor IP aliasing: work on all projects, not just some arbitrary ones [puppet] - 10https://gerrit.wikimedia.org/r/268921 (owner: 10Alex Monk) [15:46:10] (03PS3) 10Ottomata: Add TcpConnStates diamond collector to the kafka broker role. 
[puppet] - 10https://gerrit.wikimedia.org/r/280235 (owner: 10Elukey) [15:46:39] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [15:46:40] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [15:46:49] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [15:47:10] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [15:47:40] (03PS10) 10Alex Monk: labs dnsrecursor IP aliasing: work on all projects, not just some arbitrary ones [puppet] - 10https://gerrit.wikimedia.org/r/268921 [15:48:00] !log thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for Poem [[gerrit:280226]] (duration: 00m 28s) [15:48:02] ^ Dereckson check please [15:48:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:48:09] (03CR) 10Volans: Resolve conflict on /etc/mysql/ssl directory (031 comment) [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/280234 (https://phabricator.wikimedia.org/T111654) (owner: 10Volans) [15:48:13] thcipriani: Poem still working [15:48:24] (03CR) 10ArielGlenn: [C: 032 V: 032] move rsyncmedia python script to unused dir [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280240 (owner: 10ArielGlenn) [15:48:33] (03CR) 10jenkins-bot: [V: 04-1] labs dnsrecursor IP aliasing: work on all projects, not just some arbitrary ones [puppet] - 10https://gerrit.wikimedia.org/r/268921 (owner: 10Alex Monk) [15:49:01] (03PS1) 10Dereckson: Use extension registration for DoubleWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280242 (https://phabricator.wikimedia.org/T119117) [15:49:13] (03CR) 10Ottomata: [C: 032] Add TcpConnStates diamond collector to the kafka broker role. 
[puppet] - 10https://gerrit.wikimedia.org/r/280235 (owner: 10Elukey) [15:49:18] (03PS2) 10Thcipriani: Use extension registration for UnicodeConverter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280227 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:49:30] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280227 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:50:13] (03Merged) 10jenkins-bot: Use extension registration for UnicodeConverter [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280227 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [15:51:11] (03PS11) 10Alex Monk: labs dnsrecursor IP aliasing: work on all projects, not just some arbitrary ones [puppet] - 10https://gerrit.wikimedia.org/r/268921 [15:52:12] !log thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for UnicodeConverter [[gerrit:280227]] (duration: 00m 28s) [15:52:14] ^ Dereckson check please [15:52:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:52:33] UnicodeConverter still working. [15:52:42] MatmaRex, thcipriani > for DoubleWiki, I made a mistake at https://gerrit.wikimedia.org/r/#/c/280225/1/wmf-config/CommonSettings.php [15:53:50] That's why there isn't anything in logstash: it wasn't an exception thrown by the extension, or an error raised after everything is loaded [15:54:04] arg. I should have caught that. [15:54:23] Change 280242 for take 2, with the correct load instruction: https://gerrit.wikimedia.org/r/#/c/280242/1/wmf-config/CommonSettings.php [15:54:26] (03PS12) 10Alex Monk: labs dnsrecursor IP aliasing: work on all projects, not just some arbitrary ones [puppet] - 10https://gerrit.wikimedia.org/r/268921 [15:55:25] Dereckson: we could get that one out in this window, there's still time if you're up for it. [15:55:48] Okay, I'm adding it to the page.
[15:55:53] thanks [15:58:35] !log thcipriani@tin Synchronized wmf-config: SWAT: Use extension registration for DoubleWiki [[gerrit:280242]] (duration: 00m 28s) [15:58:36] ^ Dereckson check please [15:59:35] Not broken. [15:59:35] morebost seems slow? [15:59:35] *morebots [15:59:35] And still working. Works. [15:59:37] Dereckson: great. thanks! [15:59:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:59:46] Thanks for the deploy. [16:04:40] jouncebot: next [16:04:40] In 0 hour(s) and 55 minute(s): Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160329T1700) [16:04:57] hmm, but no puppet swat notification [16:06:43] hrmm, odd [16:06:43] _joe_ gehel the time has come (a little bit ago for puppet SWAT) https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160329T1600 :) [16:06:43] I think joe is on vacation, so it's gehel ? [16:06:43] yeah jouncebot left when it had to announce it [16:06:43] yup, _joe_'s out [16:06:43] damn, I thought I had last week on SWAT, not this week... [16:06:45] anyway, here I am... [16:07:13] greg-g: hmm. if i wanted someone to run a few maintenance scripts, how would i schedule it? [16:07:17] heh, thcipriani, only scap changes for today's puppetswat [16:07:21] that's gonna be fun [16:07:24] gehel: I'm the only one with patches. One is beta only and already cherry-picked to beta. The other should be a no-op for anything using scap. [16:07:33] no, i'm here too! [16:07:36] MatmaRex: uno momento, on a call [16:07:40] no hurry [16:07:45] mobrovac: oh, didn't see :P [16:07:51] I see another change by mobrovac ... [16:08:06] yarp, didn't refresh the page after morning swat :) [16:08:07] scap too [16:08:09] thcipriani: give me a few minutes to have a look into this (I was not prepared...
[16:08:11] s/on call/on a telephone call/ greg-g :P [16:10:04] gehel: i can confirm https://gerrit.wikimedia.org/r/#/c/275905/1 is a no-op, there isn't any repo that doesn't have "/" in its def (which is what this PS affects) [16:12:11] (03PS2) 10Gehel: Use repo_path instead of repo for deploy-local [puppet] - 10https://gerrit.wikimedia.org/r/275905 (owner: 10Thcipriani) [16:12:47] (defined in hieradata/common/role/deployment.yaml) [16:12:56] (03PS1) 10ArielGlenn: move scripts for generating upload dir lists from dumps repo into puppet [puppet] - 10https://gerrit.wikimedia.org/r/280244 [16:13:00] (03PS3) 10Dereckson: Use repo_path instead of repo for deploy-local [puppet] - 10https://gerrit.wikimedia.org/r/275905 (owner: 10Thcipriani) [16:13:15] thcipriani: looks like deplyoing 275905 is just about merging the change, no explicit action anywhere. Correct? [16:13:22] (03PS1) 10DCausse: CirrusSearch: Add new rescore profiles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280245 (https://phabricator.wikimedia.org/T127896) [16:13:48] Dereckson: ^ ? [16:14:00] (hehe, very informative) [16:14:08] (03CR) 10jenkins-bot: [V: 04-1] CirrusSearch: Add new rescore profiles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280245 (https://phabricator.wikimedia.org/T127896) (owner: 10DCausse) [16:14:10] (03CR) 10jenkins-bot: [V: 04-1] move scripts for generating upload dir lists from dumps repo into puppet [puppet] - 10https://gerrit.wikimedia.org/r/280244 (owner: 10ArielGlenn) [16:14:13] (PS3: fixed a typoin commit message) [16:14:16] gehel: I believe that's is correct. [16:14:20] Dereckson: thanks :) [16:14:34] thcipriani: how do I test it after deployment? [16:15:01] 6Operations, 10RESTBase, 13Patch-For-Review: install restbase1010-restbase1015 - https://phabricator.wikimedia.org/T128107#2157857 (10Eevans) >>! In T128107#2145299, @Cmjohnson wrote: > @fgiunchedi restbase1014 is ready for you. I cannot do restbase1015 until both restabse1005/1006 are taken offline. 
I do n... [16:15:08] (03CR) 10Gehel: [C: 032] Use repo_path instead of repo for deploy-local [puppet] - 10https://gerrit.wikimedia.org/r/275905 (owner: 10Thcipriani) [16:15:15] gehel: note that you also need to restart the puppetmaster for this change to take effect [16:15:49] mobrovac: right! Lemme check... [16:16:01] (03CR) 10Lydia Pintscher: "Thanks for the poke!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280003 (owner: 10Jforrester) [16:16:03] gehel: this would require installing a package on a new server with this provider. No one has used this functionality as yet. [16:16:25] thcipriani: ok, so let's go for the merge and the restart of puppetmasters... [16:16:26] i did thcipriani! just last friday! [16:16:31] :P [16:16:34] (03PS2) 10DCausse: CirrusSearch: Add new rescore profiles [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280245 (https://phabricator.wikimedia.org/T127896) [16:16:53] mobrovac: ok, no one has done it without a '/' in the name yet then :P [16:17:01] :P [16:17:49] !log restarting puppetmaster on palladium [16:17:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:18:24] (03PS2) 10ArielGlenn: move scripts for generating upload dir lists from dumps repo into puppet [puppet] - 10https://gerrit.wikimedia.org/r/280244 [16:18:41] (03PS3) 10ArielGlenn: move scripts for generating upload dir lists from dumps repo into puppet [puppet] - 10https://gerrit.wikimedia.org/r/280244 [16:20:13] (03CR) 10jenkins-bot: [V: 04-1] move scripts for generating upload dir lists from dumps repo into puppet [puppet] - 10https://gerrit.wikimedia.org/r/280244 (owner: 10ArielGlenn) [16:20:19] 7Blocked-on-Operations, 6Operations, 10RESTBase-Cassandra: expand raid0 in restbase200[1-6] - https://phabricator.wikimedia.org/T127951#2157863 (10Eevans) Are we firm enough in our plans (vis-a-vis T130218) at this point that we could move forward the remainder of these raid0 expansions? 
[16:20:30] !log upgrading some packages on silver, restarting bacula-fd [16:20:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:21:57] MatmaRex: ok, so... if they could complete in a reasonable timeframe (1-2 hours) then just scheduling on the deploy calendar wiki as normal (avoid overlapping with other things, do it when people are around, etc). If it's a long running thing, let's chat :) [16:23:08] greg-g: https://phabricator.wikimedia.org/T131157#2157861 - shouldn't take more than an hour to run all 47. but i can't actually do it myself (not a deployer), so i'd need a helper [16:23:50] (03PS4) 10ArielGlenn: move scripts for generating upload dir lists from dumps repo into puppet [puppet] - 10https://gerrit.wikimedia.org/r/280244 [16:24:12] (03PS5) 10ArielGlenn: move scripts for generating upload dir lists from dumps repo into puppet [puppet] - 10https://gerrit.wikimedia.org/r/280244 [16:25:56] (03CR) 10ArielGlenn: [C: 032] move scripts for generating upload dir lists from dumps repo into puppet [puppet] - 10https://gerrit.wikimedia.org/r/280244 (owner: 10ArielGlenn) [16:30:05] (03PS1) 10Ladsgroup: Remove use of git clone [puppet] - 10https://gerrit.wikimedia.org/r/280247 [16:30:40] PROBLEM - Varnish HTTP upload-backend - port 3128 on cp4015 is CRITICAL: Connection refused [16:31:41] (03CR) 10jenkins-bot: [V: 04-1] Remove use of git clone [puppet] - 10https://gerrit.wikimedia.org/r/280247 (owner: 10Ladsgroup) [16:31:47] (03CR) 10Ladsgroup: "Firstly, we need a patch in ores-config." 
[puppet] - 10https://gerrit.wikimedia.org/r/280247 (owner: 10Ladsgroup) [16:32:30] RECOVERY - Varnish HTTP upload-backend - port 3128 on cp4015 is OK: HTTP OK: HTTP/1.1 200 OK - 187 bytes in 0.151 second response time [16:32:41] ACKNOWLEDGEMENT - puppet last run on ms-be2008 is CRITICAL: CRITICAL: Puppet has 1 failures Filippo Giunchedi T131147 [16:33:35] (03PS2) 10Ladsgroup: Remove use of git clone [puppet] - 10https://gerrit.wikimedia.org/r/280247 [16:34:09] !log restarting apache on palladium, strontium and rhodium. Restart should be graceful. In case it is not, puppet errors will happen. [16:34:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:34:17] please all cross your fingers with me... [16:34:29] (03PS2) 10ArielGlenn: pylint and pep8 for runphpscriptlet [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280110 [16:34:31] (03PS1) 10ArielGlenn: move listwikiuploaddirs and runphpscriptlet to unused [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280248 [16:34:52] (03PS1) 10Dzahn: phab/exim: disable rewrite for maint-announce mail [puppet] - 10https://gerrit.wikimedia.org/r/280249 [16:36:00] (03CR) 10ArielGlenn: [C: 032 V: 032] pylint and pep8 for runphpscriptlet [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280110 (owner: 10ArielGlenn) [16:36:15] MatmaRex: gotcha [16:37:07] (03PS2) 10Dzahn: phab/exim: disable rewrite for maint-announce mail [puppet] - 10https://gerrit.wikimedia.org/r/280249 [16:37:08] 6Operations, 10Traffic, 7Varnish: Improve varnish stop for backend instances - https://phabricator.wikimedia.org/T131163#2157924 (10ema) [16:37:11] (03CR) 10Volans: "I've run in addition some unit test locally (nothing commitable yet due to other dependencies/configuration issues with rspec and given th" [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/280234 (https://phabricator.wikimedia.org/T111654) (owner: 10Volans) [16:37:19] (03CR) 10ArielGlenn: [C: 032 V: 032] move listwikiuploaddirs and runphpscriptlet 
to unused [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280248 (owner: 10ArielGlenn) [16:38:11] (03PS3) 10Dzahn: phab/exim: disable rewrite for maint-announce mail [puppet] - 10https://gerrit.wikimedia.org/r/280249 [16:38:22] MatmaRex: looking at the task/patch, I say Aaron should/can do it :) (if he reviews it) [16:39:10] PROBLEM - puppet last run on cp3037 is CRITICAL: CRITICAL: puppet fail [16:39:12] (03CR) 10DCausse: "This patch is available on suggesty inside /vagrant/settings.d/13-pageviews_rescore.php" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280245 (https://phabricator.wikimedia.org/T127896) (owner: 10DCausse) [16:40:31] PROBLEM - puppet last run on cp2005 is CRITICAL: CRITICAL: Puppet has 1 failures [16:41:26] ^ those puppet failures are most probably me... [16:42:00] PROBLEM - puppet last run on mw1126 is CRITICAL: CRITICAL: Puppet has 1 failures [16:42:00] PROBLEM - puppet last run on mw1237 is CRITICAL: CRITICAL: Puppet has 1 failures [16:42:51] 6Operations, 10Wikimedia-Mailing-lists: add User:Gnom to ops mailing list - https://phabricator.wikimedia.org/T131165#2157971 (10Gnom1) [16:42:59] PROBLEM - puppet last run on elastic2011 is CRITICAL: CRITICAL: Puppet has 1 failures [16:43:09] PROBLEM - puppet last run on mw2178 is CRITICAL: CRITICAL: Puppet has 1 failures [16:43:09] PROBLEM - puppet last run on mw2059 is CRITICAL: CRITICAL: Puppet has 1 failures [16:43:21] gehel: if you force the run on one of those host it works? [16:43:30] PROBLEM - puppet last run on mw1175 is CRITICAL: CRITICAL: Puppet has 1 failures [16:43:31] PROBLEM - puppet last run on lithium is CRITICAL: CRITICAL: Puppet has 1 failures [16:43:32] to ensure is a transient failure ;) [16:43:40] PROBLEM - puppet last run on labvirt1001 is CRITICAL: CRITICAL: Puppet has 2 failures [16:43:46] volans: checking right now... 
[16:43:51] PROBLEM - puppet last run on mw2011 is CRITICAL: CRITICAL: Puppet has 1 failures [16:44:21] RECOVERY - puppet last run on cp2005 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [16:44:34] thcipriani: should have traded places with you for puppetswat your two no-ops will be taking the whole puppetswat window :P [16:44:48] mobrovac: :( [16:44:58] thcipriani: yep, sorry for that ... [16:45:19] (03CR) 10Dzahn: [C: 032] phab/exim: disable rewrite for maint-announce mail [puppet] - 10https://gerrit.wikimedia.org/r/280249 (owner: 10Dzahn) [16:45:27] puppet seems to be recovering just fine. I will keep an eye out... [16:45:31] gehel: no problem: sorry the noop required a puppetmaster restart. [16:45:50] at least I'm learning stuff! [16:46:05] yup, so it's not in vain! [16:46:49] mobrovac: something about the best way to learn to swim... [16:47:02] hehehe [16:47:05] exactly [16:48:06] I still can't connect to rhodium. Does anyone knows if it is still running? 
[16:48:22] (03PS1) 10ArielGlenn: pylint and pep8 for listmediaperproject [dumps] (ariel) - 10https://gerrit.wikimedia.org/r/280254 [16:49:10] (03PS1) 10Dereckson: Wikipedia course, Prague throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280255 (https://phabricator.wikimedia.org/T131152) [16:49:20] RECOVERY - puppet last run on labvirt1001 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [16:50:54] (03CR) 10Luke081515: [C: 031] Wikipedia course, Prague throttle rule [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280255 (https://phabricator.wikimedia.org/T131152) (owner: 10Dereckson) [16:52:12] (03CR) 10Jcrespo: [C: 031] Resolve conflict on /etc/mysql/ssl directory (032 comments) [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/280234 (https://phabricator.wikimedia.org/T111654) (owner: 10Volans) [16:52:14] (03Abandoned) 10Milimetric: [WIP] Add dashiki module and role [puppet] - 10https://gerrit.wikimedia.org/r/237079 (https://phabricator.wikimedia.org/T110351) (owner: 10Milimetric) [16:52:39] thcipriani: lemme check just a bit more before deploying 279392 [16:52:44] (03Abandoned) 10Milimetric: [WIP] Add statistics mount [puppet] - 10https://gerrit.wikimedia.org/r/239577 (https://phabricator.wikimedia.org/T111845) (owner: 10Milimetric) [16:53:10] PROBLEM - DPKG on alsafi is CRITICAL: DPKG CRITICAL dpkg reports broken packages [16:53:55] alsafi is me [16:55:20] PROBLEM - Auth DNS on labs-ns1.wikimedia.org is CRITICAL: CRITICAL - Plugin timed out while executing system call [16:55:48] thcipriani, mobrovac: sorry this did not go as well as planned... Can you reschedule the remaining changes for next Puppet SWAT? [16:56:40] RECOVERY - puppet last run on alsafi is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [16:56:50] RECOVERY - DPKG on alsafi is OK: All packages OK [16:56:59] sure gehel, no worries [16:58:18] mobrovac: thanks! And beer is on me next time we meet! 
(Or chocolate if you want...) [16:58:25] gehel: yup. will do. thanks for getting out the scap provider change :) [16:58:49] gehel: beer's much much better :P [16:58:58] * mobrovac doesn't like the taste of sugar and sweet things [16:59:12] thcipriani, mobrovac: at least I have read your patches, I should be better prepared for Thursday... [16:59:32] mobrovac: you don't know swiss chocolate... I'm sure I can find one that you like... [17:00:04] yurik gwicke cscott arlolra subbu: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160329T1700). [17:00:09] gehel: i do know it, but still ... not even those >80% ones make me happy [17:02:10] RECOVERY - Auth DNS on labs-ns1.wikimedia.org is OK: DNS OK: 0.065 seconds response time. nagiostest.beta.wmflabs.org returns 208.80.155.135 [17:03:16] no parsoid deploy today [17:05:41] RECOVERY - puppet last run on cp3037 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [17:06:45] (03PS3) 10Ladsgroup: Remove use of git clone [puppet] - 10https://gerrit.wikimedia.org/r/280247 [17:07:39] 10Ops-Access-Reviews: Root on labtest* for Krenair - https://phabricator.wikimedia.org/T131166#2158024 (10Andrew) [17:07:39] 6Operations, 10Ops-Access-Reviews: Root on labtest* for Krenair - https://phabricator.wikimedia.org/T131166#2158036 (10Andrew) [17:07:40] RECOVERY - puppet last run on lithium is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [17:08:11] 6Operations, 10Ops-Access-Requests, 10Ops-Access-Reviews: Root on labtest* for Krenair - https://phabricator.wikimedia.org/T131166#2158052 (10Dzahn) [17:08:21] RECOVERY - puppet last run on mw1175 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:08:25] (03PS5) 1020after4: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 [17:08:29] RECOVERY - puppet last 
run on mw1237 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:08:29] RECOVERY - puppet last run on mw1126 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:08:30] RECOVERY - puppet last run on mw2011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:08:31] RECOVERY - puppet last run on elastic2011 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:09:11] RECOVERY - puppet last run on mw2178 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [17:10:10] RECOVERY - puppet last run on mw2059 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:11:03] (03CR) 1020after4: "Addressed mobrovac's comments and moved keyholder stuff to a separate class in anticipation of needing to include it in a standalone deplo" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [17:11:10] PROBLEM - Kafka Broker Replica Max Lag on kafka1013 is CRITICAL: CRITICAL: 65.52% of data above the critical threshold [5000000.0] [17:11:44] (03CR) 1020after4: "I'll test this on beta and +1 when it's verified working" [puppet] - 10https://gerrit.wikimedia.org/r/279198 (owner: 1020after4) [17:12:24] seems that all puppet icinga alerts are now resloved... [17:17:10] PROBLEM - Outgoing network saturation on labstore1003 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [100000000.0] [17:17:53] 6Operations, 10Wikimedia-Mailing-lists: add User:Gnom to ops mailing list - https://phabricator.wikimedia.org/T131165#2157971 (10Krenair) Which NDA did you sign and how can your signature be verified? 
[17:30:16] 6Operations, 10Wikimedia-Mailing-lists: add User:Gnom to ops mailing list - https://phabricator.wikimedia.org/T131165#2158266 (10Gnom1) I signed an NDA called [[ https://meta.wikimedia.org/wiki/Confidentiality_agreement_for_nonpublic_information | "Confidentiality agreement for nonpublic information" ]], see [... [17:32:56] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:33:36] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [17:34:36] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [17:35:08] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [17:36:25] !log starting branch cut for wmf.19 [17:36:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:39:49] (03CR) 10Mobrovac: Assign roles::ores::web, roles::ores::worker to SCB (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/278990 (https://phabricator.wikimedia.org/T124201) (owner: 10Alexandros Kosiaris) [17:42:07] RECOVERY - Outgoing network saturation on labstore1003 is OK: OK: Less than 10.00% above the threshold [75000000.0] [17:45:17] RECOVERY - Kafka Broker Replica Max Lag on kafka1013 is OK: OK: Less than 50.00% above the threshold [1000000.0] [17:53:02] 6Operations, 10Ops-Access-Requests, 10Ops-Access-Reviews: Root on labtest* for Krenair - https://phabricator.wikimedia.org/T131166#2158321 (10Andrew) This was approved just now, I'll write a patch later on today. [17:58:10] Hey, I enabled a role in beta but while it tries to install pakcages in modules/base/manifests/standard_packages.pp It can't install ncdu and dstat: https://wikitech.wikimedia.org/w/index.php?title=Special:NovaInstance&action=consoleoutput&instanceid=39080b03-385b-4aaf-9055-c8c591a35321&project=deployment-prep&region=eqiad [17:58:55] (and another instance).
Am I doing something wrong or it's an expected behavior, the instance is a jessie 8.3 [17:59:53] (03PS1) 10Andrew Bogott: Replace labtestweb-roots with labtest-roots, add Krenair [puppet] - 10https://gerrit.wikimedia.org/r/280264 (https://phabricator.wikimedia.org/T131166) [18:01:14] when I log in to the instance, and try to run the given command in log, it says they are already installed [18:02:54] 6Operations, 10ops-codfw, 10RESTBase-Cassandra: restbase2004.codfw.wmnet: Failed disk/RAID - https://phabricator.wikimedia.org/T130990#2158351 (10Papaul) a:3fgiunchedi Disk replacement complete. [18:04:12] (03PS1) 10Elukey: Add diamond nf_conntrack counter. [puppet] - 10https://gerrit.wikimedia.org/r/280265 (https://phabricator.wikimedia.org/T131150) [18:05:15] (03CR) 10jenkins-bot: [V: 04-1] Add diamond nf_conntrack counter. [puppet] - 10https://gerrit.wikimedia.org/r/280265 (https://phabricator.wikimedia.org/T131150) (owner: 10Elukey) [18:05:18] (03PS2) 10Elukey: Add diamond nf_conntrack counter. [puppet] - 10https://gerrit.wikimedia.org/r/280265 (https://phabricator.wikimedia.org/T131150) [18:05:26] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:05:34] 6Operations, 10Ops-Access-Requests, 10Ops-Access-Reviews, 13Patch-For-Review: Root on labtest* for Krenair - https://phabricator.wikimedia.org/T131166#2158377 (10Andrew) a:3Andrew [18:06:19] (03CR) 10jenkins-bot: [V: 04-1] Add diamond nf_conntrack counter. [puppet] - 10https://gerrit.wikimedia.org/r/280265 (https://phabricator.wikimedia.org/T131150) (owner: 10Elukey) [18:06:57] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:07:15] 6Operations, 10hardware-requests: Allocate 2 analytics machines to experiment with a jupyterhub notebook service - https://phabricator.wikimedia.org/T130760#2158379 (10yuvipanda) [18:08:13] (03PS3) 10Elukey: Add diamond nf_conntrack counter. 
[puppet] - 10https://gerrit.wikimedia.org/r/280265 (https://phabricator.wikimedia.org/T131150) [18:08:37] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [18:08:57] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [18:09:11] (03CR) 10jenkins-bot: [V: 04-1] Add diamond nf_conntrack counter. [puppet] - 10https://gerrit.wikimedia.org/r/280265 (https://phabricator.wikimedia.org/T131150) (owner: 10Elukey) [18:12:50] (03PS4) 10Elukey: Add diamond nf_conntrack counter. [puppet] - 10https://gerrit.wikimedia.org/r/280265 (https://phabricator.wikimedia.org/T131150) [18:13:17] PROBLEM - citoid endpoints health on scb2001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:14:07] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:14:26] PROBLEM - citoid endpoints health on scb2002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [18:14:37] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[18:15:06] RECOVERY - citoid endpoints health on scb2001 is OK: All endpoints are healthy [18:15:47] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy [18:16:06] RECOVERY - citoid endpoints health on scb2002 is OK: All endpoints are healthy [18:16:16] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy [18:17:02] (03PS4) 10Ladsgroup: Remove use of git clone [puppet] - 10https://gerrit.wikimedia.org/r/280247 [18:20:55] 6Operations, 6Discovery, 10Maps, 10hardware-requests: Maps back end hardware - https://phabricator.wikimedia.org/T131180#2158446 (10EBernhardson) [18:24:11] 6Operations, 10ops-codfw, 10RESTBase-Cassandra: restbase2004.codfw.wmnet: Failed disk/RAID - https://phabricator.wikimedia.org/T130990#2158487 (10Eevans) [18:24:43] 6Operations, 10hardware-requests: Allocate 2 analytics machines to experiment with a jupyterhub notebook service - https://phabricator.wikimedia.org/T130760#2158505 (10yuvipanda) Awesome, thanks @mark! @robh I'd like to rename these to notebook1001 / 1002, and keep them in the analytics vlan (where they alrea... 
[18:24:48] (03CR) 10Rush: [C: 031] "approved in ops meeting thanks" [puppet] - 10https://gerrit.wikimedia.org/r/280264 (https://phabricator.wikimedia.org/T131166) (owner: 10Andrew Bogott) [18:25:02] 6Operations, 10ops-codfw, 10RESTBase-Cassandra: restbase2004.codfw.wmnet: Failed disk/RAID - https://phabricator.wikimedia.org/T130990#2152993 (10Eevans) [18:25:21] 6Operations, 6Discovery, 10Maps, 10hardware-requests: Maps back end hardware - https://phabricator.wikimedia.org/T131180#2158431 (10Yurik) [18:26:50] 6Operations, 10hardware-requests, 3Discovery-Search-Sprint: Relevance forge hardware - https://phabricator.wikimedia.org/T131184#2158516 (10EBernhardson) [18:28:54] (03PS1) 10BBlack: varnish: use KillMode=process in systemd unit file [puppet] - 10https://gerrit.wikimedia.org/r/280268 [18:29:38] (03CR) 10BBlack: [C: 032 V: 032] varnish: use KillMode=process in systemd unit file [puppet] - 10https://gerrit.wikimedia.org/r/280268 (owner: 10BBlack) [18:30:44] 6Operations, 10ops-eqiad, 10RESTBase-Cassandra: restbase1007.eqiad.wmnet CPU temperature? - https://phabricator.wikimedia.org/T130370#2158529 (10Eevans) >>! In T130370#2151132, @Cmjohnson wrote: > Re-applied thermal paste. Let's wait the weekend before closing the task. FWIW, I don't see any more temperat... [18:31:07] 6Operations, 10Ops-Access-Requests, 13Patch-For-Review, 15User-greg: Requesting access to production for SWAT deploy for dereckson - https://phabricator.wikimedia.org/T129365#2158530 (10Dzahn) Has been approved (with normal 3 day waiting period) [18:31:31] 6Operations, 10ops-eqiad, 10RESTBase-Cassandra: restbase1007.eqiad.wmnet CPU temperature? - https://phabricator.wikimedia.org/T130370#2158531 (10Cmjohnson) 5Open>3Resolved a:3Cmjohnson outstanding! Resolving this task please re-open if it happens again. 
[18:31:35] (03CR) 10Volans: [C: 032] Resolve conflict on /etc/mysql/ssl directory [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/280234 (https://phabricator.wikimedia.org/T111654) (owner: 10Volans) [18:33:12] 6Operations, 10hardware-requests: Allocate 2 analytics machines to experiment with a jupyterhub notebook service - https://phabricator.wikimedia.org/T130760#2158548 (10RobH) Ok, you can make that happen in a few steps: * Update the page https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions * c... [18:39:56] !log alsafi deleted service template file remnant [18:40:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:41:53] (03PS3) 10Volans: [WIP] DB: Expose Puppet SSL certs and generate CA cert [puppet] - 10https://gerrit.wikimedia.org/r/279596 (https://phabricator.wikimedia.org/T111654) [18:42:36] (03PS2) 10Andrew Bogott: Replace labtestweb-roots with labtest-roots, add Krenair [puppet] - 10https://gerrit.wikimedia.org/r/280264 (https://phabricator.wikimedia.org/T131166) [18:43:45] renaming a group, while the GID stays the same [18:43:47] might cause issues [18:44:03] just a feeling it might break on puppet run [18:48:16] (03PS3) 10Andrew Bogott: Replace labtestweb-roots with labtest-roots, add Krenair [puppet] - 10https://gerrit.wikimedia.org/r/280264 (https://phabricator.wikimedia.org/T131166) [18:50:10] (03CR) 10Dzahn: "could you add the new group to hieradata/hosts/rutherfordium.yaml ? 
that is the people.wm.org server and supposed to have all admin groups" [puppet] - 10https://gerrit.wikimedia.org/r/280264 (https://phabricator.wikimedia.org/T131166) (owner: 10Andrew Bogott) [18:51:38] !log performing schema change on testwiki T130692 [18:51:38] T130692: Add new indexes from eec016ece6d2b30addcdf3d3efcc2ba59b10e858 to production databases - https://phabricator.wikimedia.org/T130692 [18:51:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:55:25] (03CR) 10Jdlrobson: "I've been asked by Jon Katz to get this setup on reading web staging so he can preview it before we enable it." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/279934 (https://phabricator.wikimedia.org/T113243) (owner: 10Florianschmidtwelzow) [19:00:04] thcipriani: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160329T1900). Please do the needful. [19:00:23] okie doke. [19:01:52] thcipriani: just closed that one blocker [19:02:06] greg-g: saw that, I was just replying, thanks :) [19:02:20] :) [19:04:24] (03PS1) 10Thcipriani: Group0 to 1.27.0-wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/280274 [19:06:27] !log thcipriani@tin Started scap: testwiki to php-1.27.0-wmf.19 and rebuild l10ncache [19:06:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:17:59] 6Operations, 10DNS, 10Fundraising-Backlog, 10Traffic, 10fundraising-tech-ops: Updating DNS records for Major Gifts subdomain (benefactors.wikimedia.org) - https://phabricator.wikimedia.org/T130937#2158625 (10DStrine) a:3Jgreen [19:20:01] (03PS6) 1020after4: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 [19:20:31] (03PS7) 1020after4: Hieraize keyholder::agent configuration [puppet] - 10https://gerrit.wikimedia.org/r/279198 [19:32:34] (03PS8) 1020after4: Hieraize keyholder::agent configuration [puppet] - 
10https://gerrit.wikimedia.org/r/279198 [19:42:27] (03PS1) 10Cmjohnson: Reclaiming cp1056/57 and cp1069/70 -Removed site.pp standard entry and removed from dhcpd file. [puppet] - 10https://gerrit.wikimedia.org/r/280277 [19:45:42] (03CR) 10Cmjohnson: [C: 032] Reclaiming cp1056/57 and cp1069/70 -Removed site.pp standard entry and removed from dhcpd file. [puppet] - 10https://gerrit.wikimedia.org/r/280277 (owner: 10Cmjohnson) [19:47:40] !Log disabling puppet on hosts cp1056/57 and cp1069/70. All are being reclaimed [19:47:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:50:04] 6Operations, 10Wikimedia-Mailing-lists: add User:Gnom to ops mailing list - https://phabricator.wikimedia.org/T131165#2158748 (10Dzahn) a:5Dzahn>3None [19:54:00] 6Operations, 10Wikimedia-Mailing-lists: add User:Gnom to ops mailing list - https://phabricator.wikimedia.org/T131165#2158755 (10Dzahn) Sorry,i'm not sure what the right process and NDA is for the list. @Mark @Robh as fellow list admins, do you have advice? [19:55:28] (03PS3) 10Rush: create-dbusers notify service on script change [puppet] - 10https://gerrit.wikimedia.org/r/280001 [19:56:36] PROBLEM - HHVM rendering on mw1128 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:56:46] PROBLEM - puppet last run on mw1128 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:57:31] 6Operations: Port Ganglia aggregator setup to systemd - https://phabricator.wikimedia.org/T124197#2158760 (10Dzahn) alsafi wasn't supposed to have the aggregator anymore. that class was applied on it in the past for testing but then removed. this issue popped up because a file in /etc/systemd/system was not rem... [19:57:35] PROBLEM - DPKG on mw1128 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[19:57:44] (03CR) 10Rush: [V: 032] create-dbusers notify service on script change [puppet] - 10https://gerrit.wikimedia.org/r/280001 (owner: 10Rush) [19:57:45] PROBLEM - SSH on mw1128 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:58:06] PROBLEM - configured eth on mw1128 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:58:27] PROBLEM - dhclient process on mw1128 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:58:46] PROBLEM - nutcracker port on mw1128 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:58:56] PROBLEM - salt-minion processes on mw1128 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [19:58:56] PROBLEM - Apache HTTP on mw1128 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:59:05] PROBLEM - nutcracker process on mw1128 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [20:00:15] RECOVERY - dhclient process on mw1128 is OK: PROCS OK: 0 processes with command name dhclient [20:00:45] RECOVERY - salt-minion processes on mw1128 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [20:01:06] PROBLEM - Check size of conntrack table on mw1128 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[20:01:16] RECOVERY - SSH on mw1128 is OK: SSH OK - OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.6 (protocol 2.0) [20:01:36] RECOVERY - configured eth on mw1128 is OK: OK - interfaces up [20:01:55] PROBLEM - Apache HTTP on mw1139 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50395 bytes in 0.303 second response time [20:02:05] RECOVERY - puppet last run on mw1128 is OK: OK: Puppet is currently enabled, last run 14 minutes ago with 0 failures [20:02:17] PROBLEM - HHVM rendering on mw1139 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50392 bytes in 0.006 second response time [20:02:25] RECOVERY - nutcracker port on mw1128 is OK: TCP OK - 0.000 second response time on port 11212 [20:02:36] RECOVERY - nutcracker process on mw1128 is OK: PROCS OK: 1 process with UID = 108 (nutcracker), command name nutcracker [20:02:46] RECOVERY - DPKG on mw1128 is OK: All packages OK [20:02:47] RECOVERY - Check size of conntrack table on mw1128 is OK: OK: nf_conntrack is 0 % full [20:02:54] !log thcipriani@tin Finished scap: testwiki to php-1.27.0-wmf.19 and rebuild l10ncache (duration: 56m 27s) [20:02:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:06:54] 6Operations, 10Mail, 10fundraising-tech-ops: (re)move problemsdonating aliases - https://phabricator.wikimedia.org/T127488#2045104 (10Nemo_bis) problemsdonating@ was recently mentioned as being active and recommended https://meta.wikimedia.org/w/index.php?title=Fundraising&diff=15151567&oldid=15148573 [20:07:57] 6Operations, 10Wikimedia-Mailing-lists: add User:Gnom to ops mailing list - https://phabricator.wikimedia.org/T131165#2158779 (10RobH) I'm not entirely certain, as I was under the impression the ops list was WMF only (with Domas being the only exception). However, that seems incorrect, as there are a number o... 
[20:08:02] (CR) Thcipriani: [C: 2] Group0 to 1.27.0-wmf.19 [mediawiki-config] - https://gerrit.wikimedia.org/r/280274 (owner: Thcipriani)
[20:08:31] (Merged) jenkins-bot: Group0 to 1.27.0-wmf.19 [mediawiki-config] - https://gerrit.wikimedia.org/r/280274 (owner: Thcipriani)
[20:09:58] Operations, Mail, fundraising-tech-ops: (re)move problemsdonating aliases - https://phabricator.wikimedia.org/T127488#2158803 (CCogdill_WMF) Hey @Dzahn, sorry I missed the ping on this. Nemo is right, problemsdonating@ is even referenced on our donation form on donate.wikimedia.org. This is a high tra...
[20:10:49] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.27.0-wmf.19
[20:10:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:13:19] (PS1) Cmjohnson: Removing dns entries for cp1056 and cp1057. Ensureing asset #'s are there [dns] - https://gerrit.wikimedia.org/r/280281
[20:14:52] Operations, Wikimedia-Mailing-lists: add User:Gnom to ops mailing list - https://phabricator.wikimedia.org/T131165#2158816 (Krenair) The ops list has been open to non-WMF-staff with NDAs for years.
[20:15:16] (CR) Cmjohnson: [C: 2] Removing dns entries for cp1056 and cp1057. Ensureing asset #'s are there [dns] - https://gerrit.wikimedia.org/r/280281 (owner: Cmjohnson)
[20:24:46] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/).
[20:27:07] Operations, Mail, fundraising-tech-ops: (re)move problemsdonating aliases - https://phabricator.wikimedia.org/T127488#2158821 (Dzahn) Hi! thank you. In that case i would just like to move them over to be controlled by OIT, like we did with other group aliases. It's not even about problemsdonating@,...
[20:29:15] Operations, Mail, fundraising-tech-ops: (re)move problemsdonating aliases - https://phabricator.wikimedia.org/T127488#2158824 (Dzahn) @bbogaert could we add "problems.donating@ , problemdonating@ and problem.donating@" as variants / aliases of the existing problemsdonating@ address?
[20:29:43] !log thcipriani@tin Purged l10n cache for 1.27.0-wmf.15
[20:29:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:33:49] (PS3) Dzahn: admin: create shell account for dereckson [puppet] - https://gerrit.wikimedia.org/r/279565 (https://phabricator.wikimedia.org/T129365)
[20:34:33] (CR) Dzahn: [C: 2] admin: create shell account for dereckson [puppet] - https://gerrit.wikimedia.org/r/279565 (https://phabricator.wikimedia.org/T129365) (owner: Dzahn)
[20:35:09] (PS1) BBlack: Code cleanup [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/280294
[20:35:11] (PS1) BBlack: Remove format.key feature [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/280295
[20:40:20] thcipriani: would you mind to deploy https://gerrit.wikimedia.org/r/#/c/280255 once you've finished the MediaWiki train please? This is a throttle rule for an event planned tomorrow, with the IP only available after the morning SWAT.
[20:42:22] Dereckson: sure.
[20:42:35] (CR) Thcipriani: [C: 2] Wikipedia course, Prague throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/280255 (https://phabricator.wikimedia.org/T131152) (owner: Dereckson)
[20:43:27] thanks.
[20:44:25] (PS2) Dzahn: admin: add dereckson to deployers [puppet] - https://gerrit.wikimedia.org/r/279566 (https://phabricator.wikimedia.org/T129365)
[20:44:26] Dereckson: or we could solve this another way
[20:44:34] (CR) Dzahn: [C: 2] admin: add dereckson to deployers [puppet] - https://gerrit.wikimedia.org/r/279566 (https://phabricator.wikimedia.org/T129365) (owner: Dzahn)
[20:44:52] (CR) Ottomata: [C: 1] "Daw, it's such a nice feature though! :/" [software/varnish/varnishkafka] - https://gerrit.wikimedia.org/r/280295 (owner: BBlack)
[20:45:27] (Merged) jenkins-bot: Wikipedia course, Prague throttle rule [mediawiki-config] - https://gerrit.wikimedia.org/r/280255 (https://phabricator.wikimedia.org/T131152) (owner: Dereckson)
[20:46:01] Operations, Traffic, Varnish: Improve varnish stop for backend instances - https://phabricator.wikimedia.org/T131163#2158870 (BBlack) Open>Resolved a:BBlack https://gerrit.wikimedia.org/r/#/c/280268/
[20:48:16] !log welcome Dereckson to Mediawiki deployers
[20:48:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:48:22] !log thcipriani@tin Synchronized wmf-config/throttle.php: Wikipedia course, Prague throttle rule [[gerrit:280255]] (duration: 00m 38s)
[20:48:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:48:41] ^ Dereckson should have made you sync that one it seems :D
[20:48:58] Thanks anyway for the sync.
[20:49:07] np :)
[20:49:25] your user is being created on bast1001 right now
[20:49:35] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[20:49:46] then you'll need ssh config to proxy through it
[20:50:02] Dereckson: wanna try ssh bast1001.wikimedia.org first?
[20:50:04] Dereckson: welcome :) feel free to ping me any time you have any questions, I'll point you in the right direction :)
[20:50:27] mutante: okay. Thanks greg-g.
[20:53:15] Operations, Labs, Monitoring, wikitech.wikimedia.org: Bacula recovery of sql files from silver/wikitech fails - https://phabricator.wikimedia.org/T131195#2158892 (jcrespo)
[20:54:03] Dereckson: https://phabricator.wikimedia.org/P2830
[20:54:11] dereckson@bast1001:~$ id
[20:54:12] uid=2362(dereckson) gid=500(wikidev) groups=500(wikidev),705(deployment)
[20:54:16] It works.
[20:54:45] Dereckson: cool, then the next step is getting to tin.eqiad.wmnet
[20:54:49] via the bastion
[20:55:02] Dereckson: which timeslot are you most around for? 15:00 UTC or 23:00 UTC?
[20:56:58] Operations, Labs, Monitoring, wikitech.wikimedia.org: Bacula recovery of sql files from silver/wikitech fails - https://phabricator.wikimedia.org/T131195#2158913 (jcrespo) I did a last try trying to recover all possible files within a month, and I got some extra errors: ``` 29-Mar 20:39 helium.eq...
[20:57:55] Operations, ops-eqiad, hardware-requests: reclaim to spares: cp1056, cp1057, cp1069, cp1070 - https://phabricator.wikimedia.org/T130884#2158915 (Cmjohnson)
[20:58:09] Operations, ops-eqiad, hardware-requests: reclaim to spares: cp1056, cp1057, cp1069, cp1070 - https://phabricator.wikimedia.org/T130884#2149499 (Cmjohnson) racktables updated
[21:08:10] mutante: works:
[21:08:11] dereckson@tin:~$ id
[21:08:12] uid=2362(dereckson) gid=500(wikidev) groups=500(wikidev),705(deployment)
[21:08:32] Dereckson: :) great, then you should be all set
[21:11:15] Thanks.
[21:13:14] Operations, Ops-Access-Requests, Patch-For-Review, User-greg: Requesting access to production for SWAT deploy for dereckson - https://phabricator.wikimedia.org/T129365#2158960 (Dzahn) Open>Resolved Dereckson was able to login on tin via bast1001.
[21:13:27] Operations, Ops-Access-Requests, User-greg: Requesting access to production for SWAT deploy for dereckson - https://phabricator.wikimedia.org/T129365#2158963 (Dzahn)
[21:14:12] I sent a subscription to the operations mailing list per https://wikitech.wikimedia.org/wiki/SWAT_deploys#New_SWAT_Team_member_check-list
[21:14:25] I think you missed a step actually
[21:15:07] people with deployment access need to be added to the wmf-deployment group in gerrit
[21:15:09] Could someone add me to https://gerrit.wikimedia.org/r/#/admin/groups/21,members?
[21:15:34] done
[21:15:37] Thanks.
[21:16:08] congrats
[21:17:25] Dereckson: yay!
[21:17:54] I added an extra entry to the list, sorry
[21:20:37] Operations, Wikimedia-Site-Requests, Security: ACL configuration for url-downloader.wikimedia.org allowing upload.wikimedia.org - https://phabricator.wikimedia.org/T130695#2158994 (Bawolff)
[21:21:39] Operations, Security-Reviews, Security-Team, Wikimedia-Site-Requests: ACL configuration for url-downloader.wikimedia.org allowing upload.wikimedia.org - https://phabricator.wikimedia.org/T130695#2158998 (csteipp)
[21:22:37] Operations, Security-Reviews, Security-Team, Wikimedia-Site-Requests: ACL configuration for url-downloader.wikimedia.org allowing upload.wikimedia.org - https://phabricator.wikimedia.org/T130695#2159000 (Bawolff) For reference, relavent file is templates/url_downloader/squid.conf.erb in operations/...
[21:24:21] (PS4) Andrew Bogott: Replace labtestweb-roots with labtest-roots, add Krenair [puppet] - https://gerrit.wikimedia.org/r/280264 (https://phabricator.wikimedia.org/T131166)
[21:31:06] (CR) Dzahn: [C: 1] Replace labtestweb-roots with labtest-roots, add Krenair [puppet] - https://gerrit.wikimedia.org/r/280264 (https://phabricator.wikimedia.org/T131166) (owner: Andrew Bogott)
[21:35:37] (PS6) Andrew Bogott: Stop using old labs-ns0 and labs-ns1, move ns2/ns3 to ns0/ns1 [puppet] - https://gerrit.wikimedia.org/r/279945 (https://phabricator.wikimedia.org/T131052)
[21:35:39] (PS1) Andrew Bogott: Labtest: Remove an errant reference to labs-ns0.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/280329
[21:36:03] (PS5) Andrew Bogott: Replace labtestweb-roots with labtest-roots, add Krenair [puppet] - https://gerrit.wikimedia.org/r/280264 (https://phabricator.wikimedia.org/T131166)
[21:39:46] (CR) Andrew Bogott: [C: 2] Replace labtestweb-roots with labtest-roots, add Krenair [puppet] - https://gerrit.wikimedia.org/r/280264 (https://phabricator.wikimedia.org/T131166) (owner: Andrew Bogott)
[21:40:36] (PS2) Andrew Bogott: Labtest: Remove an errant reference to labs-ns0.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/280329
[21:42:25] (CR) Andrew Bogott: [C: 2] Labtest: Remove an errant reference to labs-ns0.wikimedia.org [puppet] - https://gerrit.wikimedia.org/r/280329 (owner: Andrew Bogott)
[21:43:32] (CR) Volans: "Runs of Puppet compiler are available here:" [puppet] - https://gerrit.wikimedia.org/r/279596 (https://phabricator.wikimedia.org/T111654) (owner: Volans)
[21:45:00] (CR) Krinkle: [C: -1] "Looks fine, but I'd recommend splitting so that the entry point html at "^/" has mandatory must-revalidate and a shorter client" [puppet] - https://gerrit.wikimedia.org/r/280204 (https://phabricator.wikimedia.org/T126280) (owner: Gehel)
[21:56:46] PROBLEM - Kafka Broker Replica Max Lag on kafka1014 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [5000000.0]
[22:00:20] (CR) 20after4: [C: 1] "ok this compiles and doesn't seem to have broken beta cluster" [puppet] - https://gerrit.wikimedia.org/r/279198 (owner: 20after4)
[22:03:32] (CR) Andrew Bogott: "The puppet compiler likes this now. It doesn't show any changes in the recursor files, which may be because there aren't any... I'll add " [puppet] - https://gerrit.wikimedia.org/r/268921 (owner: Alex Monk)
[22:04:48] Operations, Ops-Access-Requests, Ops-Access-Reviews, Patch-For-Review: Root on labtest* for Krenair - https://phabricator.wikimedia.org/T131166#2159183 (Andrew) Open>Resolved
[22:05:55] Operations, Labs, Monitoring, wikitech.wikimedia.org: Bacula recovery of sql files from silver/wikitech fails - https://phabricator.wikimedia.org/T131195#2158892 (Dzahn) I also tried this multiple times and to different restore clients. Either i got an empty file with that same error about decrypti...
[22:11:15] (PS2) Dzahn: re-activate mw2090 in conftool [puppet] - https://gerrit.wikimedia.org/r/280145
[22:20:17] (CR) Dzahn: [C: 2] re-activate mw2090 in conftool [puppet] - https://gerrit.wikimedia.org/r/280145 (owner: Dzahn)
[22:21:11] !log Deployed patch for T127420 to wmf18 and wmf19
[22:21:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:23:38] (CR) Dzahn: "uploaded in 2014 for eyeballs-only. rm self" [puppet] - https://gerrit.wikimedia.org/r/172700 (owner: ArielGlenn)
[22:25:06] RECOVERY - Kafka Broker Replica Max Lag on kafka1014 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[22:29:02] Operations, DBA, Patch-For-Review: Investigate/decom db2001-db2007 - https://phabricator.wikimedia.org/T125827#2159300 (Dzahn) So what's the actual blocker? Is there really one since Moritz says he only added so they get updates?
[22:29:11] jynus: https://phabricator.wikimedia.org/T124697
[22:29:17] Any preliminary results?
[22:33:34] not yet, every time I want to work on that, some disaster happens somewhere
[22:34:16] I've also been told something I did not counted with, and that means I will have to change the db weights
[22:34:25] (terbium on dallas)
[22:39:45] Operations, DBA, Patch-For-Review: Investigate/decom db2001-db2007 - https://phabricator.wikimedia.org/T125827#2159370 (jcrespo) I need to see the destination of the disks to have at least working complete servers before the failover. Some es2 hosts and these have to be checked to try to solve codfw...
[22:45:38] Operations, Traffic, HTTPS: irc.wikimedia.org talks HTTP but not HTTPS - https://phabricator.wikimedia.org/T130981#2152741 (Dzahn) That redirect only exists because it used to be an "It works!" Apache site in the past and i thought it was ugly so redirected it to that meta page a long time ago. So it's...
[22:47:21] Deployed patch for T123653 to wmf18 and wmf19
[22:47:30] !log Deployed patch for T123653 to wmf18 and wmf19
[22:47:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:50:54] Operations, OCG-General-or-Unknown, Scrum-of-Scrums, Services: The OCG cleanup cache script doesn't work properly - https://phabricator.wikimedia.org/T120079#2159381 (Dzahn)
[22:50:55] Operations, OCG-General-or-Unknown, Services: OCG should not be contacted directly from the appservers but only via LVS - https://phabricator.wikimedia.org/T120077#2159382 (Dzahn)
[22:50:57] Operations: Increase size of root partition on ocg* servers - https://phabricator.wikimedia.org/T130591#2159380 (Dzahn)
[22:51:00] Operations, Services: reinstall OCG servers - https://phabricator.wikimedia.org/T84723#2159383 (Dzahn)
[22:56:06] Operations, Discovery, Elasticsearch: Icinga should alert on free disk space < 15% - https://phabricator.wikimedia.org/T130329#2132816 (Dzahn) It currently alerts by default at 6% / 3% ``` # nrpe_check_disk_options - Default options for checking disks. Defaults to checking #...
[22:56:25] Operations, Discovery, Elasticsearch: Icinga should alert on free disk space < 15% - https://phabricator.wikimedia.org/T130329#2159394 (Dzahn) a:Dzahn
[22:56:45] Operations, Discovery, Elasticsearch: Icinga should alert on free disk space < 15% on Elasticsearch hosts - https://phabricator.wikimedia.org/T130329#2132816 (Dzahn)
[22:57:26] Operations, Wikimedia-Mailing-lists: add User:Gnom to ops mailing list - https://phabricator.wikimedia.org/T131165#2159396 (RobH) >>! In T131165#2158816, @Krenair wrote: > The ops list has been open to non-WMF-staff with NDAs for years. That list of volunteers was Domas for a very long time. We have the...
[22:59:31] Operations, Wikimedia-Mailing-lists: add User:Gnom to ops mailing list - https://phabricator.wikimedia.org/T131165#2159398 (greg) Volunteer NDA, see also, Dereckson now a swat deployer will need to be on the ops@ list. HOWEVER, I don't think the ops@ list is the right list for this conversation, personal...
[22:59:39] Operations, Ops-Access-Requests, User-greg: Requesting access to production for SWAT deploy for dereckson - https://phabricator.wikimedia.org/T129365#2159399 (Dereckson) Resolved>Open a:Dzahn>None
[23:00:04] RoanKattouw ostriches Krenair MaxSem: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160329T2300).
[23:00:04] Jdlrobson MatmaRex: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:00:20] Still up, Dereckson?
[23:01:57] Operations, Ops-Access-Requests, User-greg: Requesting access to production for SWAT deploy for dereckson - https://phabricator.wikimedia.org/T129365#2159404 (Dereckson) > Your request to the Ops mailing list > > Subscription request > > has been rejected by the list moderator. The moderator ga...
[23:02:24] \o
[23:03:19] Krenair: yes I'm
[23:04:03] jenkins looks unhappy with that second commit jdlrobson
[23:04:07] (PS1) Dzahn: irc.wikimedia.org - remove Apache [puppet] - https://gerrit.wikimedia.org/r/280342 (https://phabricator.wikimedia.org/T130981)
[23:04:09] (PS1) Dzahn: elastic: change disk space monitoring to alert at 15% [puppet] - https://gerrit.wikimedia.org/r/280343 (https://phabricator.wikimedia.org/T130329)
[23:04:13] Dereckson, want to do this swat?
[23:04:14] Krenair lemme have a look
[23:04:58] Krenair: npm ERR! enoent ENOENT, open '/mnt/jenkins-workspace/workspace/mwext-qunit/src/node_modules/karma/node_modules/socket.io/node_modules/socket.io-client/node_modules/engine.io-client/node_modules/has-cors/package.json'
[23:05:08] Operations, OCG-General-or-Unknown, Scrum-of-Scrums, Services, Technical-Debt: The OCG cleanup cache script doesn't work properly - https://phabricator.wikimedia.org/T120079#2159423 (RobLa-WMF)
[23:05:10] recheck should solve it
[23:05:18] (PS2) Dzahn: elastic: change disk space monitoring to alert at 15% [puppet] - https://gerrit.wikimedia.org/r/280343 (https://phabricator.wikimedia.org/T130329)
[23:05:24] Krenair: not especially this evening. I haven't look the patches nor read all the documentation
[23:06:04] jdlrobson: there's a bug for that failure
[23:06:08] (which no one responded to)
[23:06:35] ok
[23:06:40] (PS3) Dzahn: elastic: change disk space monitoring to alert at 15% [puppet] - https://gerrit.wikimedia.org/r/280343 (https://phabricator.wikimedia.org/T130329)
[23:07:11] (PS4) Dzahn: elastic: change disk space monitoring to alert at 15% [puppet] - https://gerrit.wikimedia.org/r/280343 (https://phabricator.wikimedia.org/T130329)
[23:07:29] gehel: still around ? ^ that should just work
[23:07:32] Dereckson, we usually have people do their first deployments with someone watching over - I personally had to do the literal wrist-slapping once :P
[23:08:57] Dereckson, if you want someone to look over, I can help with that
[23:09:06] Operations, Deployment-Systems, Release-Engineering-Team: setup automatic deletion of old l10nupdate - https://phabricator.wikimedia.org/T130317#2159431 (Dzahn) Reedy, how does that (purging usual localisation caches) currently get triggered?
[23:09:55] Okay, thanks MaxSem.
[23:10:18] Operations, Deployment-Systems, Release-Engineering-Team: setup automatic deletion of old l10nupdate - https://phabricator.wikimedia.org/T130317#2159432 (Reedy) >>! In T130317#2159431, @Dzahn wrote: > Reedy, how does that (purging usual localisation caches) currently get triggered? Manually, so just a...
[23:11:16] Dereckson, then let me know when you're ready - we can do screen sharing or something
[23:11:46] PROBLEM - Kafka Broker Replica Max Lag on kafka1020 is CRITICAL: CRITICAL: 53.33% of data above the critical threshold [5000000.0]
[23:12:15] Operations, Documentation, LDAP: Review list of LDAP groups and document exactly what kind of access they can be allowed to provide - https://phabricator.wikimedia.org/T129788#2116378 (Dzahn) "nda" has been created to be able to let non-wmf volunteers have access to tools like icinga and graphite withou...
[23:14:09] MaxSem: what do you think it's the best way to share a tmux?
[23:14:11] MaxSem, I think he said not today
[23:14:33] Krenair, I wansn't suggsting today
[23:14:36] (Indeed, would be more confortable after having browsed the documentation)
[23:15:02] Dereckson, can do
[23:17:45] (PS1) GWicke: Staging: Remove inactive IPs from Cassandra seeds [puppet] - https://gerrit.wikimedia.org/r/280349
[23:19:05] Operations, Traffic, domains: Register nlwikipedia.org to prevent squatting - https://phabricator.wikimedia.org/T128968#2159456 (Dzahn) We could do this, but minus the redirect. That would mean nlwikipedia.org would simply be not found, like for example http://www.wikipedia.es/ but it would still be...
[23:19:37] (Abandoned) GWicke: Staging: Remove inactive IPs from Cassandra seeds [puppet] - https://gerrit.wikimedia.org/r/280349 (owner: GWicke)
[23:21:07] (PS1) GWicke: Staging: Clean up restbase seeds [puppet] - https://gerrit.wikimedia.org/r/280351
[23:22:00] Operations, Puppet: Reboot during puppet run causes /var/lib/puppet/state/agent_catalog_run.lock to be left and puppet to not start running again - https://phabricator.wikimedia.org/T127602#2048113 (Dzahn) could we hook into **molly-guard**? That already runs when a user tries to reboot the machine and mak...
[23:22:15] RECOVERY - Kafka Broker Replica Max Lag on kafka1020 is OK: OK: Less than 50.00% above the threshold [1000000.0]
[23:22:30] (CR) Ppchelko: [C: 1] Staging: Clean up restbase seeds [puppet] - https://gerrit.wikimedia.org/r/280351 (owner: GWicke)
[23:22:42] Operations, Documentation, LDAP: Review list of LDAP groups and document exactly what kind of access they can be allowed to provide - https://phabricator.wikimedia.org/T129788#2159473 (Krenair) Yes, {T129786} exists to get rid of the distinction and just have all of those users in an nda group.
[23:24:19] Operations, Deployment-Systems: error on tin:/srv/mediawiki-staging: insufficient permission for adding an object to repository database .git/objects - https://phabricator.wikimedia.org/T127093#2032127 (Dzahn) This keeps happening every once in a while, then usually a root manually fixes it. It happens wh...
[23:26:14] Operations, Discovery, Elasticsearch, Patch-For-Review: Icinga should alert on free disk space < 15% on Elasticsearch hosts - https://phabricator.wikimedia.org/T130329#2159484 (Dzahn) also see T126158 btw
[23:27:00] (CR) Dzahn: [C: -1] "meanwhile this has already been done. you can already set the check disk parameters in hiera" [puppet] - https://gerrit.wikimedia.org/r/193834 (owner: ArielGlenn)
[23:27:00] so, is anyone deploying? Krenair MaxSem?
[23:27:13] * MaxSem scratches head
[23:27:21] guess I can
[23:28:15] (CR) Dzahn: "use "nrpe_check_disk_options" in hiera. that's all you need" [puppet] - https://gerrit.wikimedia.org/r/193834 (owner: ArielGlenn)
[23:28:56] MatmaRex, no wm18 needed?
[23:29:16] MaxSem: no, i want this in wmf.19. it missed the cut
[23:29:23] ok
[23:30:07] (CR) Krinkle: "Remove the class as well?" [puppet] - https://gerrit.wikimedia.org/r/280342 (https://phabricator.wikimedia.org/T130981) (owner: Dzahn)
[23:30:30] MatmaRex, is UW under gate-and-commit?
[23:30:49] zuul doesn't seem to be reacting to it
[23:30:50] yes? i think no. not sure what you're asking
[23:30:56] jenkins should merge the change
[23:31:04] well, maybe it's broken again
[23:31:15] i see it on https://integration.wikimedia.org/zuul/ now
[23:31:21] now I see it - took it a bit of time
[23:31:26] Operations, Deployment-Systems: error on tin:/srv/mediawiki-staging: insufficient permission for adding an object to repository database .git/objects - https://phabricator.wikimedia.org/T127093#2159494 (thcipriani) Open>Invalid Yeah, this was happening with some frequency around the time I filed thi...
[23:33:17] MaxSem: Krenair so who is swatting my changes? if anyone?
[23:33:38] me
[23:33:42] it's in zuul
[23:34:01] cool. Some how missed the ping - i think my irc cloud is playing up again
[23:34:19] Thanks MaxSem
[23:36:13] Operations, Wikimedia-Mailing-lists: add User:Gnom to ops mailing list - https://phabricator.wikimedia.org/T131165#2159525 (Dzahn) >>! In T131165#2159398, @greg wrote: > Volunteer NDA, see also, Dereckson now a swat deployer will need to be on the ops@ list. That would mean having to sign L2, afaict. >...
[23:38:20] Operations, DBA, Patch-For-Review: Investigate/decom db2001-db2007 - https://phabricator.wikimedia.org/T125827#2159526 (Dzahn) Got it, thanks for explaining.
[23:38:42] Operations, Wikimedia-Mailing-lists: add User:Gnom to ops mailing list - https://phabricator.wikimedia.org/T131165#2159528 (MaxSem) Open>declined Please start a discussion on wikitech-l. ops@ is for non-public matters only.
[23:45:01] MaxSem: zuul is done yay
[23:48:22] !log maxsem@tin Synchronized php-1.27.0-wmf.18/extensions/WikidataPageBanner/: https://gerrit.wikimedia.org/r/#/c/280328/ (duration: 00m 38s)
[23:48:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:48:26] jdlrobson, please test ^^^
[23:49:56] MaxSem: on it
[23:50:17] !log maxsem@tin Synchronized php-1.27.0-wmf.19/extensions/UploadWizard/: https://gerrit.wikimedia.org/r/#/c/280339/ (duration: 00m 33s)
[23:50:19] MatmaRex, please test ^^^
[23:50:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:50:55] MaxSem: verified on testwiki. thanks
[23:53:49] looks good to me MaxSem Thanks!
[23:54:59] !log maxsem@tin Synchronized php-1.27.0-wmf.19/extensions/WikidataPageBanner/: https://gerrit.wikimedia.org/r/#/c/280327/ (duration: 00m 33s)
[23:55:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:55:13] jdlrobson, ^