[00:03:37] James_F: That sounds like Thiemo from WMDE owns that tool
[00:04:02] hoo|away: Yeah; is he still around? Does he use Phabricator? Etc.
[00:04:11] James_F: Yep, all of that
[00:04:19] subscribed him, guess he'll see it tomorrow at work
[00:04:20] hoo|away: OK, let's hope.
[00:05:15] James_F: Also moar tool roots would be nice
[00:05:34] hoo|away: How many do we have?
[00:06:24] 15, most of them are ops
[00:09:04] Eh.
[01:17:25] operations, Parsoid, Services: Lets consider upgrading our nodejs installs to iojs (once decent Debian packages are ready) - https://phabricator.wikimedia.org/T91855#1099423 (GWicke) There are jessie iojs packages from nodesource: - https://deb.nodesource.com/iojs_1.x/pool/main/i/iojs/ - repo key: htt...
[02:05:15] (PS1) Tim Starling: More secure permissions on conf cache [mediawiki-config] - https://gerrit.wikimedia.org/r/195196
[02:05:36] !log l10nupdate Synchronized php-1.25wmf19/cache/l10n: (no message) (duration: 00m 01s)
[02:05:43] Logged the message, Master
[02:06:43] !log LocalisationUpdate completed (1.25wmf19) at 2015-03-09 02:05:40+00:00
[02:06:50] Logged the message, Master
[02:07:09] !log l10nupdate Synchronized php-1.25wmf20/cache/l10n: (no message) (duration: 00m 01s)
[02:07:15] Logged the message, Master
[02:08:17] !log LocalisationUpdate completed (1.25wmf20) at 2015-03-09 02:07:13+00:00
[02:08:22] Logged the message, Master
[02:10:42] (PS1) Jalexander: Disable anonymous page creation on swWiki [mediawiki-config] - https://gerrit.wikimedia.org/r/195197 (https://phabricator.wikimedia.org/T44894)
[02:16:47] (CR) Ori.livneh: [C: 1] More secure permissions on conf cache [mediawiki-config] - https://gerrit.wikimedia.org/r/195196 (owner: Tim Starling)
[02:19:07] (CR) Hoo man: "hoo@terbium:~$ sudo -u www-data bash -c "umask"" [mediawiki-config] - https://gerrit.wikimedia.org/r/195196 (owner: Tim Starling)
[02:21:31] !log LocalisationUpdate ResourceLoader cache refresh completed at Mon
Mar 9 02:20:28 UTC 2015 (duration 20m 27s)
[02:21:39] Logged the message, Master
[02:40:42] (CR) Tim Starling: "What is your point, Hoo? 022 is the default." [mediawiki-config] - https://gerrit.wikimedia.org/r/195196 (owner: Tim Starling)
[02:41:38] (PS2) MZMcBride: Disable anonymous page creation on swWiki [mediawiki-config] - https://gerrit.wikimedia.org/r/195197 (https://phabricator.wikimedia.org/T44894) (owner: Jalexander)
[02:42:27] (CR) Hoo man: "I initially thought we set it to something "weird" in puppet, but apparently we don't... should have been a bit more explicit here, just w" [mediawiki-config] - https://gerrit.wikimedia.org/r/195196 (owner: Tim Starling)
[02:44:55] (CR) Hoo man: [C: 1] More secure permissions on conf cache [mediawiki-config] - https://gerrit.wikimedia.org/r/195196 (owner: Tim Starling)
[03:06:25] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 75039 MB (3% inode=99%):
[03:32:34] PROBLEM - puppet last run on amssq36 is CRITICAL: CRITICAL: Puppet has 1 failures
[03:50:24] RECOVERY - puppet last run on amssq36 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[04:09:18] (PS1) BBlack: depool cp3016 and mark cp4014 [puppet] - https://gerrit.wikimedia.org/r/195201
[04:09:33] (CR) BBlack: [C: 2 V: 2] depool cp3016 and mark cp4014 [puppet] - https://gerrit.wikimedia.org/r/195201 (owner: BBlack)
[04:14:24] PROBLEM - Host cp3016 is DOWN: PING CRITICAL - Packet loss = 100%
[04:14:35] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:22:38] (PS1) BBlack: depool amssq3[56] for reinstall [puppet] - https://gerrit.wikimedia.org/r/195202
[04:22:52] (CR) BBlack: [C: 2 V: 2] depool amssq3[56] for reinstall [puppet] - https://gerrit.wikimedia.org/r/195202 (owner: BBlack)
[04:32:09] PROBLEM - puppet last run on amssq41 is CRITICAL: CRITICAL: Puppet has 1 failures
[04:33:09] RECOVERY - puppet
last run on cp3004 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures
[04:33:58] PROBLEM - Host amssq35 is DOWN: PING CRITICAL - Packet loss = 100%
[04:35:59] RECOVERY - Host amssq35 is UP: PING OK - Packet loss = 0%, RTA = 88.48 ms
[04:36:19] PROBLEM - Host amssq36 is DOWN: PING CRITICAL - Packet loss = 100%
[04:38:19] RECOVERY - Host amssq36 is UP: PING OK - Packet loss = 0%, RTA = 88.24 ms
[04:42:50] PROBLEM - DPKG on cp3016 is CRITICAL: DPKG CRITICAL dpkg reports broken packages
[04:43:59] RECOVERY - DPKG on cp3016 is OK: All packages OK
[04:51:41] RECOVERY - puppet last run on amssq41 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures
[04:51:50] PROBLEM - Varnish HTTP upload-backend on cp3016 is CRITICAL: Connection refused
[04:52:59] RECOVERY - Varnish HTTP upload-backend on cp3016 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.200 second response time
[04:55:12] Ops-Access-Requests, operations, Patch-For-Review: RESTBase deploy access and shell on Cassandra cluster for eevans - https://phabricator.wikimedia.org/T91134#1099512 (Eevans) Here is my public key: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDdE3SnW3e2DxOwJ9iclnbne01cjmoyp6irLeolPpMLcyKXx44eaK44qqhyfRnEoM3...
[04:56:21] (PS1) BBlack: remove sparse_super2 from cache fs flags [puppet] - https://gerrit.wikimedia.org/r/195205
[04:56:23] (PS1) BBlack: repool cp3016 [puppet] - https://gerrit.wikimedia.org/r/195206
[04:56:25] (PS1) BBlack: depool cp4020+cp1069 for reinstall [puppet] - https://gerrit.wikimedia.org/r/195207
[04:56:42] (CR) BBlack: [C: 2 V: 2] remove sparse_super2 from cache fs flags [puppet] - https://gerrit.wikimedia.org/r/195205 (owner: BBlack)
[04:57:00] (CR) BBlack: [C: 2 V: 2] repool cp3016 [puppet] - https://gerrit.wikimedia.org/r/195206 (owner: BBlack)
[04:57:11] (CR) BBlack: [C: 2 V: 2] depool cp4020+cp1069 for reinstall [puppet] - https://gerrit.wikimedia.org/r/195207 (owner: BBlack)
[05:06:23] PROBLEM - Host cp1069 is DOWN: PING CRITICAL - Packet loss = 100%
[05:07:23] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 3 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[05:07:53] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 3 unmerged changes in puppet (dir /var/lib/git/operations/puppet).
[05:11:12] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[05:11:53] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[05:14:41] (PS1) BBlack: repool amssq35 [puppet] - https://gerrit.wikimedia.org/r/195209
[05:14:43] RECOVERY - Host cp1069 is UP: PING OK - Packet loss = 0%, RTA = 1.10 ms
[05:14:56] (CR) BBlack: [C: 2 V: 2] repool amssq35 [puppet] - https://gerrit.wikimedia.org/r/195209 (owner: BBlack)
[05:19:33] PROBLEM - Host cp4020 is DOWN: PING CRITICAL - Packet loss = 100%
[05:24:36] (PS1) BBlack: repool amssq36 [puppet] - https://gerrit.wikimedia.org/r/195210
[05:24:51] (CR) BBlack: [C: 2 V: 2] repool amssq36 [puppet] - https://gerrit.wikimedia.org/r/195210 (owner: BBlack)
[05:37:35] (PS1) BBlack: repool cp1069 [puppet] - https://gerrit.wikimedia.org/r/195211
[05:37:48] (CR) BBlack: [C: 2 V: 2] repool cp1069 [puppet] - https://gerrit.wikimedia.org/r/195211 (owner: BBlack)
[05:44:15] PROBLEM - salt-minion processes on cp1069 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[05:46:43] (PS1) BBlack: depool cp1055 [puppet] - https://gerrit.wikimedia.org/r/195212
[05:47:09] (CR) BBlack: [C: 2 V: 2] depool cp1055 [puppet] - https://gerrit.wikimedia.org/r/195212 (owner: BBlack)
[05:53:46] PROBLEM - Host cp1055 is DOWN: PING CRITICAL - Packet loss = 100%
[05:59:16] RECOVERY - Host cp1055 is UP: PING OK - Packet loss = 0%, RTA = 10.02 ms
[06:00:02] (PS1) BBlack: depool cp3012 for reinstall [puppet] - https://gerrit.wikimedia.org/r/195213
[06:02:56] (CR) BBlack: [C: 2 V: 2] depool cp3012 for reinstall [puppet] - https://gerrit.wikimedia.org/r/195213 (owner: BBlack)
[06:06:35] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 10 data above and 4 below the confidence bounds
[06:09:14] PROBLEM - Host cp3012 is DOWN: PING CRITICAL - Packet loss = 100%
[06:14:27] (CR) Cwek: "The VE is enabled for beta-test on the Main Namespace.I think it can be supported to the Draft Namespace."
[mediawiki-config] - https://gerrit.wikimedia.org/r/193827 (https://phabricator.wikimedia.org/T91223) (owner: Gerrit Patch Uploader)
[06:14:52] (PS1) BBlack: repool cp1055 [puppet] - https://gerrit.wikimedia.org/r/195214
[06:15:23] (CR) BBlack: [C: 2 V: 2] repool cp1055 [puppet] - https://gerrit.wikimedia.org/r/195214 (owner: BBlack)
[06:17:53] PROBLEM - salt-minion processes on cp3002 is CRITICAL: Connection refused by host
[06:18:03] PROBLEM - configured eth on cp3002 is CRITICAL: Connection refused by host
[06:18:13] PROBLEM - dhclient process on cp3002 is CRITICAL: Connection refused by host
[06:22:50] PROBLEM - Varnish HTTP mobile-backend on cp4020 is CRITICAL: Connection refused
[06:23:00] PROBLEM - Varnish HTTP mobile-frontend on cp4020 is CRITICAL: Connection refused
[06:23:19] PROBLEM - Varnish traffic logger on cp4020 is CRITICAL: Connection refused by host
[06:23:40] PROBLEM - Varnishkafka log producer on cp4020 is CRITICAL: Connection refused by host
[06:23:49] PROBLEM - configured eth on cp4020 is CRITICAL: Connection refused by host
[06:23:59] PROBLEM - DPKG on cp4020 is CRITICAL: Connection refused by host
[06:24:00] PROBLEM - dhclient process on cp4020 is CRITICAL: Connection refused by host
[06:24:30] PROBLEM - HTTPS on cp4020 is CRITICAL: Return code of 255 is out of bounds
[06:24:50] RECOVERY - configured eth on cp4020 is OK: NRPE: Unable to read output
[06:25:00] RECOVERY - DPKG on cp4020 is OK: All packages OK
[06:25:00] RECOVERY - dhclient process on cp4020 is OK: PROCS OK: 0 processes with command name dhclient
[06:25:10] RECOVERY - Varnish HTTP mobile-frontend on cp4020 is OK: HTTP OK: HTTP/1.1 200 OK - 349 bytes in 0.156 second response time
[06:25:29] RECOVERY - Varnish traffic logger on cp4020 is OK: PROCS OK: 2 processes with command name varnishncsa
[06:25:40] RECOVERY - HTTPS on cp4020 is OK: SSLXNN OK - 36 OK
[06:25:50] RECOVERY - Varnishkafka log producer on cp4020 is OK: PROCS OK: 1 process with command name
varnishkafka
[06:26:09] RECOVERY - Varnish HTTP mobile-backend on cp4020 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.157 second response time
[06:28:19] PROBLEM - puppet last run on cp3014 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:20] PROBLEM - puppet last run on amssq35 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:28:30] (PS1) BBlack: repool cp4020 [puppet] - https://gerrit.wikimedia.org/r/195215
[06:28:30] PROBLEM - puppet last run on mw1222 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:10] PROBLEM - puppet last run on mw1176 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:20] PROBLEM - puppet last run on mw1092 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:30] PROBLEM - puppet last run on mw1065 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:29:40] PROBLEM - puppet last run on mw1052 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:29:46] (CR) BBlack: [C: 2 V: 2] repool cp4020 [puppet] - https://gerrit.wikimedia.org/r/195215 (owner: BBlack)
[06:30:30] PROBLEM - puppet last run on mw1144 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:30:50] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:00] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 4 below the confidence bounds
[06:44:55] RECOVERY - puppet last run on mw1176 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[06:45:45] RECOVERY - puppet last run on amssq35 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures
[06:45:56] RECOVERY - puppet last run on mw1222 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:46:06] RECOVERY - puppet last run on mw1144 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:46:16] RECOVERY - puppet last run on mw1092 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[06:46:36] RECOVERY - puppet last run on cp3014 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[06:46:56] RECOVERY - puppet last run on mw1065 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures
[06:46:56] RECOVERY - puppet last run on mw1052 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:47:25] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures
[07:07:36] RECOVERY - Disk space on fluorine is OK: DISK OK
[07:15:14] (PS1) BBlack: nginx workers: 1/cputhread [puppet] - https://gerrit.wikimedia.org/r/195216
[07:26:13] operations, Wikimedia-Labs-Other, Tracking: (Tracking) Database replication services - https://phabricator.wikimedia.org/T50930#1099553 (jeremyb)
[07:36:25] springle: hi, thanks for looking into the db1068 issue! is this kind of situation a total freak accident or is it possible other slaves could be affected like this one?
[07:46:56] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[07:50:34] (PS6) Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC [mediawiki-config] - https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754)
[07:50:49] (CR) Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: Giuseppe Lavagetto)
[08:14:53] (PS7) Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC [mediawiki-config] - https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754)
[08:33:26] PROBLEM - puppet last run on lvs2003 is CRITICAL: CRITICAL: puppet fail
[08:42:35] (PS1) Tim Landscheidt: Tools: Remove tools-webproxy from list of proxies [puppet] - https://gerrit.wikimedia.org/r/195219 (https://phabricator.wikimedia.org/T91939)
[08:43:08] (PS2) Yuvipanda: Tools: Remove tools-webproxy from list of proxies [puppet] - https://gerrit.wikimedia.org/r/195219 (https://phabricator.wikimedia.org/T91939) (owner: Tim Landscheidt)
[08:43:53] (CR) Yuvipanda: [C: 2 V: 2] Tools: Remove tools-webproxy from list of proxies [puppet] - https://gerrit.wikimedia.org/r/195219 (https://phabricator.wikimedia.org/T91939) (owner: Tim Landscheidt)
[08:43:58] (CR) Tim Landscheidt: "This is blocking webservices starting, so please review and merge as soon as possible." [puppet] - https://gerrit.wikimedia.org/r/195219 (https://phabricator.wikimedia.org/T91939) (owner: Tim Landscheidt)
[08:44:06] (PS2) BBlack: nginx workers: 1/cputhread [puppet] - https://gerrit.wikimedia.org/r/195216
[08:44:16] (CR) Tim Landscheidt: "Okay, that was quick :-)."
[puppet] - https://gerrit.wikimedia.org/r/195219 (https://phabricator.wikimedia.org/T91939) (owner: Tim Landscheidt)
[08:44:32] (CR) BBlack: [C: 2 V: 2] nginx workers: 1/cputhread [puppet] - https://gerrit.wikimedia.org/r/195216 (owner: BBlack)
[08:44:56] (CR) Yuvipanda: "Puppet-merged :)" [puppet] - https://gerrit.wikimedia.org/r/195219 (https://phabricator.wikimedia.org/T91939) (owner: Tim Landscheidt)
[08:46:53] hello
[08:51:16] RECOVERY - puppet last run on lvs2003 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[09:28:28] (PS1) Yuvipanda: tools: Increase proxylistener nofile limits [puppet] - https://gerrit.wikimedia.org/r/195221 (https://phabricator.wikimedia.org/T91939)
[09:28:31] bblack: _joe_ ^
[09:28:50] (PS2) Yuvipanda: tools: Increase proxylistener nofile limits [puppet] - https://gerrit.wikimedia.org/r/195221 (https://phabricator.wikimedia.org/T91939)
[09:29:25] PROBLEM - HTTP error ratio anomaly detection on graphite1001 is CRITICAL: CRITICAL: Anomaly detected: 11 data above and 4 below the confidence bounds
[09:29:42] (CR) BBlack: [C: 1] tools: Increase proxylistener nofile limits [puppet] - https://gerrit.wikimedia.org/r/195221 (https://phabricator.wikimedia.org/T91939) (owner: Yuvipanda)
[09:29:57] (CR) Yuvipanda: [C: 2] tools: Increase proxylistener nofile limits [puppet] - https://gerrit.wikimedia.org/r/195221 (https://phabricator.wikimedia.org/T91939) (owner: Yuvipanda)
[09:50:15] PROBLEM - check_mysql on db1008 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 634
[09:55:15] RECOVERY - check_mysql on db1008 is OK: Uptime: 2386742 Threads: 2 Questions: 25563162 Slow queries: 17272 Opens: 39430 Flush tables: 2 Open tables: 64 Queries per second avg: 10.710 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0
[11:08:36] RECOVERY - HTTP error ratio anomaly detection on graphite1001 is OK: OK: No anomaly detected
[11:10:56]
PROBLEM - puppet last run on db2016 is CRITICAL: CRITICAL: puppet fail
[11:12:16] PROBLEM - puppet last run on install2001 is CRITICAL: CRITICAL: Puppet has 97 failures
[11:28:46] RECOVERY - puppet last run on db2016 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[11:28:56] RECOVERY - puppet last run on install2001 is OK: OK: Puppet is currently enabled, last run 40 seconds ago with 0 failures
[11:41:32] grrrit-wm: ping
[11:41:50] hurr durr
[11:41:54] I wonder if redis died again
[11:44:41] wb, grrrit-wm
[11:53:58] PROBLEM - HHVM rendering on mw1040 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 50350 bytes in 7.224 second response time
[11:54:58] RECOVERY - HHVM rendering on mw1040 is OK: HTTP OK: HTTP/1.1 200 OK - 72664 bytes in 0.146 second response time
[11:58:22] bblack: merging the amssq48-51 change
[12:00:09] ok
[12:04:12] YuviPanda: How do I get grrrit-wm to come back in my channel?
[12:04:27] RoanKattouw: it looks dead. I’m looking at it now.
[12:04:29] Oh wait it is there
[12:04:33] It just isn't reporting anything
[12:04:34] OK
[12:10:08] PROBLEM - Host amssq48 is DOWN: PING CRITICAL - Packet loss = 100%
[12:11:18] PROBLEM - Host amssq52 is DOWN: PING CRITICAL - Packet loss = 100%
[12:11:28] PROBLEM - Host amssq50 is DOWN: PING CRITICAL - Packet loss = 100%
[12:11:31] bblack: ^ is this you?
[12:11:38] PROBLEM - Host amssq51 is DOWN: PING CRITICAL - Packet loss = 100%
[12:11:44] uh oh
[12:11:45] yes :)
[12:12:18] RECOVERY - Host amssq48 is UP: PING OK - Packet loss = 0%, RTA = 88.72 ms
[12:12:27] PROBLEM - Host amssq53 is DOWN: PING CRITICAL - Packet loss = 100%
[12:12:54] sorry, I had been relying on the relevant grrrit-wm messages in place of !log earlier, since they're similar
[12:13:01] but no grrrit-wm warnings anymore :)
[12:14:56] bblack: yeah, am working on that atm.
[12:14:57] PROBLEM - Host amssq48 is DOWN: PING CRITICAL - Packet loss = 100%
[12:15:02] tools-redis dead again, is fulllll
[12:15:15] I wonder if we should get a bare-metal redis host for toollabs. 15G filled up
[12:15:20] but I bet it’s just some tool misbehaving
[12:15:28] RECOVERY - Host amssq48 is UP: PING OK - Packet loss = 0%, RTA = 89.74 ms
[12:15:48] PROBLEM - Host amssq49 is DOWN: PING CRITICAL - Packet loss = 100%
[12:17:57] RECOVERY - Host amssq49 is UP: PING WARNING - Packet loss = 64%, RTA = 87.77 ms
[12:22:42] PROBLEM - Host amssq48 is DOWN: PING CRITICAL - Packet loss = 100%
[12:23:57] (PS3) Yuvipanda: Tools: Properly puppetize crontab replacement [puppet] - https://gerrit.wikimedia.org/r/186627 (https://phabricator.wikimedia.org/T86445) (owner: Tim Landscheidt)
[12:24:02] RECOVERY - Host amssq48 is UP: PING OK - Packet loss = 0%, RTA = 89.98 ms
[12:24:04] bblack: back
[12:24:14] woot
[12:24:41] redis was full - all 15G of it, and there were no volatile keys to expire
[12:24:50] I just set them to allkeys-lru
[12:24:57] you should use it only for caching, etc, etc.
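Editor's note on the Redis exchange above: the fix described is switching the eviction policy (Redis's `maxmemory-policy` setting) to `allkeys-lru`. With `volatile-lru`, only keys that carry an expiry are eviction candidates, so a full instance with no such keys has nothing it can drop and refuses new writes. The sketch below is a toy model of that difference, not Redis itself: `TinyCache`, `fill`, and the three-key limit are hypothetical names and values chosen for illustration, and real Redis limits memory in bytes and uses an approximated LRU rather than an exact one.

```python
from collections import OrderedDict


class TinyCache:
    """Toy key-count-capped store illustrating two eviction policies (not real Redis)."""

    def __init__(self, max_keys, policy):
        if policy not in ("volatile-lru", "allkeys-lru"):
            raise ValueError(f"unknown policy: {policy}")
        self.max_keys = max_keys
        self.policy = policy
        # key -> (value, has_ttl); insertion order runs from least to most recently used
        self.data = OrderedDict()

    def get(self, key):
        value, _ = self.data[key]
        self.data.move_to_end(key)  # reading a key marks it as recently used
        return value

    def set(self, key, value, has_ttl=False):
        """Store a key; return False if the store is full and nothing may be evicted."""
        if key in self.data:
            self.data[key] = (value, has_ttl)
            self.data.move_to_end(key)
            return True
        if len(self.data) >= self.max_keys and not self._evict_one():
            return False  # write refused, as when no eviction candidate exists
        self.data[key] = (value, has_ttl)
        return True

    def _evict_one(self):
        # Walk from least recently used; volatile-lru may only drop keys with a TTL.
        for key, (_, has_ttl) in self.data.items():
            if self.policy == "allkeys-lru" or has_ttl:
                del self.data[key]
                return True
        return False


def fill(policy):
    """Write five keys without TTLs into a three-key cache and report each outcome."""
    cache = TinyCache(max_keys=3, policy=policy)
    return [cache.set(f"k{i}", i) for i in range(5)]


if __name__ == "__main__":
    print("volatile-lru:", fill("volatile-lru"))
    print("allkeys-lru: ", fill("allkeys-lru"))
```

In this model the `volatile-lru` run accepts the first three writes and refuses the rest, while the `allkeys-lru` run accepts all five by dropping the oldest keys. That trade-off is why the advice in the log is to treat the instance as a cache only: under `allkeys-lru` any key can disappear.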
[12:26:51] PROBLEM - Host amssq48 is DOWN: PING CRITICAL - Packet loss = 100%
[12:26:56] (Abandoned) Glaisher: Set wgLanguageCode at cawikimedia from 'en-ca' to 'en' [mediawiki-config] - https://gerrit.wikimedia.org/r/195092 (https://phabricator.wikimedia.org/T88843) (owner: Glaisher)
[12:28:31] RECOVERY - Host amssq48 is UP: PING OK - Packet loss = 0%, RTA = 87.40 ms
[12:32:38] (PS1) KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-mk] - https://gerrit.wikimedia.org/r/195244 (https://phabricator.wikimedia.org/T89936)
[12:32:40] (PS1) BBlack: depool amssq52-53 [puppet] - https://gerrit.wikimedia.org/r/195243
[12:32:50] (CR) BBlack: [C: 2 V: 2] depool amssq52-53 [puppet] - https://gerrit.wikimedia.org/r/195243 (owner: BBlack)
[12:34:31] PROBLEM - Host amssq48 is DOWN: PING CRITICAL - Packet loss = 100%
[12:38:40] (PS5) Glaisher: Delete vewikimedia [mediawiki-config] - https://gerrit.wikimedia.org/r/171219 (https://phabricator.wikimedia.org/T57737)
[12:40:35] (CR) Glaisher: "Ready to go. Has been done on the ops side now" [mediawiki-config] - https://gerrit.wikimedia.org/r/171219 (https://phabricator.wikimedia.org/T57737) (owner: Glaisher)
[12:47:07] YuviPanda: So grrrit-wm was down for like an hour and of course Ed and I promptly went ahead and wrote the same commit independently without realizing what was happening because we rely on the bot to notify us xD
[12:47:18] RoanKattouw: :D
[12:47:21] http://graphite.wmflabs.org/render/?width=586&height=308&_salt=1425905223.08&target=tools.tools-redis.redis.6379.memory.internal_view.value&target=tools.tools-redis.redis.6379.memory.external_view.value&from=00%3A00_20150109&until=23%3A59_20150309
[12:47:26] the redis instance has been filling up
[12:47:30] it should do better now, I think.
[12:47:37] with allkeys-lru than volatile-lru
[12:47:40] I shall write a check
[12:48:04] RoanKattouw: I should also make grrrit-wm handle redis connection failures better.
[12:48:52] bblack: what's that with amssq48?
[12:53:33] what's which thing with amssq48?
[12:53:54] it's currently being reinstalled, if that's what you mean
[12:53:54] <_joe_> PROBLEM - Host amssq48 is DOWN: PING CRITICAL - Packet loss = 100%
[12:53:59] <_joe_> oh ok
[12:55:40] (PS1) BBlack: repool amssq50-51 [puppet] - https://gerrit.wikimedia.org/r/195248
[12:55:47] ok
[12:55:56] (PS2) BBlack: repool amssq50-51 [puppet] - https://gerrit.wikimedia.org/r/195248
[12:56:02] (CR) BBlack: [C: 2 V: 2] repool amssq50-51 [puppet] - https://gerrit.wikimedia.org/r/195248 (owner: BBlack)
[13:02:51] PROBLEM - dhclient process on amssq52 is CRITICAL: Connection refused by host
[13:02:51] PROBLEM - Varnish traffic logger on amssq49 is CRITICAL: Connection refused by host
[13:02:51] PROBLEM - configured eth on amssq48 is CRITICAL: Connection refused by host
[13:02:51] PROBLEM - DPKG on amssq52 is CRITICAL: Connection refused by host
[13:03:01] PROBLEM - DPKG on amssq48 is CRITICAL: Connection refused by host
[13:03:01] PROBLEM - dhclient process on amssq48 is CRITICAL: Connection refused by host
[13:03:01] PROBLEM - Disk space on amssq52 is CRITICAL: Connection refused by host
[13:03:01] PROBLEM - puppet last run on amssq52 is CRITICAL: Connection refused by host
[13:03:20] PROBLEM - puppet last run on amssq48 is CRITICAL: Connection refused by host
[13:03:21] PROBLEM - Disk space on amssq48 is CRITICAL: Connection refused by host
[13:03:21] PROBLEM - salt-minion processes on amssq52 is CRITICAL: Connection refused by host
[13:03:21] PROBLEM - Varnishkafka log producer on amssq49 is CRITICAL: Connection refused by host
[13:03:21] PROBLEM - HTTPS on amssq52 is CRITICAL: Return code of 255 is out of bounds
[13:03:25] ^ those are all normal, icinga found out about the new hosts before
they could finish configuring themselves
[13:03:31] PROBLEM - configured eth on amssq49 is CRITICAL: Connection refused by host
[13:03:31] PROBLEM - salt-minion processes on amssq48 is CRITICAL: Connection refused by host
[13:03:31] PROBLEM - HTTPS on amssq48 is CRITICAL: Return code of 255 is out of bounds
[13:03:41] PROBLEM - dhclient process on amssq49 is CRITICAL: Connection refused by host
[13:03:41] PROBLEM - DPKG on amssq49 is CRITICAL: Connection refused by host
[13:03:50] PROBLEM - RAID on amssq48 is CRITICAL: Connection refused by host
[13:03:51] PROBLEM - puppet last run on amssq49 is CRITICAL: Connection refused by host
[13:03:51] PROBLEM - Disk space on amssq49 is CRITICAL: Connection refused by host
[13:03:51] RECOVERY - dhclient process on amssq52 is OK: PROCS OK: 0 processes with command name dhclient
[13:03:52] RECOVERY - configured eth on amssq48 is OK: NRPE: Unable to read output
[13:04:00] RECOVERY - DPKG on amssq52 is OK: All packages OK
[13:04:01] PROBLEM - HTTPS on amssq49 is CRITICAL: Return code of 255 is out of bounds
[13:04:10] RECOVERY - dhclient process on amssq48 is OK: PROCS OK: 0 processes with command name dhclient
[13:04:11] RECOVERY - DPKG on amssq48 is OK: All packages OK
[13:04:11] RECOVERY - Disk space on amssq52 is OK: DISK OK
[13:04:11] PROBLEM - Varnish HTTP text-backend on amssq52 is CRITICAL: Connection refused
[13:04:21] RECOVERY - Disk space on amssq48 is OK: DISK OK
[13:04:21] RECOVERY - salt-minion processes on amssq52 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:04:21] PROBLEM - Varnish HTTP text-frontend on amssq52 is CRITICAL: Connection refused
[13:04:21] PROBLEM - Varnish HTTP text-backend on amssq48 is CRITICAL: Connection refused
[13:04:40] RECOVERY - configured eth on amssq49 is OK: NRPE: Unable to read output
[13:04:40] RECOVERY - salt-minion processes on amssq48 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[13:04:40] PROBLEM - Varnish
HTTP text-frontend on amssq48 is CRITICAL: Connection refused
[13:04:40] PROBLEM - Varnish traffic logger on amssq52 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa
[13:04:51] RECOVERY - dhclient process on amssq49 is OK: PROCS OK: 0 processes with command name dhclient
[13:04:51] PROBLEM - Varnish traffic logger on amssq48 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishncsa
[13:04:51] RECOVERY - RAID on amssq48 is OK: OK: Active: 2, Working: 2, Failed: 0, Spare: 0
[13:04:51] RECOVERY - Disk space on amssq49 is OK: DISK OK
[13:05:00] PROBLEM - Varnish HTTP text-backend on amssq49 is CRITICAL: Connection refused
[13:05:10] PROBLEM - Varnish HTTP text-frontend on amssq49 is CRITICAL: Connection refused
[13:05:10] PROBLEM - Varnishkafka log producer on amssq48 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[13:05:11] RECOVERY - puppet last run on amssq52 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[13:05:20] RECOVERY - Varnish HTTP text-backend on amssq52 is OK: HTTP OK: HTTP/1.1 200 OK - 190 bytes in 0.181 second response time
[13:05:30] RECOVERY - Varnishkafka log producer on amssq49 is OK: PROCS OK: 1 process with command name varnishkafka
[13:05:31] RECOVERY - puppet last run on amssq48 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures
[13:05:31] RECOVERY - Varnish HTTP text-frontend on amssq52 is OK: HTTP OK: HTTP/1.1 200 OK - 285 bytes in 0.189 second response time
[13:05:31] RECOVERY - Varnish HTTP text-backend on amssq48 is OK: HTTP OK: HTTP/1.1 200 OK - 190 bytes in 0.182 second response time
[13:05:41] RECOVERY - HTTPS on amssq52 is OK: SSLXNN OK - 36 OK
[13:05:41] RECOVERY - Varnish HTTP text-frontend on amssq48 is OK: HTTP OK: HTTP/1.1 200 OK - 284 bytes in 0.177 second response time
[13:05:41] RECOVERY - Varnish traffic logger on amssq52 is OK: PROCS OK: 2 processes with command name varnishncsa
[13:05:51] RECOVERY - DPKG on
amssq49 is OK: All packages OK
[13:05:51] RECOVERY - Varnish traffic logger on amssq48 is OK: PROCS OK: 2 processes with command name varnishncsa
[13:06:00] RECOVERY - HTTPS on amssq48 is OK: SSLXNN OK - 36 OK
[13:06:01] RECOVERY - puppet last run on amssq49 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[13:06:01] RECOVERY - Varnish HTTP text-backend on amssq49 is OK: HTTP OK: HTTP/1.1 200 OK - 189 bytes in 0.180 second response time
[13:06:10] RECOVERY - Varnish traffic logger on amssq49 is OK: PROCS OK: 2 processes with command name varnishncsa
[13:06:11] RECOVERY - Varnish HTTP text-frontend on amssq49 is OK: HTTP OK: HTTP/1.1 200 OK - 286 bytes in 0.178 second response time
[13:06:11] RECOVERY - Varnishkafka log producer on amssq48 is OK: PROCS OK: 1 process with command name varnishkafka
[13:06:30] RECOVERY - HTTPS on amssq49 is OK: SSLXNN OK - 36 OK
[13:07:36] now I'm regretting https://gerrit.wikimedia.org/r/#/c/163752/
[13:07:40] :P
[13:08:08] (PS1) BBlack: repool amssq48,49,52,53 [puppet] - https://gerrit.wikimedia.org/r/195250
[13:08:21] (CR) BBlack: [C: 2 V: 2] repool amssq48,49,52,53 [puppet] - https://gerrit.wikimedia.org/r/195250 (owner: BBlack)
[13:15:11] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00666666666667
[13:19:11] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[13:19:21] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[13:20:31] bblack: ^
[13:20:56] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0
[13:21:56] oh
[13:22:07] yeah, puppet-merge helps
[13:22:10] :)
[13:22:36] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge.
[13:23:06] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[13:23:43] (PS1) BBlack: remove old pub dns for amssq48-53,cp30[12]2 [dns] - https://gerrit.wikimedia.org/r/195251
[13:24:08] (CR) BBlack: [C: 2] remove old pub dns for amssq48-53,cp30[12]2 [dns] - https://gerrit.wikimedia.org/r/195251 (owner: BBlack)
[13:28:41] (PS1) BBlack: depool cp40[01]1 for reinstall [puppet] - https://gerrit.wikimedia.org/r/195253
[13:28:54] (CR) BBlack: [C: 2 V: 2] depool cp40[01]1 for reinstall [puppet] - https://gerrit.wikimedia.org/r/195253 (owner: BBlack)
[13:35:26] PROBLEM - Host cp4001 is DOWN: PING CRITICAL - Packet loss = 100%
[13:36:26] PROBLEM - Host cp4011 is DOWN: PING CRITICAL - Packet loss = 100%
[13:41:25] RECOVERY - Host cp4001 is UP: PING OK - Packet loss = 0%, RTA = 78.42 ms
[13:42:16] RECOVERY - Host cp4011 is UP: PING OK - Packet loss = 0%, RTA = 79.12 ms
[13:51:53] hashar: hey! around?
[13:52:06] YuviPanda: busy packaging :D
[13:52:45] hashar: alright. I’m going to go ahead and spend some more time testing my parsoid patch and then I’m going to co-ordinate with the parsoid folks to see when it will be least disruptive to merge, considering that the worst case is no jenkins-update.
[13:52:50] (PS4) Yuvipanda: parsoid: Remove parsoid beta role [puppet] - https://gerrit.wikimedia.org/r/193082 (https://phabricator.wikimedia.org/T86633)
[13:54:14] akosiaris: your zotero patches have been in beta for more than two days...
[13:54:46] YuviPanda: I was expecting you :P
[13:55:01] no worries, I 'll be merging today
[13:55:22] akosiaris: :D mind if I uncherry-pick?
[13:56:01] hmm, I am actually testing a couple of last minute logging changes, I 'd rather not
[13:56:07] but I can promise I will do it
[13:56:12] as in today
[13:56:13] akosiaris: alright
[13:58:26] PROBLEM - Host cp4011 is DOWN: PING CRITICAL - Packet loss = 100%
[14:10:16] (PS1) Faidon Liambotis: Add asw-esams mgmt IP [dns] - https://gerrit.wikimedia.org/r/195255
[14:11:08] (CR) Faidon Liambotis: [C: 2] Add asw-esams mgmt IP [dns] - https://gerrit.wikimedia.org/r/195255 (owner: Faidon Liambotis)
[14:13:46] (PS10) Alexandros Kosiaris: Puppet module for the zotero service [puppet] - https://gerrit.wikimedia.org/r/194495 (https://phabricator.wikimedia.org/T89867)
[14:15:21] (PS1) BBlack: repool cp40[01]1 [puppet] - https://gerrit.wikimedia.org/r/195256
[14:15:31] (CR) BBlack: [C: 2 V: 2] repool cp40[01]1 [puppet] - https://gerrit.wikimedia.org/r/195256 (owner: BBlack)
[14:16:31] (CR) Alexandros Kosiaris: [C: 2] Puppet module for the zotero service [puppet] - https://gerrit.wikimedia.org/r/194495 (https://phabricator.wikimedia.org/T89867) (owner: Alexandros Kosiaris)
[14:18:09] (CR) Alexandros Kosiaris: [C: 2] Beta: Assign proxy for zotero [puppet] - https://gerrit.wikimedia.org/r/194552 (owner: Alexandros Kosiaris)
[14:50:42] earldouglas: Ping for SWAT in about 9 minutes.
[14:51:26] aaah, swat is an hour early
[14:51:30] ^d, thcipriani: Who wants SWAT this morning?
[14:51:38] aude: Yeah, US DST started this weekend
[14:51:45] :/
[14:52:44] * anomie wonders if marktraceur is intentionally not listed as a SWATter this morning, or if he somehow accidentally got left off
[14:53:54] Interesting question.
[14:54:03] I'll take it as a sign.
[14:54:30] the revenge of Special:Upload
[14:55:27] ^d: never done deploy before, tagged along on the big Wednesday one. Would love to do this with some supervision in a screen session, if possible.
[14:58:52] <^d> thcipriani: I dunno how to share a screen unless you're root but you're more than welcome to take the swat this AM [14:58:57] anomie: pong [14:58:59] <^d> I'll be around if you need a hand [15:02:44] (03PS1) 10QChris: Run analytics/refinery/source guards on stat1002 [puppet] - 10https://gerrit.wikimedia.org/r/195262 [15:02:58] Hmm, jouncebot is missing. thcipriani, are you going to do the SWAT? [15:03:33] sure, I can do it. ^d: Do you have a few for a google hangout, maybe? [15:05:37] <^d> I can, sure [15:06:14] (03CR) 10QChris: Run analytics/refinery/source guards on stat1002 (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/195262 (owner: 10QChris) [15:06:27] jouncebot: next [15:06:27] In 0 hour(s) and 53 minute(s): Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150309T1600) [15:06:35] anomie: ^ ’tis baaaack [15:07:11] YuviPanda: And it's got the wrong timezone :/ [15:07:17] yup [15:07:48] Looks like GIGO [15:08:01] heh [15:08:27] jouncebot: reload [15:08:48] jouncebot: refresh [15:08:48] I refreshed my knowledge about deployments. 
[15:08:54] jouncebot: next [15:08:54] In 4 hour(s) and 51 minute(s): Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150309T2000) [15:08:59] jouncebot: current [15:09:30] (03PS1) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-hbs-mkd] - 10https://gerrit.wikimedia.org/r/195264 (https://phabricator.wikimedia.org/T89936) [15:10:07] (03PS1) 10Yuvipanda: tools: Alert Tim as well when things go wrong [puppet] - 10https://gerrit.wikimedia.org/r/195265 (https://phabricator.wikimedia.org/T91978) [15:10:22] (03PS2) 10Yuvipanda: tools: Alert Tim as well when things go wrong [puppet] - 10https://gerrit.wikimedia.org/r/195265 (https://phabricator.wikimedia.org/T91978) [15:10:40] (03CR) 10Yuvipanda: [C: 032 V: 032] tools: Alert Tim as well when things go wrong [puppet] - 10https://gerrit.wikimedia.org/r/195265 (https://phabricator.wikimedia.org/T91978) (owner: 10Yuvipanda) [15:21:31] Is SWAT happening? Very quiet. [15:23:12] marktraceur: I think you’re like, an hour late [15:23:15] maybe [15:23:31] time change? :) [15:24:16] 20 minutes. But whatever. [15:29:01] !log thcipriani Synchronized php-1.25wmf20/extensions/CirrusSearch/: morning swat (duration: 00m 08s) [15:29:06] marktraceur: Well, I see the patch was merged. I think thcipriani and ^d are doing a hangout as SWAT training. [15:29:08] <^d> marktraceur: I'm walking thcipriani through it [15:29:11] Logged the message, Master [15:29:26] Ah. [15:30:23] (03CR) 10Cmjohnson: [C: 032] added mw2136-mw2148 [puppet] - 10https://gerrit.wikimedia.org/r/194921 (owner: 10Papaul) [15:30:50] papaul: merged [15:31:10] thanks [15:31:23] thcipriani: one of us, one of us! 
[15:32:18] PROBLEM - puppet last run on cp1048 is CRITICAL: CRITICAL: Puppet has 1 failures [15:32:40] (03Abandoned) 10Alexandros Kosiaris: zotero: require 'firefox'; get rid of shell wrapper [puppet] - 10https://gerrit.wikimedia.org/r/192016 (owner: 10Ori.livneh) [15:33:08] <^d> mw1119 has a full tc cache [15:33:11] <^d> needs hhvm kick [15:33:21] (03PS3) 10Alexandros Kosiaris: Include the zotero role in the sca role [puppet] - 10https://gerrit.wikimedia.org/r/195041 (https://phabricator.wikimedia.org/T89869) [15:34:16] earldouglas: gerrit 194726 should be deployed. Could you please verify? [15:34:45] <^d> ori: Can you? ^^ mw1119? [15:35:00] ^d: I'm on it [15:35:06] <^d> thx [15:35:10] chasemp: just restarted it [15:35:22] !log restarted HHVM on mw1119; ^d reports TC cache full [15:35:27] ori you steal my thunder dude [15:35:27] Logged the message, Master [15:35:28] RECOVERY - HHVM rendering on mw1119 is OK: HTTP OK: HTTP/1.1 200 OK - 71512 bytes in 3.151 second response time [15:35:28] RECOVERY - Apache HTTP on mw1119 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 440 bytes in 0.207 second response time [15:35:36] nice! [15:35:49] <_joe_> ori: thanks! [15:35:57] <_joe_> ori: isn't it very early there? [15:36:03] 8:30 [15:36:11] <_joe_> oh, yes, DST changes [15:36:43] ask ^d, he's in the same timezone and already walking someone through a deployment [15:36:50] he's one of those morning people [15:37:10] <^d> 8:30 is not early, lol [15:37:11] thcipriani: on enwiki? [15:37:15] <^d> sf people just wake up crazy late [15:37:18] ^d hasn't gone to sleep yet [15:37:20] my theory [15:38:14] earldouglas: no, group0 wikis definitely. [15:38:38] Url? Sorry, new guy. [15:39:46] Oh, found it on http://www.mediawiki.org/wiki/MediaWiki_1.25/Roadmap#Schedule_for_the_deployments [15:39:54] <^d> earldouglas: Test on mw.org? [15:40:03] <^d> group0 includes test, test2, mw.org, zerowiki [15:40:12] Thanks, stand by for test... 
[15:40:57] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [15:41:20] (03PS4) 10Alexandros Kosiaris: Include the zotero role in the sca role [puppet] - 10https://gerrit.wikimedia.org/r/195041 (https://phabricator.wikimedia.org/T89869) [15:41:28] OMG THE CLONING SUCCEEDED, WE HAVE A greg-g2 [15:41:34] thcipriani: looks good from here. [15:42:34] YuviPanda: sadly yes, my server is unresponsive (and I don't have mgmt access, but my buddy who does has been notified, we share the box) [15:43:14] earldouglas: in that case swat complete. Huzzah! [15:43:20] \o/ [15:43:38] thcipriani: how horrified are you? [15:43:45] at the multiwiki / deployment stuff [15:44:59] honestly, I was definitely surprised when I first saw it, but I know there are already improvements in the works. [15:45:35] very political, i like [15:45:36] pushing for a more unified and automated deployment I think is a Good Thing™ [15:47:40] wait what.. is swat happening an hour earlier? [15:47:56] oh.. DST silliness [15:48:16] is SWAT over or can I get one more deployed? [15:48:43] ori: He had to review the scap codebase as an interview task remember. We were screening for "run screaming away from the darkness". [15:49:46] (03CR) 10Alexandros Kosiaris: [C: 032] Include the zotero role in the sca role [puppet] - 10https://gerrit.wikimedia.org/r/195041 (https://phabricator.wikimedia.org/T89869) (owner: 10Alexandros Kosiaris) [15:50:07] RECOVERY - puppet last run on cp1048 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [15:50:22] ^d: ^ [15:50:53] <^d> Already over [15:51:04] aww.. I hate DST [15:51:09] ditto [15:51:09] <^d> So do I :) [15:51:49] I’m ok with it [15:51:55] ops meeting finishes at 12:30 instead of 1:30 :P [15:51:57] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. 
[15:52:31] DST almost makes it worth moving to Arizona [15:53:18] so... what time is the train deploy this week? :) [15:53:28] <^d> Whenever twentyafterfour does it [15:53:53] the lua thing actually calculates this [15:54:00] on the deployments page [15:54:26] hrmmm, I should double check that [15:54:29] things are pinned to SF time [15:55:01] looks sane :) [15:55:09] says utc+1 [15:55:11] for me [15:55:18] PROBLEM - puppet last run on sca1002 is CRITICAL: CRITICAL: Puppet has 2 failures [15:55:28] PROBLEM - puppet last run on sca1001 is CRITICAL: CRITICAL: Puppet has 2 failures [15:55:48] RECOVERY - Host restbase1006 is UP: PING WARNING - Packet loss = 50%, RTA = 1.50 ms [15:56:05] (03PS1) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-hbs-slv] - 10https://gerrit.wikimedia.org/r/195275 (https://phabricator.wikimedia.org/T89936) [15:56:16] greg-g2: wat. AZ is now the same as CA. I know because I'm losing a very productive hour. :-) [15:56:17] audo 11am Pacific for me, so yeah [15:56:31] audo? 
heh, aude^ [15:57:08] chrismcmahon: right, just the fact of not doing DST almost makes moving there a good idea :) [15:57:17] oic [15:57:46] greg-g2: go for it :) [15:57:54] almost [15:58:17] PROBLEM - Cassandra database on restbase1006 is CRITICAL: Timeout while attempting connection [15:58:17] PROBLEM - Disk space on restbase1006 is CRITICAL: Timeout while attempting connection [15:58:38] PROBLEM - SSH on restbase1006 is CRITICAL: Connection timed out [15:58:38] PROBLEM - dhclient process on restbase1006 is CRITICAL: Timeout while attempting connection [15:58:48] PROBLEM - DPKG on restbase1006 is CRITICAL: Timeout while attempting connection [15:58:58] PROBLEM - puppet last run on restbase1006 is CRITICAL: Timeout while attempting connection [15:59:17] PROBLEM - configured eth on restbase1006 is CRITICAL: Timeout while attempting connection [15:59:17] PROBLEM - salt-minion processes on restbase1006 is CRITICAL: Timeout while attempting connection [16:02:40] PROBLEM - Host restbase1006 is DOWN: PING CRITICAL - Packet loss = 100% [16:03:02] PROBLEM - zotero on sca1002 is CRITICAL: Connection refused [16:04:01] PROBLEM - zotero on sca1001 is CRITICAL: Connection refused [16:04:35] akosiaris: \o/ 0 cherry-picks on beta again! Thanks :) [16:05:04] YuviPanda: told ya I would do it :-) [16:05:36] apergos: :D [16:05:37] err [16:05:38] akosiaris: :D [16:07:42] (03PS1) 10KartikMistry: Added initial Debian packaging [debs/contenttranslation/apertium-mk-bg] - 10https://gerrit.wikimedia.org/r/195284 (https://phabricator.wikimedia.org/T89936) [16:09:31] RECOVERY - puppet last run on sca1002 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [16:10:11] RECOVERY - puppet last run on sca1001 is OK: OK: Puppet is currently enabled, last run 54 seconds ago with 0 failures [16:11:05] gwicke: mobrovac: any reason puppet is disabled for 10 days on cerium|praseodymium|xenon ? 
puppet agent says: Reason: 'reason not specified' [16:11:29] (03PS2) 10Hashar: contint: disable hhvm stacktraces / map [puppet] - 10https://gerrit.wikimedia.org/r/195035 (https://phabricator.wikimedia.org/T64788) [16:12:30] akosiaris: as these are test boxes, we use them to test various cassandra configs and to fill the prod db, so i guess gwicke simply disabled it [16:12:36] even though 10 days seems a bit long [16:12:38] akosiaris: few packages in review when you're free. [16:13:16] springle: Can you look at, https://gerrit.wikimedia.org/r/#/c/194046 and estimate when DB in production can/will be updated? [16:13:22] springle: Thanks :) [16:14:01] RECOVERY - zotero on sca1002 is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.015 second response time [16:14:34] (03PS1) 10Hashar: Make zuul uses master branch on both prod and labs [puppet] - 10https://gerrit.wikimedia.org/r/195287 [16:15:01] RECOVERY - zotero on sca1001 is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.014 second response time [16:16:03] (03PS2) 10Hashar: Make zuul uses master branch on both prod and labs [puppet] - 10https://gerrit.wikimedia.org/r/195287 (https://phabricator.wikimedia.org/T91984) [16:16:25] <_joe_> akosiaris: wow [16:16:48] mobrovac: yes 10 days is a bit long, which is why I am asking. I'd like to enable puppet on them today. I'll wait for gwicke to get online and ask him so I can enable it. [16:20:04] k, he should be in soon [16:21:39] (03PS1) 10Faidon Liambotis: Add asw-esams to rancid, smokeping, torrus [puppet] - 10https://gerrit.wikimedia.org/r/195289 [16:22:07] akosiaris: we'll need to tweak the config for those test boxes first before we can re-enable puppet there [16:23:41] <_joe_> 10 days is definitely too long [16:24:08] (03CR) 10Faidon Liambotis: [C: 032] Add asw-esams to rancid, smokeping, torrus [puppet] - 10https://gerrit.wikimedia.org/r/195289 (owner: 10Faidon Liambotis) [16:26:23] gwicke: tweak ? [16:26:35] as in for those boxes specifically ?
[16:26:37] akosiaris: in a meeting now, bbiab [16:26:41] ok [16:26:44] heap needs to be smaller [16:27:20] (03PS3) 10Alexandros Kosiaris: LVS configuration for zotero service [puppet] - 10https://gerrit.wikimedia.org/r/191938 (https://phabricator.wikimedia.org/T89867) (owner: 10Dzahn) [16:28:01] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [16:32:20] (03CR) 10Alexandros Kosiaris: [C: 032] LVS configuration for zotero service [puppet] - 10https://gerrit.wikimedia.org/r/191938 (https://phabricator.wikimedia.org/T89867) (owner: 10Dzahn) [16:35:17] (03CR) 10Jforrester: "> The VE is enabled for beta-test on the Main Namespace.I think it can be supported to the Draft Namespace." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/193827 (https://phabricator.wikimedia.org/T91223) (owner: 10Gerrit Patch Uploader) [16:40:34] (03PS1) 10Ottomata: Set up proxy on misc web to serve Hadoop Yarn ResourceManager UI at yarn.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/195298 [16:40:52] (03PS2) 10Ottomata: Set up proxy on misc web to serve Hadoop Yarn ResourceManager UI at yarn.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/195298 (https://phabricator.wikimedia.org/T83601) [16:41:29] did pybal just got restarted? [16:41:51] akosiaris: hm, that zotero change I guess? akosiaris, was that you? [16:43:32] paravoid: yup [16:43:39] please !log that [16:43:50] it seems you fixed something else with that [16:43:59] Or caused it? 
[16:44:00] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 13.33% of data above the critical threshold [500.0] [16:44:01] we also got a huge 503 spike [16:44:05] that^ :) [16:44:06] We appear to have been serving 503s for a minute there [16:44:09] probably caused it [16:44:16] it fixed the issue that mw1062 didnt get any traffic [16:44:25] even though it was enabled before [16:44:27] (03CR) 10Ottomata: [C: 032] Set up proxy on misc web to serve Hadoop Yarn ResourceManager UI at yarn.wikimedia.org [puppet] - 10https://gerrit.wikimedia.org/r/195298 (https://phabricator.wikimedia.org/T83601) (owner: 10Ottomata) [16:44:27] !log restarting pybal on lvs1003, lvs1006 for LVS zotero change [16:44:28] Ahm [16:44:31] Logged the message, Master [16:44:37] Isn't pybal supposed to be protected against this? [16:44:46] Or did akosiaris not follow the documented procedure for restarting pybal? [16:45:17] I doubt that [16:45:23] it is like service pybal restart [16:45:43] How long did you wait between restarting it on lvs1003 and lvs1006? [16:45:52] about a 30 secs [16:45:55] argh [16:45:56] don't do that [16:46:08] mark ? the 30 secs ? [16:46:14] that's rather short isn't it? 
[16:46:30] it made this work: http://ganglia.wikimedia.org/latest/graph.php?r=hour&z=xlarge&h=mw1062.eqiad.wmnet&m=cpu_report&s=descending&mc=2&g=network_report&c=Application+servers+eqiad [16:46:36] akosiaris: There's like 1.5 page of documentation urging people to be careful while restarting pybal [16:46:37] akosiaris: https://wikitech.wikimedia.org/wiki/LVS#Deploy_a_change_to_an_existing_service [16:47:55] RoanKattouw: I do all of those anyway without having to look into that page [16:48:21] PROBLEM - puppet last run on cp1044 is CRITICAL: CRITICAL: Puppet has 1 failures [16:48:31] PROBLEM - puppet last run on cp1043 is CRITICAL: CRITICAL: Puppet has 1 failures [16:48:39] (03PS1) 10Ottomata: Add analytics1001 as misc varnish backend [puppet] - 10https://gerrit.wikimedia.org/r/195299 (https://phabricator.wikimedia.org/T83601) [16:48:43] evidently not akosiaris :P [16:49:03] ^^ my fault, fixing [16:49:25] no I actually did. Apart from not waiting perhaps enough [16:49:58] (03CR) 10Ottomata: [C: 032] Add analytics1001 as misc varnish backend [puppet] - 10https://gerrit.wikimedia.org/r/195299 (https://phabricator.wikimedia.org/T83601) (owner: 10Ottomata) [16:50:20] I do know for example I have a problem with the crappy zotero thing... [16:50:31] cause pybal can't monitor nicely right now [16:51:17] and of course zotero is to blame, not pybal :-( [16:51:41] RECOVERY - puppet last run on cp1044 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [16:51:50] PROBLEM - puppet last run on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:51] PROBLEM - configured eth on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:51:51] RECOVERY - puppet last run on cp1043 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [16:52:00] PROBLEM - dhclient process on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. 
[16:52:00] PROBLEM - RAID on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:52:10] PROBLEM - salt-minion processes on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:52:20] PROBLEM - DPKG on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:52:30] PROBLEM - Disk space on rhenium is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:52:51] RECOVERY - configured eth on rhenium is OK: NRPE: Unable to read output [16:52:51] RECOVERY - puppet last run on rhenium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [16:52:52] RECOVERY - dhclient process on rhenium is OK: PROCS OK: 0 processes with command name dhclient [16:53:00] RECOVERY - RAID on rhenium is OK: OK: Active: 6, Working: 6, Failed: 0, Spare: 0 [16:53:01] RECOVERY - salt-minion processes on rhenium is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [16:53:12] RECOVERY - DPKG on rhenium is OK: All packages OK [16:53:21] RECOVERY - Disk space on rhenium is OK: DISK OK [16:54:23] (03CR) 10Alex Monk: "What performance loss? Are you going to schedule this for swat?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/112590 (https://phabricator.wikimedia.org/T60583) (owner: 10Jforrester) [16:57:10] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [17:00:31] RECOVERY - Host cp1047 is UP: PING OK - Packet loss = 0%, RTA = 0.92 ms [17:01:17] (03CR) 10Anomie: mediawiki: add configs to support the Dallas DC (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [17:09:58] (03PS3) 10Krinkle: Make zuul use master branch on both prod and labs [puppet] - 10https://gerrit.wikimedia.org/r/195287 (https://phabricator.wikimedia.org/T91984) (owner: 10Hashar) [17:10:06] greg-g: I wonder if we should add a "SF" timezone to the input on the Deployments page, to avoid https://wikitech.wikimedia.org/w/index.php?title=Deployments&diff=147500&oldid=147472 [17:10:08] (03PS1) 10Alexandros Kosiaris: Kill the ProxyFetch monitor for zotero LVS [puppet] - 10https://gerrit.wikimedia.org/r/195301 [17:13:04] (03PS1) 10Dzahn: delete etherpad SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195303 (https://phabricator.wikimedia.org/T85788) [17:13:26] anomie: ahh, that's why the page looked right this morning when I looked :) [17:13:44] anomie: thanks for fixing, and yeah, should be easy enough to add to the lua template [17:15:27] anomie: hmmm, maybe "easy" /me is looking [17:16:02] (03PS1) 10Dzahn: delete metrics.wikimedia.org SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195304 (https://phabricator.wikimedia.org/T73156) [17:16:15] i am getting 504 gateway timeout to phab [17:16:56] and now this from the app: [17:16:57] >>> UNRECOVERABLE FATAL ERROR <<< [17:16:57] Maximum execution time of 30 seconds exceeded [17:16:57] /core/lib/libphutil/src/parser/PhutilTypeSpec.php:1909 [17:16:57] jgage: SyntaxError: Unexpected identifier [17:17:13] ok, random bot [17:18:13] >>> console.log(‘hi jgage’) [17:18:13] 
YuviPanda: SyntaxError: Unexpected token ILLEGAL [17:18:17] heh [17:18:23] > console.log(‘hi jgage’) [17:18:28] heh [17:20:19] >>> console.log( 'hi YuviPanda' ); [17:20:20] RoanKattouw: undefined; Console: 'hi YuviPanda' [17:20:21] (03PS1) 10Ottomata: Add yarn.wikimedia.org to DNS [dns] - 10https://gerrit.wikimedia.org/r/195305 (https://phabricator.wikimedia.org/T83601) [17:20:26] Semicolons, it's a thing [17:20:45] RoanKattouw: I’m somewhat happy that I haven’t written JS long enough to forget that :D [17:20:53] >>> console.log ('>>> console.log [17:20:53] mutante: SyntaxError: Unexpected token ILLEGAL [17:21:13] (03CR) 10Ottomata: [C: 032] Add yarn.wikimedia.org to DNS [dns] - 10https://gerrit.wikimedia.org/r/195305 (https://phabricator.wikimedia.org/T83601) (owner: 10Ottomata) [17:21:54] (03PS1) 10Dzahn: delete *.planet.wikimedia.org SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195306 (https://phabricator.wikimedia.org/T85789) [17:22:35] PROBLEM - Slow CirrusSearch query rate on fluorine is CRITICAL: CirrusSearch-slow.log_line_rate CRITICAL: 0.00333333333333 [17:24:20] (03CR) 10RobH: [C: 031] "death to one-off certs \o/" [puppet] - 10https://gerrit.wikimedia.org/r/195306 (https://phabricator.wikimedia.org/T85789) (owner: 10Dzahn) [17:29:21] (03PS1) 10Dzahn: delete bugzilla SSL certs [puppet] - 10https://gerrit.wikimedia.org/r/195307 (https://phabricator.wikimedia.org/T85785) [17:30:45] (03CR) 10Alexandros Kosiaris: [C: 032] Kill the ProxyFetch monitor for zotero LVS [puppet] - 10https://gerrit.wikimedia.org/r/195301 (owner: 10Alexandros Kosiaris) [17:31:32] (03PS1) 10Dzahn: delete blog SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195308 (https://phabricator.wikimedia.org/T73156) [17:32:45] RECOVERY - Slow CirrusSearch query rate on fluorine is OK: CirrusSearch-slow.log_line_rate OKAY: 0.0 [17:33:07] (03PS1) 10Dzahn: delete stats.wikimedia.org SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195309 (https://bugzilla.wikimedia.org/73156) 
[17:33:35] greg-g: There you go. [17:34:35] looks like i failed to entice anyone else to troubleshoot phab, i'll take a look ;) [17:35:26] hmmm it's working for robh [17:35:46] but i'm still getting 502 and 504 and that internal error? [17:35:55] guess i'll restart my browser [17:36:13] (03PS1) 10Dzahn: delete svn.wikimedia.org SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195310 (https://phabricator.wikimedia.org/T88731) [17:37:25] looks like my session was hosed somehow [17:37:31] now it's loading for me and i'm logged out [17:37:38] (03PS2) 10Dzahn: delete blog SSL certs [puppet] - 10https://gerrit.wikimedia.org/r/195308 (https://phabricator.wikimedia.org/T73156) [17:39:08] !log restarting pybal on lvs1006 to pick up https://gerrit.wikimedia.org/r/195301 [17:39:10] paravoid: ^ [17:39:11] Logged the message, Master [17:39:16] (03PS5) 10Jforrester: Disable 'beta' label in tab for the VE opt-in wiki (enwiki) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/112590 (https://phabricator.wikimedia.org/T60583) [17:39:21] (03CR) 10RobH: [C: 04-2] "do not delete blog.wikimedia.org" [puppet] - 10https://gerrit.wikimedia.org/r/195308 (https://phabricator.wikimedia.org/T73156) (owner: 10Dzahn) [17:39:32] mutante: don't delete blog.w.o [17:39:37] we still use it on automattic [17:39:48] and our git repos are the safest place for us to store our copies [17:40:10] robh: ! ok [17:40:16] techblog.w.o is deprecated though so it can totally go [17:40:19] in fact it's expired =] [17:40:23] (and revoked) [17:40:30] right, ok [17:40:37] i should have pulled it when i did that, my bad. [17:40:41] should it still have one bug per cert? [17:40:41] thx for cleaning it up ] [17:41:04] what do you mean?
[17:41:36] ahh, one task per revocation request, yes [17:41:47] also, if its rapidssl, anyone with dns-admin@w.o email can revoke [17:42:04] i can update a task with the instructions on next request [17:42:25] ok, i'll make a ticket for each, also so we don't forget to nuke the keys etc [17:42:36] cool [17:42:38] thx dude [17:42:42] yw [17:43:56] also, i looked now because of the "Replace SHA1 certificates with SHA256" [17:44:08] that makes that tasks smaller [17:55:23] (03PS1) 10BBlack: depool amssq31 for testing [puppet] - 10https://gerrit.wikimedia.org/r/195314 [17:55:36] (03CR) 10BBlack: [C: 032 V: 032] depool amssq31 for testing [puppet] - 10https://gerrit.wikimedia.org/r/195314 (owner: 10BBlack) [17:58:09] !log restarting pybal on lvs1003 to pick up https://gerrit.wikimedia.org/r/195301 [17:58:12] Logged the message, Master [18:00:52] (03PS1) 10Ori.livneh: Add stub LVSService object and two tests for IdleConnectionMonitoringProtocol [debs/pybal] - 10https://gerrit.wikimedia.org/r/195318 [18:04:28] PROBLEM - puppet last run on amssq54 is CRITICAL: CRITICAL: Puppet has 1 failures [18:08:03] (03CR) 10Ori.livneh: [C: 032] "Change only adds unit tests" [debs/pybal] - 10https://gerrit.wikimedia.org/r/195318 (owner: 10Ori.livneh) [18:08:23] (03Merged) 10jenkins-bot: Add stub LVSService object and two tests for IdleConnectionMonitoringProtocol [debs/pybal] - 10https://gerrit.wikimedia.org/r/195318 (owner: 10Ori.livneh) [18:25:05] RECOVERY - puppet last run on amssq54 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [18:25:32] (03PS2) 10Krinkle: delete stats.wikimedia.org SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195309 (https://phabricator.wikimedia.org/T73156) (owner: 10Dzahn) [18:27:00] (03PS1) 10Yuvipanda: mysql: Cleanup mysql::param [puppet] - 10https://gerrit.wikimedia.org/r/195321 [18:29:56] (03CR) 10Chmarkine: [C: 031] delete etherpad SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195303 
(https://phabricator.wikimedia.org/T85788) (owner: 10Dzahn) [18:33:26] (03CR) 10Chmarkine: [C: 031] delete stats.wikimedia.org SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195309 (https://phabricator.wikimedia.org/T73156) (owner: 10Dzahn) [18:35:15] (03CR) 10Chmarkine: [C: 031] delete bugzilla SSL certs [puppet] - 10https://gerrit.wikimedia.org/r/195307 (https://phabricator.wikimedia.org/T85785) (owner: 10Dzahn) [18:36:32] (03PS1) 10Thcipriani: Add version to mariadb package resource [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/195328 [18:37:28] (03CR) 10Chmarkine: [C: 031] delete metrics.wikimedia.org SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195304 (https://phabricator.wikimedia.org/T73156) (owner: 10Dzahn) [18:38:08] (03CR) 10Yuvipanda: "What is the conflict? why does it require force?" [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/195328 (owner: 10Thcipriani) [18:39:28] (03CR) 10Yuvipanda: [C: 04-1] Add /etc/mysql dir before linking inside it (031 comment) [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/194925 (owner: 10Thcipriani) [18:39:38] (03CR) 10Ori.livneh: mediawiki: add configs to support the Dallas DC (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [18:46:27] * aude wonder if gerrit (from europe) is broken again, in regards to uploading patches [18:48:02] proxying via the us worked [18:56:17] (03CR) 10Chmarkine: "Isn't this cert still in use, since the general cert *.wikimedia.org doesn't match *.planet.wikimedia.org?" [puppet] - 10https://gerrit.wikimedia.org/r/195306 (https://phabricator.wikimedia.org/T85789) (owner: 10Dzahn) [19:02:01] (03CR) 10Yuvipanda: [C: 04-1] Labs: Puppetize labstore1003 (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/194395 (owner: 10coren) [19:02:47] (03CR) 10Dzahn: [C: 04-2] "chmarkine, you are right. 
it should be removed from zirconium and also from here: modules/planet/manifests/webserver.pp but it is still in" [puppet] - 10https://gerrit.wikimedia.org/r/195306 (https://phabricator.wikimedia.org/T85789) (owner: 10Dzahn) [19:04:34] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: puppet fail [19:16:40] (03PS1) 10BBlack: depool cp3003 -> reinstall [puppet] - 10https://gerrit.wikimedia.org/r/195334 [19:16:56] (03CR) 10BBlack: [C: 032 V: 032] depool cp3003 -> reinstall [puppet] - 10https://gerrit.wikimedia.org/r/195334 (owner: 10BBlack) [19:19:48] (03PS1) 10Thcipriani: Combine deployment_server roles [puppet] - 10https://gerrit.wikimedia.org/r/195336 [19:19:59] (03PS1) 10Aaron Schulz: Added the jobchron daemon that complements jobrunner [puppet] - 10https://gerrit.wikimedia.org/r/195337 [19:21:59] (03CR) 10Ori.livneh: Added the jobchron daemon that complements jobrunner (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/195337 (owner: 10Aaron Schulz) [19:22:24] PROBLEM - Host cp3003 is DOWN: PING CRITICAL - Packet loss = 100% [19:23:35] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:23:50] (03CR) 10Aaron Schulz: Added the jobchron daemon that complements jobrunner (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/195337 (owner: 10Aaron Schulz) [19:24:11] (03CR) 10Ori.livneh: Added the jobchron daemon that complements jobrunner (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/195337 (owner: 10Aaron Schulz) [19:24:34] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 6.67% of data above the critical threshold [500.0] [19:25:48] (03PS2) 10Aaron Schulz: Added the jobchron daemon that complements jobrunner [puppet] - 10https://gerrit.wikimedia.org/r/195337 [19:29:58] (03PS1) 10Chmarkine: Enable HSTS on dev.wm.org max-age=7 days [puppet] - 10https://gerrit.wikimedia.org/r/195338 (https://phabricator.wikimedia.org/T40516) [19:31:33] (03PS2) 
10Chmarkine: Enable HSTS on dev.wm.org max-age=7 days [puppet] - 10https://gerrit.wikimedia.org/r/195338 (https://phabricator.wikimedia.org/T40516) [19:36:44] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [19:40:44] (03PS1) 10Yuvipanda: deployment: Combine labs/prod deployment server roles [puppet] - 10https://gerrit.wikimedia.org/r/195340 [19:40:49] thcipriani: ^ [19:41:15] * thcipriani looks [19:43:01] thcipriani: am testing it on tin atm [19:43:05] not sure what to do for the deployer key [19:43:43] (03PS2) 10Yuvipanda: deployment: Combine labs/prod deployment server roles [puppet] - 10https://gerrit.wikimedia.org/r/195340 [19:45:52] YuviPanda: put it in the private repo? [19:46:02] heh, the ‘private’ repo [19:46:07] oh [19:46:10] the real private one [19:46:13] by ‘tin’ I meant staging-tin [19:46:19] oh! [19:46:26] yea, i thought tin [19:46:59] put a fake one into labs/private? [19:47:07] so you can test [19:47:49] yeah [19:47:51] probably [19:48:51] yes, that's what it's for, snakeoil.key [19:50:14] (03PS1) 10Ottomata: Don't send metric to statsd if the value is non-numeric. [debs/logster] - 10https://gerrit.wikimedia.org/r/195344 (https://phabricator.wikimedia.org/T91464) [19:51:10] mutante: oh, there’s already one? 
[19:51:50] (03PS1) 10BBlack: repool cp3003 [puppet] - 10https://gerrit.wikimedia.org/r/195345 [19:52:00] (03PS1) 10Chmarkine: Enable HSTS on tendril with max-age=7days [puppet] - 10https://gerrit.wikimedia.org/r/195346 (https://phabricator.wikimedia.org/T40516) [19:52:02] (03CR) 10BBlack: [C: 032 V: 032] repool cp3003 [puppet] - 10https://gerrit.wikimedia.org/r/195345 (owner: 10BBlack) [19:52:37] YuviPanda: yay!?:) [19:52:45] yeah :) [19:52:48] :) [19:55:38] (03PS1) 10Mobrovac: Enable edit updates for RESTBase on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195348 (https://phabricator.wikimedia.org/T87520) [19:57:13] (03PS1) 10Dzahn: planet apache: remove SSL cert and mod_ssl [puppet] - 10https://gerrit.wikimedia.org/r/195349 [19:58:17] (03PS2) 10Dzahn: planet apache: don't install SSL cert and mod_ssl [puppet] - 10https://gerrit.wikimedia.org/r/195349 [19:59:15] (03PS2) 10Mobrovac: Enable edit updates for RESTBase on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195348 (https://phabricator.wikimedia.org/T87520) [19:59:25] (03CR) 10Ottomata: [C: 032 V: 032] Don't send metric to statsd if the value is non-numeric. [debs/logster] - 10https://gerrit.wikimedia.org/r/195344 (https://phabricator.wikimedia.org/T91464) (owner: 10Ottomata) [19:59:48] (03CR) 10Dzahn: [C: 032] planet apache: don't install SSL cert and mod_ssl [puppet] - 10https://gerrit.wikimedia.org/r/195349 (owner: 10Dzahn) [20:00:04] gwicke, cscott, arlolra, subbu: Dear anthropoid, the time has come. Please deploy Parsoid/OCG (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150309T2000). [20:00:48] you don't always have to do V: 2 , give jenkins a chance :) [20:01:59] sorry, I encourage bad habits! I'm pushing through a ton of trivial pool/depool commits lately and I never wait for jenkins on them [20:02:07] not a great idea in the general case :) [20:02:31] (03CR) 10GWicke: [C: 031] "LGTM." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/195348 (https://phabricator.wikimedia.org/T87520) (owner: 10Mobrovac) [20:02:44] (03PS1) 10BBlack: depool cp4002,cp4013 [puppet] - 10https://gerrit.wikimedia.org/r/195352 [20:02:54] (03PS1) 10Ottomata: Bump to version 0.0.10 [debs/logster] - 10https://gerrit.wikimedia.org/r/195353 [20:03:02] (03CR) 10BBlack: [C: 032 V: 032] depool cp4002,cp4013 [puppet] - 10https://gerrit.wikimedia.org/r/195352 (owner: 10BBlack) [20:03:05] (03CR) 10Ottomata: [C: 032 V: 032] Bump to version 0.0.10 [debs/logster] - 10https://gerrit.wikimedia.org/r/195353 (owner: 10Ottomata) [20:03:39] subbu: do you think I can use some of your window to deploy a config change for restbase updates? [20:03:44] /cc greg-g [20:03:52] re: https://gerrit.wikimedia.org/r/#/c/195348/ [20:04:13] fine with me .. arlolra is deploying and i think it shouldn't take more than 15-20 mins. [20:05:08] minions are fetching [20:05:26] (03CR) 10JanZerebecki: [C: 031] Enable HSTS on dev.wm.org max-age=7 days [puppet] - 10https://gerrit.wikimedia.org/r/195338 (https://phabricator.wikimedia.org/T40516) (owner: 10Chmarkine) [20:06:54] subbu, arlolra: cool, let us know when you are ready [20:07:01] PROBLEM - puppet last run on db2009 is CRITICAL: CRITICAL: Puppet has 1 failures [20:09:02] PROBLEM - Host cp4002 is DOWN: PING CRITICAL - Packet loss = 100% [20:09:55] !log updated Parsoid to version c8370a480636c3a0d47ed5090dd29efcb72591e2 [20:10:00] Logged the message, Master [20:10:47] (03PS1) 10Ottomata: Fix for logrotate, bump debian/changelog to 0.0.10 [debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/195356 [20:11:32] (03CR) 10Ottomata: [C: 032 V: 032] Fix for logrotate, bump debian/changelog to 0.0.10 [debs/logster] (debian) - 10https://gerrit.wikimedia.org/r/195356 (owner: 10Ottomata) [20:11:52] PROBLEM - Host cp4013 is DOWN: PING CRITICAL - Packet loss = 100% [20:12:02] gwicke: (sure) [20:12:39] greg-g: ok, thx [20:12:44] (03PS3) 10Yuvipanda: deployment: 
Combine labs/prod deployment server roles [puppet] - 10https://gerrit.wikimedia.org/r/195340 [20:12:45] mobrovac: I'll +2 then [20:12:54] (03CR) 10GWicke: [C: 032] Enable edit updates for RESTBase on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195348 (https://phabricator.wikimedia.org/T87520) (owner: 10Mobrovac) [20:14:12] RECOVERY - Host cp4002 is UP: PING OK - Packet loss = 0%, RTA = 73.10 ms [20:14:28] (03CR) 10Dzahn: [C: 031] "yes. you might think this should be done with the global "$ssl_settings" from ssl_ciphersuite as we already do in a couple places like ($s" [puppet] - 10https://gerrit.wikimedia.org/r/195338 (https://phabricator.wikimedia.org/T40516) (owner: 10Chmarkine) [20:15:12] PROBLEM - HTTP 5xx req/min on graphite1001 is CRITICAL: CRITICAL: 7.14% of data above the critical threshold [500.0] [20:18:22] RECOVERY - Host cp4013 is UP: PING OK - Packet loss = 0%, RTA = 80.79 ms [20:19:18] (03PS2) 10JanZerebecki: Enable HSTS on tendril with max-age=7days [puppet] - 10https://gerrit.wikimedia.org/r/195346 (https://phabricator.wikimedia.org/T40516) (owner: 10Chmarkine) [20:20:08] bblack: if SSL for services terminates at misc-web nginx and we removed all the SSL config from backend apaches and then we wanted to set Headers for STS, would we still do that on the Apaches? (just needs mod_headers anyways) or would it belong where SSL terminates? [20:20:26] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [20:20:35] palladium was me, ignore [20:21:16] mutante: why would we remove SSL from the backend apaches? 
[20:21:56] PROBLEM - Host cp4013 is DOWN: PING CRITICAL - Packet loss = 100% [20:22:09] (but yes, for the misc-web case for now, apache should detect SSL via checking X-Forwarded-Proto == https, optionally enforce it with a redirect based on that if appropriate for the service, and optionally set HSTS if X-F-P is set) [20:22:37] I guess it doesn't matter either way for the internal misc-web stuff if SSL is gone from the actual apache [20:22:43] bblack: because the backends just talk http to the frontends so it's an unused thing that would just have to be maintained/updated [20:22:45] so long as varnish is the only thing consuming [20:22:53] (03CR) 10GWicke: [V: 032] Enable edit updates for RESTBase on testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195348 (https://phabricator.wikimedia.org/T87520) (owner: 10Mobrovac) [20:23:08] there could be some cases where other things might hit those backends directly is all [20:23:13] (03CR) 10JanZerebecki: [C: 031] Enable HSTS on tendril with max-age=7days [puppet] - 10https://gerrit.wikimedia.org/r/195346 (https://phabricator.wikimedia.org/T40516) (owner: 10Chmarkine) [20:24:01] bblack: that makes sense re: "optionally set HSTS if X-F-P is set" and we do the protocol redirect. thanks [20:24:35] RECOVERY - puppet last run on db2009 is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [20:24:46] (03PS4) 10Yuvipanda: deployment: Combine labs/prod deployment server roles [puppet] - 10https://gerrit.wikimedia.org/r/195340 [20:24:56] RECOVERY - Host cp4013 is UP: PING OK - Packet loss = 0%, RTA = 83.79 ms [20:26:31] mutante: right, basically if X-F-P isn't set, redirect. 
if it is set, set HSTS [20:27:14] (03PS2) 10RobH: add eevans to restbase/cassandra roots and deployers [puppet] - 10https://gerrit.wikimedia.org/r/194987 (https://phabricator.wikimedia.org/T91134) (owner: 10Dzahn) [20:28:31] !log mobrovac Synchronized wmf-config/InitialiseSettings.php: Enable the RESTBaseUpdateJobs extension on testwiki (duration: 00m 06s) [20:28:34] Logged the message, Master [20:28:36] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: Puppet last ran 4 hours ago [20:28:42] RewriteCond %{HTTP:X-Forwarded-Proto} !https [20:28:47] RewriteRule ^/(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,E=ProtoRedirect] [20:28:58] Header always merge Vary X-Forwarded-Proto [20:29:06] ^ it's the default snippet now for a couple of them [20:29:08] (03PS3) 10RobH: add eevans to restbase/cassandra roots and deployers [puppet] - 10https://gerrit.wikimedia.org/r/194987 (https://phabricator.wikimedia.org/T91134) (owner: 10Dzahn) [20:29:09] works fine [20:29:29] !log mobrovac Synchronized wmf-config/CommonSettings.php: Set the correct RESTBase server for the RESTBaseUpdateJobs extension (duration: 00m 07s) [20:29:32] Logged the message, Master [20:29:53] (03CR) 10RobH: [C: 032] add eevans to restbase/cassandra roots and deployers [puppet] - 10https://gerrit.wikimedia.org/r/194987 (https://phabricator.wikimedia.org/T91134) (owner: 10Dzahn) [20:29:58] and using Header already anyways, so the STS header fits there [20:33:21] thcipriani|afk: so the patch kindof became bigger and bigger, but now it gets quite far :) [20:33:26] I’m almost installing scap... [20:34:01] scap --self ? [20:34:46] (03PS5) 10Yuvipanda: deployment: Combine labs/prod deployment server roles [puppet] - 10https://gerrit.wikimedia.org/r/195340 [20:35:06] PROBLEM - puppet last run on amssq38 is CRITICAL: CRITICAL: Puppet has 1 failures [20:35:18] (03CR) 10Yuvipanda: [C: 04-2] "Ongoing, WIP patch." 
[puppet] - 10https://gerrit.wikimedia.org/r/195340 (owner: 10Yuvipanda) [20:35:22] * YuviPanda goes to sleep [20:36:15] RECOVERY - HTTP 5xx req/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:36:45] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [20:38:39] (03PS1) 10GWicke: Add restbase job runners [puppet] - 10https://gerrit.wikimedia.org/r/195364 [20:38:46] (03PS1) 10Papaul: added mw2149-mw2214 [puppet] - 10https://gerrit.wikimedia.org/r/195365 [20:42:34] (03PS1) 10Mobrovac: Exclude slow edit dependency jobs from the default queue [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195366 [20:42:56] (03PS2) 10GWicke: Add restbase job runners [puppet] - 10https://gerrit.wikimedia.org/r/195364 [20:44:02] (03PS3) 10GWicke: Add restbase job runners [puppet] - 10https://gerrit.wikimedia.org/r/195364 [20:44:35] (03CR) 10Aaron Schulz: "Note that this doesn't do anything since the job runner always qualifies the type when it spawns works. We might just want to kill all of " [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195366 (owner: 10Mobrovac) [20:45:52] (03CR) 10Aaron Schulz: "If the job is too slow to run at the end of web requests, you can set wgJobTypesExcludedFromDefaultQueue in the extension itself. It won't" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195366 (owner: 10Mobrovac) [20:48:52] (03CR) 10RobH: [C: 031] "don't forget the private key repo and shredding +1!" [puppet] - 10https://gerrit.wikimedia.org/r/195310 (https://phabricator.wikimedia.org/T88731) (owner: 10Dzahn) [20:51:29] (03CR) 10GWicke: "@Aaron, will setting it in the extension also push the jobs in dedicated queues in our setup?" 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/195366 (owner: 10Mobrovac) [20:52:27] (03PS1) 10BBlack: repool cp4002,cp4013 [puppet] - 10https://gerrit.wikimedia.org/r/195373 [20:52:29] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [20:52:41] (03CR) 10BBlack: [C: 032 V: 032] repool cp4002,cp4013 [puppet] - 10https://gerrit.wikimedia.org/r/195373 (owner: 10BBlack) [20:52:59] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 3 unmerged changes in puppet (dir /var/lib/git/operations/puppet). [20:52:59] RECOVERY - puppet last run on amssq38 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [20:53:30] RECOVERY - Unmerged changes on repository puppet on palladium is OK: No changes to merge. [20:54:00] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [20:54:27] (03PS3) 10Dzahn: delete techblog.wm blog SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195308 (https://phabricator.wikimedia.org/T73156) [20:54:36] robh: ^ now just techblog but not blog [20:58:09] (03CR) 10RobH: [C: 031] "cool. I'd say 'dont forget to shred' but it doesn't exist on any hosts anymore! 
so 'dont forget the private key' heh ;]" [puppet] - 10https://gerrit.wikimedia.org/r/195308 (https://phabricator.wikimedia.org/T73156) (owner: 10Dzahn) [20:59:15] (03CR) 10Dzahn: "this though: https://gerrit.wikimedia.org/r/#/c/195349/" [puppet] - 10https://gerrit.wikimedia.org/r/195306 (https://phabricator.wikimedia.org/T85789) (owner: 10Dzahn) [20:59:24] (03Abandoned) 10Dzahn: delete *.planet.wikimedia.org SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195306 (https://phabricator.wikimedia.org/T85789) (owner: 10Dzahn) [21:02:46] (03CR) 10Mobrovac: [C: 031] Add restbase job runners [puppet] - 10https://gerrit.wikimedia.org/r/195364 (owner: 10GWicke) [21:02:59] (03PS7) 10coren: Labs: Puppetize labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/194395 [21:03:07] (03CR) 10Giuseppe Lavagetto: mediawiki: add configs to support the Dallas DC (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194830 (https://phabricator.wikimedia.org/T91754) (owner: 10Giuseppe Lavagetto) [21:06:02] (03PS1) 10Gage: Deploy IPsec on cp1008 [puppet] - 10https://gerrit.wikimedia.org/r/195394 [21:06:42] (03PS2) 10Gage: Deploy IPsec on cp1008 [puppet] - 10https://gerrit.wikimedia.org/r/195394 [21:07:45] (03CR) 10Gage: [C: 032] Deploy IPsec on cp1008 [puppet] - 10https://gerrit.wikimedia.org/r/195394 (owner: 10Gage) [21:08:43] (03PS1) 10Dzahn: planet: webserver class, no ports.conf, simplify [puppet] - 10https://gerrit.wikimedia.org/r/195410 [21:13:18] PROBLEM - Kafka Broker Messages In on analytics1021 is CRITICAL: kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate CRITICAL: 810.748295073 [21:13:23] tsk tsk tsk [21:15:02] 6operations, 10ops-codfw, 3wikis-in-codfw: rack and setup rdb2001-2004 - https://phabricator.wikimedia.org/T92013#1101529 (10RobH) 3NEW a:3Papaul [21:15:39] (03PS1) 10RobH: rbd2001-2004 mgmt ip entries [dns] - 10https://gerrit.wikimedia.org/r/195437 [21:16:27] RECOVERY - Kafka Broker Messages In on analytics1021 is OK: 
kafka.server.BrokerTopicMetrics.AllTopicsMessagesInPerSec.FifteenMinuteRate OKAY: 4527.43383431 [21:16:43] (03PS2) 10RobH: rbd2001-2004 mgmt ip entries [dns] - 10https://gerrit.wikimedia.org/r/195437 [21:17:15] (03CR) 10RobH: [C: 032] rbd2001-2004 mgmt ip entries [dns] - 10https://gerrit.wikimedia.org/r/195437 (owner: 10RobH) [21:17:21] (03CR) 10Dzahn: [C: 032] planet: webserver class, no ports.conf, simplify [puppet] - 10https://gerrit.wikimedia.org/r/195410 (owner: 10Dzahn) [21:18:29] 6operations, 10ops-codfw, 3wikis-in-codfw: create mgmt dns entries for rbd2001-2004 asset tags - https://phabricator.wikimedia.org/T92015#1101562 (10RobH) 3NEW a:3Papaul [21:18:42] 6operations: create mgmt dns entries for rbd2001-2004 asset tags - https://phabricator.wikimedia.org/T92015#1101562 (10RobH) [21:21:48] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, 5Patch-For-Review, 7Upstream: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1101593 (10hashar) [21:22:22] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, 5Patch-For-Review, 7Upstream: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#489927 (10hashar) [21:22:23] 6operations: Provide dh-virtualenv 0.9 package on apt.wikimedia.org Precise distribution - https://phabricator.wikimedia.org/T91631#1101610 (10hashar) [21:26:05] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: Access to stat1003 for Niklas and Kartik - https://phabricator.wikimedia.org/T91625#1101632 (10RobH) One item was completely overlooked. Due to these being existing users, and not having signed the phabricator copy of our access responsibilities, we are... 
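A rough sketch of the rule from the 20:20 to 20:29 exchange above ("if X-F-P isn't set, redirect. if it is set, set HSTS"), for a backend that only ever sees plain HTTP from the TLS-terminating frontend. This is an illustrative model in Python, not the Apache config actually deployed; the function name `respond`, the 200 status for the non-redirect case, and the exact-case header lookup are assumptions made for the example. The 7-day max-age mirrors the HSTS patches in this log.

```python
from typing import Dict, Tuple

# 7 days in seconds, matching the "max-age=7 days" HSTS patches in the log.
HSTS_MAX_AGE = 7 * 24 * 60 * 60


def respond(headers: Dict[str, str], host: str, uri: str) -> Tuple[int, Dict[str, str]]:
    """Hypothetical helper: return (status, response headers) for one request.

    Assumes the frontend passes the original scheme in X-Forwarded-Proto
    and that the header name arrives with exactly this capitalisation.
    """
    proto = headers.get("X-Forwarded-Proto", "").lower()
    # The response depends on this request header, so caches must vary on it.
    out = {"Vary": "X-Forwarded-Proto"}
    if proto != "https":
        # Client reached the frontend over plain HTTP: redirect, no HSTS.
        out["Location"] = f"https://{host}{uri}"
        return 301, out
    # Client is already on HTTPS: serve normally and advertise HSTS.
    out["Strict-Transport-Security"] = f"max-age={HSTS_MAX_AGE}"
    return 200, out
```

HSTS is only attached on the HTTPS branch because browsers ignore the header when it arrives over plain HTTP, so sending it on the redirect would have no effect.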
[21:26:23] (03PS1) 10Chmarkine: Enable HSTS on racktables with max-age=7days [puppet] - 10https://gerrit.wikimedia.org/r/195444 (https://phabricator.wikimedia.org/T40516) [21:32:54] (03PS4) 10Dzahn: delete techblog.wm blog SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195308 (https://phabricator.wikimedia.org/T73156) [21:35:39] (03CR) 10Dzahn: [C: 032] "@robh: deleted private key from private repo, and https://phabricator.wikimedia.org/T92021" [puppet] - 10https://gerrit.wikimedia.org/r/195308 (https://phabricator.wikimedia.org/T73156) (owner: 10Dzahn) [21:39:47] 6operations: revoke/delete SSL cert techblog.wikimedia.org - https://phabricator.wikimedia.org/T92021#1101688 (10Dzahn) [21:40:41] (03CR) 10John F. Lewis: [C: 031] delete bugzilla SSL certs [puppet] - 10https://gerrit.wikimedia.org/r/195307 (https://phabricator.wikimedia.org/T85785) (owner: 10Dzahn) [21:41:38] (03PS4) 10GWicke: Add restbase job runners [puppet] - 10https://gerrit.wikimedia.org/r/195364 [21:41:51] Aaron [21:42:04] eh, that search didn't actually search ;) [21:42:56] (03PS2) 10Dzahn: create shell account for Jeff Hobson [puppet] - 10https://gerrit.wikimedia.org/r/194806 (https://phabricator.wikimedia.org/T90624) [21:43:10] (03CR) 10GWicke: "@Aaron, never mind re exclusions: Added it in the puppet patch at https://gerrit.wikimedia.org/r/#/c/195364/." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/195366 (owner: 10Mobrovac) [21:44:38] (03Abandoned) 10Mobrovac: Exclude slow edit dependency jobs from the default queue [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195366 (owner: 10Mobrovac) [21:45:41] (03PS1) 10Gergő Tisza: Enable CORS support logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195450 (https://bugzilla.wikimedia.org/507) [21:51:56] (03PS3) 10RobH: create shell account for Jeff Hobson [puppet] - 10https://gerrit.wikimedia.org/r/194806 (https://phabricator.wikimedia.org/T90624) (owner: 10Dzahn) [21:53:34] (03CR) 10RobH: [C: 032] "The three day wait passed awhile ago, and patchset looks good. Merging." [puppet] - 10https://gerrit.wikimedia.org/r/194806 (https://phabricator.wikimedia.org/T90624) (owner: 10Dzahn) [21:54:47] (03CR) 10Thcipriani: "the output of the errors this patch fixes is available here: https://phabricator.wikimedia.org/P376" [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/195328 (owner: 10Thcipriani) [21:58:44] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: RESTBase deploy access and shell on Cassandra cluster for eevans - https://phabricator.wikimedia.org/T91134#1101922 (10RobH) [21:59:04] 10Ops-Access-Requests, 6operations, 5Patch-For-Review: RESTBase deploy access and shell on Cassandra cluster for eevans - https://phabricator.wikimedia.org/T91134#1101923 (10RobH) 5Open>3Resolved access approved and merged live. 
[22:04:48] PROBLEM - puppet last run on vanadium is CRITICAL: CRITICAL: Puppet has 1 failures [22:06:49] 6operations: Backport python-virtualenv 1.11.4 from Trusty to Precise - https://phabricator.wikimedia.org/T92033#1101944 (10hashar) 3NEW a:3hashar [22:10:50] 6operations: revoke/delete SSL cert techblog.wikimedia.org - https://phabricator.wikimedia.org/T92021#1101967 (10Dzahn) [22:11:20] 6operations, 6Labs, 10hardware-requests: eqiad: (5) virt nodes - https://phabricator.wikimedia.org/T89752#1101968 (10RobH) Quotes are back in and escalated to @mark for approvals process on https://rt.wikimedia.org/Ticket/Display.html?id=9249 [22:12:24] 6operations, 10ops-codfw: label/update mgmt & settings/test eventlog2001 - https://phabricator.wikimedia.org/T90909#1101970 (10RobH) 5Open>3Resolved [22:12:25] 6operations, 10hardware-requests: deploy eventlog2001 - https://phabricator.wikimedia.org/T90907#1101972 (10RobH) [22:12:26] 6operations, 10hardware-requests: codfw/eqiad: (1) eventlogging node (per site) - eqiad done, codfw in progress - https://phabricator.wikimedia.org/T90747#1101971 (10RobH) [22:12:43] 6operations, 10hardware-requests: deploy eventlog2001 - https://phabricator.wikimedia.org/T90907#1070870 (10RobH) network port ge-5/0/9 in a5-codfw [22:13:13] 6operations: Backport python-virtualenv 1.11.4 from Trusty to Precise - https://phabricator.wikimedia.org/T92033#1101974 (10hashar) I don't think there is any specific impact. One might want to verify which of our Precise servers have the package installed. If it is only the CI servers gallium / lanthanum, we ca... 
[22:13:33] (03PS1) 10BBlack: depool cp4018,cp1067 -> reinstall [puppet] - 10https://gerrit.wikimedia.org/r/195455 [22:13:49] (03CR) 10BBlack: [C: 032 V: 032] depool cp4018,cp1067 -> reinstall [puppet] - 10https://gerrit.wikimedia.org/r/195455 (owner: 10BBlack) [22:15:50] (03CR) 10Dzahn: [C: 032] install fonts-unfonts-core, not just fonts-unfonts-extra [puppet] - 10https://gerrit.wikimedia.org/r/194828 (https://phabricator.wikimedia.org/T91685) (owner: 10Dzahn) [22:17:02] (03PS1) 10RobH: adding eventlog2001 install-server info [puppet] - 10https://gerrit.wikimedia.org/r/195458 [22:18:48] RECOVERY - salt-minion processes on cp1069 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [22:18:58] PROBLEM - Host cp4018 is DOWN: PING CRITICAL - Packet loss = 100% [22:19:55] 6operations, 10ops-codfw: label/update mgmt & settings/test eventlog2001 - https://phabricator.wikimedia.org/T90909#1102004 (10RobH) uh, rack tables says b5- but your update says a5... [22:20:13] 6operations, 10hardware-requests: deploy eventlog2001 - https://phabricator.wikimedia.org/T90907#1102006 (10RobH) pulled the info off blocking ticket, but racktables shows this in b5, not a5 [22:20:31] (03CR) 10Dzahn: "Yuvi says we are not going to use FreeBSD.. 
boo :)" [puppet] - 10https://gerrit.wikimedia.org/r/195321 (owner: 10Yuvipanda) [22:21:38] RECOVERY - puppet last run on vanadium is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [22:21:48] PROBLEM - Host cp1067 is DOWN: PING CRITICAL - Packet loss = 100% [22:21:48] (03PS1) 10RobH: eventlog2001 dns entry [dns] - 10https://gerrit.wikimedia.org/r/195460 [22:22:02] (03CR) 10RobH: [C: 032] adding eventlog2001 install-server info [puppet] - 10https://gerrit.wikimedia.org/r/195458 (owner: 10RobH) [22:22:18] PROBLEM - salt-minion processes on cp1069 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [22:22:37] (03CR) 10RobH: [C: 031] eventlog2001 dns entry [dns] - 10https://gerrit.wikimedia.org/r/195460 (owner: 10RobH) [22:23:48] PROBLEM - puppet last run on amssq59 is CRITICAL: CRITICAL: Puppet has 1 failures [22:24:27] 6operations, 6CA-team, 6MediaWiki-Core-Team, 10SUL-Finalization: db1068 (s4/commonswiki slave) is missing data about at least 6 users - https://phabricator.wikimedia.org/T91920#1102019 (10Philippe-WMF) [22:24:28] RECOVERY - salt-minion processes on cp1069 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [22:24:48] RECOVERY - Host cp4018 is UP: PING OK - Packet loss = 0%, RTA = 77.72 ms [22:25:22] (03PS2) 10RobH: eventlog2001 dns entry [dns] - 10https://gerrit.wikimedia.org/r/195460 [22:25:57] (03CR) 10RobH: [C: 032] eventlog2001 dns entry [dns] - 10https://gerrit.wikimedia.org/r/195460 (owner: 10RobH) [22:27:39] (03CR) 10Dzahn: [C: 031] "no seriously, $service_provider = $::initsystem is a nice simplification over that whole stanza.." 
[puppet] - 10https://gerrit.wikimedia.org/r/195321 (owner: 10Yuvipanda) [22:27:57] PROBLEM - Host cp4018 is DOWN: PING CRITICAL - Packet loss = 100% [22:28:08] RECOVERY - Host cp1067 is UP: PING OK - Packet loss = 0%, RTA = 1.64 ms [22:29:13] !log doing an extensions/GlobalUsage/refreshGlobalimagelinks.php --pages=nonexistent test run on aawiki [22:29:21] Logged the message, Master [22:31:18] RECOVERY - Host cp4018 is UP: PING OK - Packet loss = 0%, RTA = 83.10 ms [22:32:44] (03PS6) 10Greg Grossmeier: deployment: Combine labs/prod deployment server roles [puppet] - 10https://gerrit.wikimedia.org/r/195340 (https://phabricator.wikimedia.org/T88442) (owner: 10Yuvipanda) [22:33:19] Should I be worried about this crashing in my browser? https://commons.wikimedia.org/wiki/File:Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg [22:34:18] 6operations, 10Continuous-Integration, 3Continuous-Integration-Isolation, 5Patch-For-Review, 7Upstream: Create a Debian package for Zuul - https://phabricator.wikimedia.org/T48552#1102046 (10hashar) From a mail I sent to the private ops list: Hello, To package Zuul [T48552], I gave dh_virtualenv a try.... [22:34:32] Deskana: https://phabricator.wikimedia.org/T89573 ? [22:35:11] (03Abandoned) 10Dzahn: Get rid of role::apachesync [puppet] - 10https://gerrit.wikimedia.org/r/164508 (owner: 10Ori.livneh) [22:35:20] mutante: why abandoned? [22:35:21] tgr: I guess that's it. [22:35:25] tgr: Thanks. :-) [22:35:51] ori: because if you would rebase it it should be like nothing [22:36:06] ori: the module is deleted [22:37:46] mutante: oh. ok! 
[22:38:05] ori: 99% sure you would abandon it once you saw that [22:38:31] didn't mean to be rude, just bold in cleaning up [22:38:32] (03PS1) 10BBlack: depool amssq54-57 [puppet] - 10https://gerrit.wikimedia.org/r/195469 [22:38:43] mutante: right on, makes sense [22:38:46] (03CR) 10BBlack: [C: 032 V: 032] depool amssq54-57 [puppet] - 10https://gerrit.wikimedia.org/r/195469 (owner: 10BBlack) [22:42:49] RECOVERY - puppet last run on amssq59 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [22:46:29] PROBLEM - puppet last run on amssq61 is CRITICAL: CRITICAL: Puppet has 1 failures [22:48:29] PROBLEM - Host cp4018 is DOWN: PING CRITICAL - Packet loss = 100% [22:48:59] RECOVERY - Host cp4018 is UP: PING OK - Packet loss = 0%, RTA = 90.39 ms [22:49:30] PROBLEM - Host amssq56 is DOWN: PING CRITICAL - Packet loss = 100% [22:49:39] PROBLEM - Host amssq55 is DOWN: PING CRITICAL - Packet loss = 100% [22:49:48] PROBLEM - Host amssq54 is DOWN: PING CRITICAL - Packet loss = 100% [22:49:48] PROBLEM - Host amssq57 is DOWN: PING CRITICAL - Packet loss = 100% [22:50:48] PROBLEM - puppet last run on amssq40 is CRITICAL: CRITICAL: Puppet has 1 failures [22:57:09] PROBLEM - puppet last run on mw1064 is CRITICAL: CRITICAL: Puppet has 1 failures [22:58:38] (03PS1) 10BBlack: repool cp1067+cp4018 [puppet] - 10https://gerrit.wikimedia.org/r/195474 [22:58:50] (03CR) 10BBlack: [C: 032 V: 032] repool cp1067+cp4018 [puppet] - 10https://gerrit.wikimedia.org/r/195474 (owner: 10BBlack) [23:00:04] RoanKattouw, ^d, kaldari: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150309T2300). Please do the needful. [23:00:04] RoanKattouw, ^d, kaldari: Respected human, time to deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150309T2300). Please do the needful. 
[23:00:28] here [23:01:16] o/ [23:01:56] PROBLEM - Disk space on fluorine is CRITICAL: DISK CRITICAL - free space: /a 76021 MB (3% inode=99%): [23:02:08] two jouncebots [23:02:15] * bd808 will fix [23:02:43] (03PS1) 10Andrew Bogott: Remove duplicate definition of fonts-unfonts-core. [puppet] - 10https://gerrit.wikimedia.org/r/195475 [23:02:50] RoanKattouw: are you going to SWAT? ^d is at a conference today [23:04:00] Hmm OK yeah I'll do it [23:04:11] (03CR) 10Dzahn: [C: 031] "oh, thanks for pointing this out. and yes, i _just_ added it to the fonts list because we installed -extra but not -core which was a bit o" [puppet] - 10https://gerrit.wikimedia.org/r/195475 (owner: 10Andrew Bogott) [23:04:22] Although, kaldari is here [23:04:26] kaldari: Are you able to do SWAT today? [23:04:42] he just got up from his desk :P [23:04:49] I could SWAT if you don't want to [23:04:53] Please do [23:04:56] It's midnight here, so maybe not [23:05:02] ok :) [23:05:13] I should probably have taken myself off the calendar for this week given that I'm in a terrible timezone for this SWAT window [23:05:14] (03CR) 10Andrew Bogott: [C: 032] Remove duplicate definition of fonts-unfonts-core. [puppet] - 10https://gerrit.wikimedia.org/r/195475 (owner: 10Andrew Bogott) [23:05:17] RECOVERY - puppet last run on amssq61 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures [23:05:22] tgr: James_F: ping for SWAT [23:05:27] Though last week was worse, last week it was 1am [23:05:27] legoktm: Here. [23:05:31] pong [23:06:03] jouncebot: refresh [23:06:04] I refreshed my knowledge about deployments. [23:06:10] jouncebot: next [23:06:10] In 15 hour(s) and 53 minute(s): Morning SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20150310T1500) [23:06:25] tgr: should your submodule updates go before the config patch? [23:06:36] legoktm: yes [23:07:36] springle: Just this? 
https://gerrit.wikimedia.org/r/#/c/195472/ [23:07:48] legoktm: with a ~5-10 min pause between them, if that's no trouble [23:07:51] sure [23:09:57] !log legoktm Synchronized php-1.25wmf20/extensions/ImageMetrics/resources/: https://gerrit.wikimedia.org/r/#/c/195449/ (duration: 00m 06s) [23:10:01] Logged the message, Master [23:10:37] RECOVERY - puppet last run on amssq40 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [23:11:02] !log legoktm Synchronized php-1.25wmf19/extensions/ImageMetrics/resources/: https://gerrit.wikimedia.org/r/#/c/195447/ (duration: 00m 05s) [23:11:05] Logged the message, Master [23:11:15] tgr: ^ let me know when the config change is ready [23:11:49] James_F: the thanks change doesn't need to be deployed correct? [23:12:05] legoktm: It needs to go to make the tests work. [23:12:21] legoktm: If you don't deploy it, tests fail. Thanks, Flow. [23:12:40] legoktm: But its a zero-user-facing change, I understand. [23:12:49] James_F: but I don't need to sync anything for it? [23:12:58] legoktm: Aren't you meant to sync the dirs to avoid leaving them dirty? [23:13:11] probably should. 
[23:14:06] (03CR) 10Legoktm: [C: 032] Fix WikiGrok settings init [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194919 (owner: 10MaxSem) [23:15:47] RECOVERY - puppet last run on mw1064 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [23:18:37] PROBLEM - puppet last run on cp3011 is CRITICAL: CRITICAL: Puppet has 1 failures [23:19:54] 6operations: revoke/delete bugzilla ssl certs - https://phabricator.wikimedia.org/T92041#1102185 (10Dzahn) [23:20:51] 6operations: delete / revoke stats.wikimedia.org SSL cert - https://phabricator.wikimedia.org/T92043#1102189 (10Dzahn) 3NEW [23:22:10] legoktm: should be ready by now [23:23:39] 6operations: revoke / delete metrics.wikimedia.org SSL cert - https://phabricator.wikimedia.org/T92044#1102217 (10Dzahn) 3NEW [23:23:55] legoktm [23:24:16] "DeltaQuad blocked User:Schsc with an expiry time of [INVALID]" [23:24:29] tgr: The merge stack is seven items deep. [23:25:05] That happened after using the block function of CheckUser, legoktm [23:25:11] Bsadowski1: little busy right now [23:25:24] ok [23:25:47] Bsadowski1: Distracting Lego whilst he's breaking Wikipedia.org considered harmful. :-) [23:25:57] !log legoktm Synchronized php-1.25wmf20/extensions/VisualEditor/lib/ve/src/ce/nodes/ve.ce.TableCellNode.js: https://gerrit.wikimedia.org/r/#/c/195290/ (duration: 00m 06s) [23:26:00] Logged the message, Master [23:26:07] James_F: ^ [23:26:09] Whee. [23:26:12] * James_F tests. [23:26:35] James_F: well, worst case I get some junk in the log table [23:26:46] it still won't break anything [23:26:49] 6operations: revoke / delete etherpad.wikimedia.org SSL cert - https://phabricator.wikimedia.org/T92045#1102233 (10Dzahn) [23:26:59] tgr: Sure. 
[23:27:03] !log legoktm Synchronized php-1.25wmf20/extensions/Thanks/tests/: https://gerrit.wikimedia.org/r/#/c/195290/ (duration: 00m 06s) [23:27:06] Logged the message, Master [23:27:36] I filed https://phabricator.wikimedia.org/T92042 for the extreme jenkins slowness btw [23:27:40] legoktm: Confirmed fix in production. All else seems fine. Thanks! [23:27:46] (03CR) 10Legoktm: [V: 032] Fix WikiGrok settings init [mediawiki-config] - 10https://gerrit.wikimedia.org/r/194919 (owner: 10MaxSem) [23:28:18] (03PS3) 10Dzahn: delete stats.wikimedia.org SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195309 (https://phabricator.wikimedia.org/T92043) [23:28:29] !log legoktm Synchronized wmf-config/mobile.php: https://gerrit.wikimedia.org/r/#/c/194919/ (duration: 00m 07s) [23:28:32] Logged the message, Master [23:28:33] kaldari: ^ [23:28:49] (03PS2) 10Dzahn: delete metrics.wikimedia.org SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195304 (https://phabricator.wikimedia.org/T73156) [23:28:53] legoktm: thanks, checking… [23:29:19] (03PS2) 10Dzahn: delete etherpad SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195303 (https://phabricator.wikimedia.org/T92045) [23:29:26] (03CR) 10Legoktm: [C: 032 V: 032] Enable CORS support logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/195450 (https://bugzilla.wikimedia.org/507) (owner: 10Gergő Tisza) [23:30:11] (03CR) 10John F. 
Lewis: [C: 031] delete etherpad SSL cert [puppet] - 10https://gerrit.wikimedia.org/r/195303 (https://phabricator.wikimedia.org/T92045) (owner: 10Dzahn) [23:30:13] !log legoktm Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/195450/ (duration: 00m 05s) [23:30:17] Logged the message, Master [23:30:21] tgr: ^ [23:30:26] (03PS1) 10BBlack: repool amssq54-57 [puppet] - 10https://gerrit.wikimedia.org/r/195481 [23:30:40] (03CR) 10BBlack: [C: 032 V: 032] repool amssq54-57 [puppet] - 10https://gerrit.wikimedia.org/r/195481 (owner: 10BBlack) [23:31:52] legoktm: seems to be good [23:33:00] !log legoktm Synchronized php-1.25wmf20/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/#/c/195467/ (duration: 00m 05s) [23:33:03] Logged the message, Master [23:33:10] 6operations, 6Engineering-Community, 6WMF-Legal, 3ECT-March-2015, 6WMF-NDA: Implement the Volunteer NDA process in Phabricator - https://phabricator.wikimedia.org/T655#1102252 (10MBrar.WMF) Thanks @Qgil for that additional information! We've finalized the text of the NDA so that it only really requires s... [23:33:58] !log legoktm Synchronized php-1.25wmf19/extensions/WikimediaMaintenance/dumpInterwiki.php: https://gerrit.wikimedia.org/r/#/c/195466/ (duration: 00m 06s) [23:34:02] Logged the message, Master [23:34:21] (03PS1) 10BBlack: remove old pub dns for amssq5[4567],cp3003 [dns] - 10https://gerrit.wikimedia.org/r/195482 [23:34:34] (03PS1) 10GWicke: WIP: Don't include a node in its own seeds [puppet/cassandra] - 10https://gerrit.wikimedia.org/r/195483 (https://phabricator.wikimedia.org/T91617) [23:34:47] (03CR) 10BBlack: [C: 032] remove old pub dns for amssq5[4567],cp3003 [dns] - 10https://gerrit.wikimedia.org/r/195482 (owner: 10BBlack) [23:34:51] !log legoktm Synchronized wmf-config/interwiki.cdb: Updating interwiki cache (duration: 00m 05s) [23:34:54] Logged the message, Master [23:35:28] legoktm: works, thanks! 
[23:37:21] (PS1) Legoktm: Updating interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/195484
[23:37:25] RECOVERY - puppet last run on cp3011 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures
[23:37:31] (CR) Legoktm: [C: 2 V: 2] Updating interwiki cache [mediawiki-config] - https://gerrit.wikimedia.org/r/195484 (owner: Legoktm)
[23:37:55] operations: pybal issue? - https://phabricator.wikimedia.org/T90839#1102281 (Dzahn) >>! In T90839#1092808, @Dzahn wrote: > regarding mw1062: > > why does it still not get any traffic? It does now, since @akosiaris restarted pybal for an unrelated LVS config change.
[23:38:51] operations: pybal issue? - https://phabricator.wikimedia.org/T90839#1102284 (Dzahn) regarding ocg, i still don't see a difference here: http://ganglia.wikimedia.org/latest/graph_all_periods.php?h=ocg1003.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2&st=1425920026&g=network_report&z=large&c=PDF%20ser...
[23:41:15] PROBLEM - Unmerged changes on repository puppet on palladium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[23:41:29] operations, Engineering-Community, WMF-Legal, ECT-March-2015, WMF-NDA: Implement the Volunteer NDA process in Phabricator - https://phabricator.wikimedia.org/T655#1102298 (Qgil) Good! @MBrar.WMF, as said above, since you are a member of #wmf-legal, you can edit the agreement directly at L2 ("Manag...
[23:41:36] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet).
[23:42:24] (CR) Odder: [C: -1] "Technically the patch does what it's supposed to do, but from a philosophical point of view, this is in direct and blatant contradiction o" [mediawiki-config] - https://gerrit.wikimedia.org/r/195197 (https://phabricator.wikimedia.org/T44894) (owner: Jalexander)
[23:50:09] (CR) Dzahn: [C: -1] "yes, it is a slippery slope indeed. also, if you make spammers create users, they will" [mediawiki-config] - https://gerrit.wikimedia.org/r/195197 (https://phabricator.wikimedia.org/T44894) (owner: Jalexander)
[23:50:43] operations, Wikimedia-Labs-Other, Tracking: (Tracking) Database replication services - https://phabricator.wikimedia.org/T50930#1102345 (Earwig)
[23:50:52] (CR) Ori.livneh: "Odder, I think it'd be better to express philosophical opposition on the Phabricator task (as Kaldari did, to cite an example of a develop" [mediawiki-config] - https://gerrit.wikimedia.org/r/195197 (https://phabricator.wikimedia.org/T44894) (owner: Jalexander)
[23:52:42] (CR) Dzahn: "removing vote per Ori's comment. it wasn't a vote on the technical correctness" [mediawiki-config] - https://gerrit.wikimedia.org/r/195197 (https://phabricator.wikimedia.org/T44894) (owner: Jalexander)
[23:55:46] (CR) Odder: "@Ori: Well, yeah, I guess you're right." [mediawiki-config] - https://gerrit.wikimedia.org/r/195197 (https://phabricator.wikimedia.org/T44894) (owner: Jalexander)
[23:57:26] operations, Patch-For-Review: contacts.wikimedia.org drupal unpuppetized - https://phabricator.wikimedia.org/T90679#1102380 (AKoval_WMF) >>! In T90679#1094689, @Dzahn wrote: > < quiddity> mutante, I've emailed anna koval, who might know. > < abartov> mutante: AFAIK, the Edu team now uses Asana to manage th...