[00:00:05] twentyafterfour: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160825T0000). [00:02:19] 06Operations: unaccepted salt keys - https://phabricator.wikimedia.org/T143846#2581021 (10Dzahn) [00:02:52] PROBLEM - Tool Labs instance distribution on labcontrol1001 is CRITICAL: CRITICAL: static class instances not spread out enough [00:03:03] nice catch, icinga-wm [00:03:07] * yuvipanda is fixing [00:03:11] PROBLEM - Tool Labs instance distribution on labcontrol1002 is CRITICAL: CRITICAL: static class instances not spread out enough [00:07:06] (03PS1) 10Dereckson: Allow bureaucrats to manage account creators group on ar.wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306595 (https://phabricator.wikimedia.org/T143844) [00:08:38] !log chromium back in service - both eqiad DNS recursors now on jessie [00:08:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:09:47] 06Operations, 05Goal: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#2581057 (10Dzahn) [00:09:50] 06Operations: Migrate hydrogen/chromium to jessie - https://phabricator.wikimedia.org/T123727#2581055 (10Dzahn) 05Open>03Resolved 17:14 < mutante> !log chromium back in service - both eqiad DNS recursors now on jessie [00:11:24] 06Operations, 05Goal: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#1981054 (10Dzahn) [00:11:42] 06Operations, 05Goal: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#1989222 (10Dzahn) chromium done. both DNS recursors now jessie. 
count: 12 [00:19:07] (03CR) 10Dzahn: [C: 031] Fix race in puppet::self (puppet.conf compilation) [puppet] - 10https://gerrit.wikimedia.org/r/284852 (https://phabricator.wikimedia.org/T132689) (owner: 1020after4) [00:19:39] (03PS2) 10Dzahn: Beta: Add logstash port [puppet] - 10https://gerrit.wikimedia.org/r/303240 (owner: 10Thcipriani) [00:19:50] (03CR) 10Dzahn: [C: 032] Beta: Add logstash port [puppet] - 10https://gerrit.wikimedia.org/r/303240 (owner: 10Thcipriani) [00:24:57] (03PS3) 10Dzahn: udp2log::instance: require psmisc package for use of killall command [puppet] - 10https://gerrit.wikimedia.org/r/305766 (owner: 10Alex Monk) [00:25:26] (03CR) 10Dzahn: [C: 032] "yes, the package is already installed on fluorine" [puppet] - 10https://gerrit.wikimedia.org/r/305766 (owner: 10Alex Monk) [00:32:31] !log ebernhardson@tin Synchronized php-1.28.0-wmf.16/extensions/CirrusSearch/includes/Job/CheckerJob.php: Fix CirrusSearch CheckerJob stuck in a loop (duration: 00m 47s) [00:32:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:36:20] (03CR) 1020after4: [C: 031] Fix race in puppet::self (puppet.conf compilation) [puppet] - 10https://gerrit.wikimedia.org/r/284852 (https://phabricator.wikimedia.org/T132689) (owner: 1020after4) [00:38:10] (03CR) 10Dzahn: [C: 031] "+1 per "okay to merge this but we should be careful we don't start the apocalypse" :)" [puppet] - 10https://gerrit.wikimedia.org/r/296687 (https://phabricator.wikimedia.org/T139008) (owner: 10Ladsgroup) [00:38:12] (03CR) 1020after4: "I don't think this fully fixed the problem but the race condition has only been spotted once since the patch was cherry-picked on beta. 
I " [puppet] - 10https://gerrit.wikimedia.org/r/284852 (https://phabricator.wikimedia.org/T132689) (owner: 1020after4) [00:39:35] (03CR) 10Dzahn: [C: 032] Fix race in puppet::self (puppet.conf compilation) [puppet] - 10https://gerrit.wikimedia.org/r/284852 (https://phabricator.wikimedia.org/T132689) (owner: 1020after4) [00:39:40] (03PS2) 10Dzahn: Fix race in puppet::self (puppet.conf compilation) [puppet] - 10https://gerrit.wikimedia.org/r/284852 (https://phabricator.wikimedia.org/T132689) (owner: 1020after4) [00:39:53] 06Operations, 10ops-codfw: rack/setup/deploy puppetmaster200[12] - https://phabricator.wikimedia.org/T143255#2581095 (10Papaul) [00:41:16] 06Operations, 10ops-codfw: rack/setup/deploy puppetmaster200[12] - https://phabricator.wikimedia.org/T143255#2562105 (10Papaul) a:05Papaul>03Joe @joe @akosiaris , installation complete. [00:42:33] (03CR) 10Dzahn: [C: 031] toollabs: install pdf2djvu [puppet] - 10https://gerrit.wikimedia.org/r/304788 (https://phabricator.wikimedia.org/T130138) (owner: 10Merlijn van Deen) [00:45:42] 06Operations, 10ops-codfw, 06DC-Ops, 07Wikimedia-Incident: Labstore2001 controller or shelf failure - https://phabricator.wikimedia.org/T102626#2581102 (10Papaul) @chasemp what do you want to do with this? [00:48:04] (03CR) 10Dzahn: "still wanted?" [puppet] - 10https://gerrit.wikimedia.org/r/302705 (owner: 10Halfak) [00:51:01] (03CR) 10MZMcBride: "I don't think we should rush merging and deploying this change. 
(And I don't want to hear about some April 2016 edition of Tech News, I do" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [00:51:13] !log T137474: Stopping dumps in RESTBase staging, and reverting xenon.eqiad.wmnet to Cassandra 2.2.6-wmf1 [00:51:15] T137474: Investigate lack of recency bias in Cassandra histogram metrics - https://phabricator.wikimedia.org/T137474 [00:51:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [00:51:37] (03CR) 10Dzahn: [C: 031] "looks like it'd be useful. i think Krenair also wrote something like this" [puppet] - 10https://gerrit.wikimedia.org/r/288711 (https://phabricator.wikimedia.org/T135187) (owner: 10Hashar) [00:54:12] (03CR) 10Dzahn: "yep, meanwhile we have an allusers group and no more bastiononly, what Alex Monk linked" [puppet] - 10https://gerrit.wikimedia.org/r/244471 (https://phabricator.wikimedia.org/T114161) (owner: 10Rush) [00:56:01] (03CR) 10Dzahn: "gotta ask labs people about the capacity limits" [puppet] - 10https://gerrit.wikimedia.org/r/285957 (https://phabricator.wikimedia.org/T133911) (owner: 10Hashar) [01:04:23] (03CR) 10Yuvipanda: [C: 04-2] "-2ing for now, anyone from the labs team feel free to remove my -2 once discussions have finished." 
[puppet] - 10https://gerrit.wikimedia.org/r/285957 (https://phabricator.wikimedia.org/T133911) (owner: 10Hashar) [01:04:24] RECOVERY - puppet last run on xenon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:05:23] (03CR) 10Dzahn: [C: 031] "http://puppet-compiler.wmflabs.org/3831/" [puppet] - 10https://gerrit.wikimedia.org/r/300468 (owner: 1020after4) [01:05:43] (03PS1) 10BryanDavis: striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) [01:06:53] (03CR) 10BryanDavis: [C: 04-1] "Will test in Labs project" [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) (owner: 10BryanDavis) [01:08:36] !log radium apt-get autoremove; apt-get upgrade (openssh, openssl, passwd, sudo, libpam, libc6 :) [01:08:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [01:09:01] * Platonides read that as removing those packages! xD [01:09:54] uhm no, it's just the nice combo of things all upgraded at once [01:10:13] autoremove deleted 3.x kernel and stuff [01:12:34] bd808 I guess we're going to finish this off tomorrow? [01:12:46] yuvipanda: yeah. [01:14:26] * yuvipanda nods [01:14:44] bd808 let's not do the k8s upgrade tomorrow, and replace that withthis/ [01:14:44] ? 
[01:15:19] sure, that works for me [01:15:24] (03PS1) 10Dzahn: tor::relay: add debdeploy grains [puppet] - 10https://gerrit.wikimedia.org/r/306606 [01:15:39] poor k8s keeps getting postponed [01:15:56] yeah [01:16:27] I'm having a hard time getting puppet to cleanup the prior nginx install correctly :/ [01:16:50] (03PS6) 10Dzahn: Specify home directory for phd user [puppet] - 10https://gerrit.wikimedia.org/r/300468 (owner: 1020after4) [01:18:16] bd808 I think it's ok to do it by hand [01:18:32] I don't think we sohuld have puppet remove things [01:19:09] its a bit dicey [01:19:39] In that case I'll just rip the nginx bits out directly [01:19:53] bd808 yeah, that sounds good to me [01:21:12] (03CR) 10Dzahn: [C: 032] "alright, confirmed we should not be missing the .subversion dir" [puppet] - 10https://gerrit.wikimedia.org/r/300468 (owner: 1020after4) [01:23:04] (03PS1) 10Dzahn: Revert "Specify home directory for phd user" [puppet] - 10https://gerrit.wikimedia.org/r/306607 [01:23:57] (03CR) 10Dzahn: [C: 032] "nope. doesn't work like this. 
needs to be done during a maintenance period apparently" [puppet] - 10https://gerrit.wikimedia.org/r/306607 (owner: 10Dzahn) [01:25:04] (03PS2) 10BryanDavis: striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) [01:26:39] (03CR) 10Dzahn: "Error: Could not set home on user[phd]: Execution of '/usr/sbin/usermod -d /var/run/phd phd' returned 8: usermod: user phd is currently us" [puppet] - 10https://gerrit.wikimedia.org/r/300468 (owner: 1020after4) [01:27:47] (03CR) 10Dzahn: "looks like this needs to be scheduled for a maintenance window where the service can be stopped first" [puppet] - 10https://gerrit.wikimedia.org/r/300468 (owner: 1020after4) [01:28:21] (03PS1) 10Dzahn: Revert "Revert "Specify home directory for phd user"" [puppet] - 10https://gerrit.wikimedia.org/r/306608 [01:28:51] (03CR) 10Dzahn: [C: 04-1] "phabricator needs to be stopped for this" [puppet] - 10https://gerrit.wikimedia.org/r/306608 (owner: 10Dzahn) [01:30:23] 06Operations, 10DBA, 10Phabricator: Upgrade m3 (phabricator) db servers - https://phabricator.wikimedia.org/T138460#2581152 (10Dzahn) [01:30:54] PROBLEM - puppet last run on ms-be2022 is CRITICAL: CRITICAL: puppet fail [01:33:37] (03PS3) 10BryanDavis: striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) [01:35:43] (03PS4) 10BryanDavis: striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) [01:39:20] (03CR) 10BryanDavis: [C: 04-1] striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) (owner: 10BryanDavis) [01:47:21] (03CR) 1020after4: "it's ok to stop phd without a maintenance window. 
It won't be very disruptive if it's down for a minute or two (it just causes commits and" [puppet] - 10https://gerrit.wikimedia.org/r/300468 (owner: 1020after4) [01:47:31] mutante: ^ [01:48:11] aah. ok [01:49:03] (03CR) 10Dzahn: [C: 032] Revert "Revert "Specify home directory for phd user"" [puppet] - 10https://gerrit.wikimedia.org/r/306608 (owner: 10Dzahn) [01:49:41] !log iridium - temp. stopping phd service for home dir change [01:49:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [01:50:34] usermod: user phd is currently used by process 5565 [01:51:12] puppet started phd again? [01:51:45] Notice: /Stage[main]/Phabricator::Phd/User[phd]/home: home changed '/home/phd' to '/var/run/phd' [01:51:54] Notice: /Stage[main]/Phabricator/Service[phd]/ensure: ensure changed 'stopped' to 'running' [01:52:26] twentyafterfour: it's a race, puppet does both, on second attempt it worked [01:52:36] looking for phab2001 [01:53:22] yep, it's changed on both [01:53:28] phd:x:997:997::/var/run/phd:/bin/false [01:54:40] twentyafterfour: all done [01:54:41] (03CR) 10Dzahn: "done now on iridium and phab2001 after temp stopping the service and 2 puppet runs" [puppet] - 10https://gerrit.wikimedia.org/r/306608 (owner: 10Dzahn) [01:57:24] RECOVERY - puppet last run on ms-be2022 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [01:58:54] mutante: thanks! [02:01:24] twentyafterfour: yw, cya later then. stepping away for tonight [02:15:46] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 3 others: CN: Stop using the geoiplookup HTTPS service (always use the Cookie) - https://phabricator.wikimedia.org/T143271#2581179 (10AndyRussG) Here's how things work in the proposed patch: - The `mw.centralNotice.ge... 
[02:17:34] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [02:19:02] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]BR [02:27:44] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.15) (duration: 10m 41s) [02:27:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:36:26] 06Operations, 10Fundraising-Backlog, 10MediaWiki-extensions-CentralNotice, 10Traffic, and 3 others: CN: Stop using the geoiplookup HTTPS service (always use the Cookie) - https://phabricator.wikimedia.org/T143271#2581207 (10AndyRussG) P.S. Thanks to @Krinkle for the idea of using config to name a RL module... 
[02:45:53] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [02:46:23] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [02:49:21] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.16) (duration: 11m 09s) [02:49:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:55:50] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Aug 25 02:55:50 UTC 2016 (duration 6m 30s) [02:55:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:04:03] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]BR [03:04:42] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [03:08:15] doing an UBN deploy for T143840 [03:08:16] T143840: Unable to reset temporary password on mediawiki - https://phabricator.wikimedia.org/T143840 [03:09:03] bd808: yurik: am I stepping on any toes? [03:09:15] always, why? [03:09:26] :) [03:09:35] what did i miss? 
[03:09:40] you seemed to be active on tin [03:09:59] and someone did syncs a few minutes ago [03:10:14] tgr, ah, nah, i just had a shell running that i didn't logg off [03:10:18] done [03:10:23] not i [03:10:24] cool, thx [03:10:57] and it sucks that someone could do a sync and we don't know who it was :) [03:28:24] !log tgr@tin Synchronized php-1.28.0-wmf.16/includes/specialpage/AuthManagerSpecialPage.php: UBN fix for T143840 (duration: 00m 49s) [03:28:26] T143840: Unable to reset temporary password on mediawiki - https://phabricator.wikimedia.org/T143840 [03:28:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:32:52] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [03:34:22] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [03:44:46] (03CR) 10Aude: "we still need to do this. 
:/" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208655 (https://phabricator.wikimedia.org/T94416) (owner: 10Aude) [03:48:14] PROBLEM - Postgres Replication Lag on maps2004 is CRITICAL: CRITICAL - Rep Delay is: 1809.2716 Seconds [03:50:13] RECOVERY - Postgres Replication Lag on maps2004 is OK: OK - Rep Delay is: 0.0 Seconds [04:02:37] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]BR [04:03:02] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [04:03:07] (03PS2) 10Aude: Update Wikibase site id and group for test2wiki and testwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208655 (https://phabricator.wikimedia.org/T94416) [04:10:33] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [04:11:02] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [04:22:23] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]BR [04:22:53] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [04:34:43] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [04:35:12] 
RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [04:50:42] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]BR [04:51:03] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [04:54:43] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [04:55:03] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [05:06:01] (03PS5) 10BryanDavis: striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) [05:16:29] (03PS6) 10BryanDavis: striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) [05:16:54] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]BR [05:17:13] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [05:25:21] (03PS7) 10BryanDavis: striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) [05:27:03] (03PS8) 10BryanDavis: striker: Replace nginx with apache 
[puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) [05:33:13] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [05:33:32] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [05:43:13] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]BR [05:43:33] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [05:45:52] (03PS4) 10KartikMistry: apertium-hbs-slv: New upstream, rebuild for Jessie and cleanup [debs/contenttranslation/apertium-hbs-slv] - 10https://gerrit.wikimedia.org/r/296203 (https://phabricator.wikimedia.org/T107306) [05:46:01] (03CR) 10jenkins-bot: [V: 04-1] apertium-hbs-slv: New upstream, rebuild for Jessie and cleanup [debs/contenttranslation/apertium-hbs-slv] - 10https://gerrit.wikimedia.org/r/296203 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [05:47:01] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-hbs-slv] - 10https://gerrit.wikimedia.org/r/296203 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [05:51:31] (03PS4) 10KartikMistry: apertium-hbs-eng: New upstream, rebuild for Jessie and cleanup [debs/contenttranslation/apertium-hbs-eng] - 10https://gerrit.wikimedia.org/r/296049 (https://phabricator.wikimedia.org/T107306) [05:51:39] (03CR) 10jenkins-bot: [V: 04-1] apertium-hbs-eng: New upstream, rebuild for Jessie and cleanup [debs/contenttranslation/apertium-hbs-eng] - 10https://gerrit.wikimedia.org/r/296049 
(https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [05:52:31] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-hbs-eng] - 10https://gerrit.wikimedia.org/r/296049 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [05:55:14] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [05:55:23] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [06:17:54] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 209, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/2: down - Core: cr2-knams:xe-1/1/0 (GTT, 00341724) {#3466} [10Gbps MPLS]BR [06:19:33] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]BR [06:19:42] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [06:22:02] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 211, down: 0, dormant: 0, excluded: 0, unused: 0 [06:28:43] PROBLEM - MariaDB Slave Lag: x1 on dbstore2001 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 180114.14 seconds [06:36:02] (03PS4) 10Phedenskog: Enable PerformanceInspector extension for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304992 [06:40:27] (03CR) 10Muehlenhoff: [C: 04-1] "Already present; debdeploy-tor, assigned via hieradata/role/common/tor.yaml. 
It seems the role was renamed at some point, but the assigned" [puppet] - 10https://gerrit.wikimedia.org/r/306606 (owner: 10Dzahn) [06:45:52] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [06:45:53] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [06:51:52] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]BR [06:51:53] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [07:00:07] (03PS1) 10Muehlenhoff: Rename Hiera data for Salt grain [puppet] - 10https://gerrit.wikimedia.org/r/306629 [07:03:31] (03CR) 10Muehlenhoff: [C: 032] Rename Hiera data for Salt grain [puppet] - 10https://gerrit.wikimedia.org/r/306629 (owner: 10Muehlenhoff) [07:04:13] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [07:04:13] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [07:05:23] RECOVERY - MariaDB Slave Lag: x1 on dbstore2001 is OK: OK slave_sql_lag Replication lag: 42691.81 seconds [07:07:33] !log installing harfbuzz security updates [07:07:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:26:17] (03PS4) 10Alex Monk: mw-log-cleanup: remove wfDebug files in deployment-prep every week [puppet] - 10https://gerrit.wikimedia.org/r/305768 [07:26:20] (03PS6) 10Alex Monk: Remove the hard-coded /a/mw-log references scattered 
around everywhere [puppet] - 10https://gerrit.wikimedia.org/r/305767 [07:26:33] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]BR [07:26:33] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [07:29:30] (03CR) 10Alex Monk: "PS6 just changes a comment. PS5 puppet-compiler result: https://puppet-compiler.wmflabs.org/3833/fluorine.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/305767 (owner: 10Alex Monk) [07:29:35] (03PS5) 10Alex Monk: mw-log-cleanup: remove wfDebug files in deployment-prep every week [puppet] - 10https://gerrit.wikimedia.org/r/305768 [07:29:37] (03PS7) 10Alex Monk: Remove the hard-coded /a/mw-log references scattered around everywhere [puppet] - 10https://gerrit.wikimedia.org/r/305767 [07:29:46] (03CR) 10Alex Monk: "Er, sorry. I meant PS6 and PS7." [puppet] - 10https://gerrit.wikimedia.org/r/305767 (owner: 10Alex Monk) [07:34:02] (03PS1) 10Muehlenhoff: Add modprobe configuration for br_netfilter for Linux >= 3.18 [puppet] - 10https://gerrit.wikimedia.org/r/306633 (https://phabricator.wikimedia.org/T142388) [07:34:33] (03PS2) 10Muehlenhoff: Add modprobe configuration for br_netfilter for Linux >= 3.18 [puppet] - 10https://gerrit.wikimedia.org/r/306633 (https://phabricator.wikimedia.org/T142388) [07:39:33] moritzm: "options br_netfilter\n" is better [07:39:42] kmod won't care probably, but it's better for humans :) [07:40:20] but also, are you sure that the presence of options loads the module on boot? 
[07:40:29] I've never used it like that -- only knew of /etc/modules [07:41:03] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [07:41:03] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [07:43:19] (03Abandoned) 10Elukey: Reduce cronspam from terbium related to the echo_mail_batch cron script [puppet] - 10https://gerrit.wikimedia.org/r/298785 (https://phabricator.wikimedia.org/T132324) (owner: 10Elukey) [07:44:09] paravoid: works fine on stretch, but I'll doublecheck with a trusty host in labs [07:44:43] (03CR) 10Elukey: "Sorry Daniel just seen your comment, we can chat with Alex on IRC and maybe reach an agreement? I didn't check if the issue is still affec" [puppet] - 10https://gerrit.wikimedia.org/r/298785 (https://phabricator.wikimedia.org/T132324) (owner: 10Elukey) [07:45:45] (03PS3) 10Muehlenhoff: Add modprobe configuration for br_netfilter for Linux >= 3.18 [puppet] - 10https://gerrit.wikimedia.org/r/306633 (https://phabricator.wikimedia.org/T142388) [07:49:47] (03CR) 10Elukey: "Thanks a lot for fixing a bug that I have introduced, and sorry for the trouble. 
The only thing that I wanted to express with the extra st" [puppet] - 10https://gerrit.wikimedia.org/r/306556 (owner: 10Krinkle) [07:55:12] PROBLEM - Router interfaces on cr1-eqord is CRITICAL: CRITICAL: host 208.80.154.198, interfaces up: 37, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-1/1/0: down - Core: cr2-eqiad:xe-4/2/0 (Telia, IC-314533, 24ms) {#11371} [10Gbps wave]BR [07:55:12] PROBLEM - Router interfaces on cr2-eqiad is CRITICAL: CRITICAL: host 208.80.154.197, interfaces up: 212, down: 1, dormant: 0, excluded: 0, unused: 0BRxe-4/2/0: down - Core: cr1-eqord:xe-1/0/0 (Telia, IC-314533, 29ms) {#3658} [10Gbps wave]BR [08:09:03] PROBLEM - puppet last run on cp4012 is CRITICAL: CRITICAL: Puppet has 1 failures [08:11:39] jynus: T143862 is likely related to the saneitizer issue. dcausse is looking into it. [08:11:50] T143862: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862 [08:12:24] 07Puppet, 10Beta-Cluster-Infrastructure, 07Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#2581568 (10hashar) [08:12:27] 10Blocked-on-Operations, 07Puppet, 10Beta-Cluster-Infrastructure, 06Labs, 13Patch-For-Review: /etc/puppet/puppet.conf keeps getting double content - first for labs-wide puppetmaster, then for the correct puppetmaster - https://phabricator.wikimedia.org/T132689#2581564 (10hashar) 05Open>03Resolved a:... [08:12:34] great, not causing issues for now that I can see [08:12:46] but a 3x increase in thoughput is scary [08:13:38] jynus: definitely! [08:13:55] if it was required, we can do it, but I would like to bind it to a specific host to avoid latency issues of other queries that cannot wait [08:14:07] jynus: I'm preparing to disable feeding more stuff into that job until we know what's going on. 
[08:14:28] it can continue for now, as long as you are actively working on it [08:14:42] I suppose it will help seeing it in action [08:14:42] jynus: this is most probably a bug on our side, the job has been running for quite some time without causing this kind of traffic and started going crazy yesterday [08:15:04] s3 tends to have an effect on parallelization [08:15:07] good morning [08:15:17] jynus: ok, I'll at least make sure we are ready to kill it if needed... [08:15:31] jynus: what do you mean "effect on //ization" ? [08:15:45] that is when we have pushed wmf.16 to group1 isn't it ? [08:15:50] (I mean the 3x increase) [08:16:20] when people execute things with "1 thread per wiki" [08:16:39] that is ok on all hosts (1 thread for enwiki, with is slow, 7 for s7, etc.) [08:16:44] if there is a patch to push to mw and you need assistance let me know [08:17:09] hashar: thanks! will do! [08:17:13] but then we have s3, which has 800 wikis, and some people in the past, not realizing that, have run 800 concurrent threads [08:17:22] RECOVERY - Apache HTTP on mw1191 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.062 second response time [08:17:37] jynus: ok, I'll check with dcausse... [08:17:38] !log restarted hhvm on mw1191 and mw1216, got stuck [08:17:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:17:52] RECOVERY - HHVM rendering on mw1216 is OK: HTTP OK: HTTP/1.1 200 OK - 67818 bytes in 0.349 second response time [08:18:32] RECOVERY - HHVM rendering on mw1191 is OK: HTTP OK: HTTP/1.1 200 OK - 67796 bytes in 0.126 second response time [08:19:12] RECOVERY - Apache HTTP on mw1216 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.030 second response time [08:21:43] (03CR) 10Giuseppe Lavagetto: "@thcipriani I think you're wrong, please see my comments." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/306429 (owner: 10Giuseppe Lavagetto) [08:21:43] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0 [08:21:43] RECOVERY - Router interfaces on cr2-eqiad is OK: OK: host 208.80.154.197, interfaces up: 214, down: 0, dormant: 0, excluded: 0, unused: 0 [08:26:17] volans: I have some patches to add python linters to operations/software but that looks scary [08:26:45] if you get some spare time today I can demo it to you and then will want to reach out to other active developers to find out whether it is a good system [08:26:59] hi hashar, it's becoming too hacky due to multiple softwares in the same repo? [08:27:04] sure [08:27:07] (03PS1) 10Gehel: CirrusSearch: disable saneitizer cron job [puppet] - 10https://gerrit.wikimedia.org/r/306639 (https://phabricator.wikimedia.org/T143862) [08:27:14] not so hacky, but definitely confusing :} [08:27:30] poke me whenever you get time [08:27:44] (03CR) 10Gehel: [C: 04-1] "Not to be merged yet (dcausse is working on a real fix)." [puppet] - 10https://gerrit.wikimedia.org/r/306639 (https://phabricator.wikimedia.org/T143862) (owner: 10Gehel) [08:29:07] hashar: whenever you want [08:31:14] volans: like now? Grabbing a coffee and looking up for the patches [08:31:39] hashar, volans: can I join? I just need a coffee before though... [08:32:30] 06Operations, 10ops-codfw, 05Puppet-infrastructure-modernization: rack/setup/deploy puppetmaster200[12] - https://phabricator.wikimedia.org/T143255#2581609 (10Joe) [08:32:45] ack [08:35:53] RECOVERY - puppet last run on cp4012 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [08:36:47] (03CR) 10Elukey: "Thanks a lot Daniel!" 
[puppet] - 10https://gerrit.wikimedia.org/r/297727 (https://phabricator.wikimedia.org/T132661) (owner: 10Dzahn) [08:39:38] gehel: obviously you can join [08:39:44] volans: got caffeine in my hand [08:40:03] hashar: kool! got cafeine in my blood... [08:40:19] lol [08:41:30] so in short volans asked to get some python linters to be run by CI for operations/software.git [08:41:40] which is a catch all repo having several standalone scripts [08:41:51] and at least a couple python packages (with their own setup.py) [08:42:23] we have made CI dumb it has a job that just git clone && tox [08:42:45] tox let you define multiple virtual env and run whatever commands in those virtual env context [08:43:00] so devs can do whatever just by configuring tox in /tox.ini [08:43:07] and CI happily runs the commands [08:43:20] that delegates what is being run to devs [08:43:37] gotcha is that the jenkins job invokes tox at the root of the repo [08:44:02] so if we want to run different things / run setup.py for the couple softwares having setup.py (eg: clouseau) [08:44:14] we need to have tox from the root of the repo to cd clouseau then run tox there [08:44:33] so I got a patch that adds a /tox.ini that will be used for CI [08:44:42] which does something like: cd clouseau && tox [08:45:11] eg runs whatever is defined in /clouseau/tox.ini and run its setup.py sdist + flake8 with specific set of commands [08:45:44] for the other standalone scripts, we could have the /tox.ini to invoke flake8 there and instruct it to exclude the standalone software which have their own rules / tox. Eg: flake8.exclude = clouseau [08:46:15] * hashar is done with wall of text [08:47:08] hashar: do you think that having Jenkins look for all tox files and do the cd && tox would be worse? 
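The alternative volans asks about (have the CI job look for every nested tox.ini and run tox in each directory) could be sketched roughly as below. This is only an illustration, not what the Jenkins job does: the helper names `find_tox_dirs`, `run_all` and `exit_status` are hypothetical, and the policy of running every sub-project before reporting a single aggregated failure is an assumption.

```python
import subprocess
import sys
from pathlib import Path


def find_tox_dirs(root):
    """Return sub-directories of root (not root itself) that contain a tox.ini."""
    root = Path(root)
    found = []
    for ini in root.rglob("tox.ini"):
        rel = ini.relative_to(root)
        # Skip the root tox.ini and anything inside tox's own .tox work dirs.
        if len(rel.parts) == 1 or ".tox" in rel.parts:
            continue
        found.append(ini.parent)
    return sorted(found)


def run_all(dirs, command=("tox",)):
    """Run command in each directory and record its exit code, never stopping early."""
    return {d: subprocess.run(list(command), cwd=d).returncode for d in dirs}


def exit_status(codes):
    """Collapse the per-directory exit codes into one: 1 if any run failed, else 0."""
    return 1 if any(code != 0 for code in codes.values()) else 0


def main(root="."):
    codes = run_all(find_tox_dirs(root))
    for directory, code in codes.items():
        print(f"{directory}: {'ok' if code == 0 else 'FAILED'}", file=sys.stderr)
    return exit_status(codes)
```

A wrapper script would end with `sys.exit(main())`. The trade-off is the one raised in the channel: discovery needs no per-project entry in the root tox.ini, but a plain `tox` at the root no longer reproduces what CI runs.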
[08:47:12] serie is https://gerrit.wikimedia.org/r/#/q/project:operations/software+bug:143559 [08:47:30] yeah I thought about that volans [08:47:35] hashar: o/ did you see my update for the zuul deb (~10 days ago) [08:47:49] I am not sure how messy it is going to be though when one of the sub tox fails [08:48:08] would have to catch all error codes (since Jenkins abort on the first exit !=0) and aggregate the results [08:48:14] or maybe just fail on the first [08:48:30] elukey: yeah I havent processed it yet [08:49:12] elukey: I will have to rebuild the package(s) isn't it ? [08:49:46] hashar: probably, but don't worry I didn't want to rush you, I just wanted to make sure that you knew :) [08:49:53] we can work on it next week if you have time [08:50:14] I feel bad that I went on vacation without chatting with you about it [08:50:30] elukey: sure thing. I dont have access to my build box right now (it is at home and I am at my coworking place). Will try to get it rebuild / address the issue tomorrow and we can follow up next week [08:50:40] and I was on vacation for most of august anyway, so no big deal :} [08:50:46] super :) [08:51:35] hashar: I guess a set +e should allow to continue execution, grab exit status for all of them and exit with !=0 if their sum is > 0 [08:52:23] (03CR) 10Hashar: "sdist fails to build due to README_retention.txt not being included in the tarball :D rather minor issue. 
The tox.ini is pretty lame but " [software] - 10https://gerrit.wikimedia.org/r/306010 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [08:52:41] volans: yeah we are doing that for a few specific jobs [08:52:59] volans: then what I wanted to achieve is for anyone to be able to clone the repo, invoke tox, and then have the same outcome as CI [08:53:04] much easier to reproduce issues [08:53:34] https://gerrit.wikimedia.org/r/#/c/306032/1/tox.ini deals with chaining to clouseau [08:53:40] yes but I'm sure that at next addition nobody will remember to add the lines to the root tox [08:54:19] that's basically my only concern, maintanability :) [08:55:19] and for standalone files? [08:55:40] (03CR) 10Hashar: "Some explanations about tox.ini" (035 comments) [software] - 10https://gerrit.wikimedia.org/r/306032 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [08:55:50] I have refreshed https://gerrit.wikimedia.org/r/#/c/306032/1/tox.ini with some inline comments [08:56:08] for standalone files, we would run the commands from the /tox.ini [08:56:24] that is the second change https://gerrit.wikimedia.org/r/#/c/306033/1/tox.ini [08:56:46] which adds flake8 at the root of the repo, but have it exclude /clouseau since it has its own tox.ini and its own set of flake8 rules [08:56:58] ok so we need to manually exclude there subprojects with their own [08:57:07] for maintainability if one adds a new software with its own setup.py indeed we would need to catch that [08:57:24] maybe I can get a test that looks for setup.py files, and ensure the sub project has an entry in /tox.ini [08:57:31] with proper exclue [08:58:20] another possibility is to move the standalone projects such as clouseau to their own git repo (eg operations/software/clouseau.git ) [08:58:30] that is a bit more cumbersome [08:59:02] 06Operations, 07Puppet, 05Goal, 05Puppet-infrastructure-modernization: Set up a puppet frontend in codfw who can work as a slave of eqiad's master - 
https://phabricator.wikimedia.org/T143869#2581631 (10Joe) [08:59:22] yeah, maybe make more sense to add the test [08:59:27] for example swiftrepl have a setup.py too [09:00:02] I am adding tox to CI with https://gerrit.wikimedia.org/r/306641 [09:00:09] will let us comment "check experimental" to trigger it [09:00:15] while swift-synctool has not but has a .pep8 :D [09:00:22] ouch [09:00:32] .pep8 files should probably get removed nowadays :D [09:02:36] (03CR) 10Hashar: "check experimental" [software] - 10https://gerrit.wikimedia.org/r/306033 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [09:02:42] (03CR) 10Hashar: "check experimental" [software] - 10https://gerrit.wikimedia.org/r/306032 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [09:08:36] (03PS2) 10Filippo Giunchedi: graphite: parametrize cors_origins for labs [puppet] - 10https://gerrit.wikimedia.org/r/306467 (https://phabricator.wikimedia.org/T143556) [09:09:49] (03CR) 10Filippo Giunchedi: [C: 032] graphite: parametrize cors_origins for labs [puppet] - 10https://gerrit.wikimedia.org/r/306467 (https://phabricator.wikimedia.org/T143556) (owner: 10Filippo Giunchedi) [09:09:53] hashar, volans: newbie question, but why do we aggregate multiple projects in a single git repo? cheaper to setup? 
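The guard hashar suggests (a test that looks for setup.py files and ensures each sub-project has an entry in /tox.ini with a proper exclude) might look something like the following. It is a sketch under assumptions: the root tox.ini is assumed to name each chained environment `[testenv:<directory>]` and to list the directory in `[flake8] exclude`, and the function names are hypothetical.

```python
import configparser
from pathlib import Path


def subprojects(root):
    """Names of top-level directories that ship their own setup.py."""
    return sorted(p.parent.name for p in Path(root).glob("*/setup.py"))


def missing_entries(root):
    """List (name, problem) pairs for sub-projects the root tox.ini does not cover."""
    parser = configparser.ConfigParser(interpolation=None)  # keep {posargs} and % literal
    parser.read(Path(root) / "tox.ini")

    excluded = set()
    if parser.has_option("flake8", "exclude"):
        raw = parser.get("flake8", "exclude")
        # The exclude value may be comma separated and/or spread over several lines.
        for item in raw.replace("\n", ",").split(","):
            if item.strip():
                excluded.add(item.strip().rstrip("/"))

    problems = []
    for name in subprojects(root):
        if not parser.has_section(f"testenv:{name}"):
            problems.append((name, "no testenv"))
        if name not in excluded:
            problems.append((name, "not excluded from flake8"))
    return problems
```

A test would then simply assert that `missing_entries(".")` is empty, so adding a new packaged tool without wiring it into the root tox.ini fails CI instead of being silently skipped.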
[09:10:44] (03PS4) 10Muehlenhoff: Add modprobe configuration for br_netfilter for Linux >= 3.18 [puppet] - 10https://gerrit.wikimedia.org/r/306633 (https://phabricator.wikimedia.org/T142388) [09:11:03] gehel: the software repo includes some misc stuff, many are in python and we didn't have any check on those [09:11:13] historical / legacy I guess [09:11:28] a couple are small projects with their own setup.py, most are single-file scripts or scripts of a few files [09:11:30] operations/software was set up for misc operations stuff [09:11:31] (03PS5) 10Muehlenhoff: Add modules-load.d/kmod configuration for br_netfilter for Linux >= 3.18 [puppet] - 10https://gerrit.wikimedia.org/r/306633 (https://phabricator.wikimedia.org/T142388) [09:12:00] and some scripts eventually evolved into standalone software or some "abused" that repo to land their code instead of creating a repo [09:12:06] it is not really a problem [09:12:19] I can understand that for single file scripts, it does not make sense to have a dedicated repo. But things like clouseau seem to deserve their own repo [09:12:20] if it is easier to have all that software in the same git repo, let's stick to that [09:12:54] yeah maybe clouseau can be migrated to a standalone repo [09:13:14] then I don't want to enforce it.
I am rather flexible on that and in the end it is up to devs [09:13:48] hashar: I like the way you thinkg :) [09:14:13] I try to be pretty liberal when I can :D [09:15:44] :) [09:17:50] so tox pass happily on the patch that adds /tox.ini and chains to clouseau https://gerrit.wikimedia.org/r/#/c/306032/ [09:18:01] https://integration.wikimedia.org/ci/job/tox/286/console [09:18:05] (03PS1) 10Giuseppe Lavagetto: Add SRV records for puppet (all pointing to palladium.eqiad.wmnet) [dns] - 10https://gerrit.wikimedia.org/r/306642 (https://phabricator.wikimedia.org/T143869) [09:18:30] it creates the 'clouseau' virtualenv, execute the test command 'tox -c clouseau/tox.ini' into it [09:19:00] which in turns create two venv: py27 that runs sdist and try to run tests, and a 'flake8' venv that runs from /clouseau/ and reads flake8 config from /clouseau/tox.ini [09:19:06] both pass [09:19:09] 00:00:21.783 py27: commands succeeded [09:19:09] 00:00:21.783 flake8: commands succeeded [09:19:14] thus the clouseau test command pass [09:19:19] 00:00:21.799 clouseau: commands succeeded [09:19:48] the other patch that adds flake8 at the root of the repo fails though ( https://gerrit.wikimedia.org/r/#/c/306033/1 ) due to various misc scripts not adhering to flake8 standard [09:20:02] looks ok, just to be sure you could temporarily add a python error in a file inside clouseau and in a spare .py file ;) [09:20:17] ok great :D [09:20:21] already tested then [09:20:32] one can git-review -d 306032 [09:20:40] introduce some error in a file under /clouseau [09:20:45] then run tox at root of repo -> should fail [09:20:57] and then run tox from /clouseau -> should run the same things and fail as well [09:21:02] (which it does :} ) [09:23:57] 06Operations, 06Labs, 13Patch-For-Review: grafana-labs.wikimedia.org doesn't reflect grafana-labs-admin.wikimedia.org - https://phabricator.wikimedia.org/T143556#2581695 (10fgiunchedi) cors is fixed, the only bit left to automate is changing 
`files/grafana/grafana_create_anon_user` to also add `Viewer` right... [09:25:39] eheheh [09:25:59] how do we want to proceed for existing infractions? [09:26:19] I can do a pass probably on some of them [09:26:40] E501 line too long (106 > 79 characters) [09:26:47] hashar: didn't you put 120? [09:27:43] only in /clouseau/tox.ini [09:27:58] (03PS6) 10Muehlenhoff: Provide override file for base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/305635 [09:28:09] for the random issues [09:28:17] I would usually fix them script by script [09:28:30] easier to review for people [09:28:37] and ignore the errors that are cumbersome / annoying [09:28:52] then once flake8 passes cleanly, the ignored errors can be dealt with one by one [09:29:06] ok [09:29:07] we did that for pywikibot (a decade-plus-old set of python scripts) [09:29:22] started with pep8 ignoring a lot of rules, fixed them one by one [09:29:30] then progressively added pyflakes / flake8 etc [09:29:50] up to a point where nowadays it complains when doc blocks are improperly formatted :} [09:30:26] yeah!
[09:30:36] with the last patch https://gerrit.wikimedia.org/r/#/c/306033/ [09:30:53] to run flake8 one would go at the root of the repo and : tox -e flake8 [09:31:07] the commands accepts extra args: commands = flake8 {posargs} [09:31:15] so you can do: tox -eflake8 -- --statistics [09:31:23] (which really ends up invoking: flake8 --statistics ) [09:31:30] or: tox -eflake8 -- --help [09:31:35] tox -eflake8 -- --ignore=W ) [09:31:42] tox -eflake8 -- some/script.py [09:32:05] got it [09:32:12] so if one wants to fix up checkhosts: tox -eflake8 -- checkhosts/ [09:32:18] tox is really just a wrapper around virtualenv [09:32:36] guarantee you that the commands are running with the proper set of dependencies [09:32:59] so later on one can add environment for python2.7 and python3 [09:33:10] or django1.7 vs django1.9 etc [09:33:37] and combine them all (python2.7 + django 1.7, python3.4 + django 1.9 etc) [09:38:57] (03CR) 10Muehlenhoff: [C: 032] Provide override file for base::service_unit [puppet] - 10https://gerrit.wikimedia.org/r/305635 (owner: 10Muehlenhoff) [09:40:43] PROBLEM - DPKG on cp4005 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [09:41:55] 06Operations, 06Labs, 10wikitech.wikimedia.org, 13Patch-For-Review: Rename specific account in LDAP, Wikitech, Gerrit and Phabricator - https://phabricator.wikimedia.org/T85913#2581769 (10zeljkofilipin) [09:41:59] 06Operations, 10ops-eqiad, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2581771 (10Gehel) @Cmjohnson: so new arrangement could be: **A 3:** 6 nodes //**A 6:** + 2 nodes// **B 3:**... 
[09:42:43] RECOVERY - DPKG on cp4005 is OK: All packages OK [09:44:33] PROBLEM - puppet last run on mw1269 is CRITICAL: CRITICAL: puppet fail [09:44:42] PROBLEM - puppet last run on mw2102 is CRITICAL: CRITICAL: puppet fail [09:44:52] PROBLEM - puppet last run on mw1188 is CRITICAL: CRITICAL: puppet fail [09:45:23] PROBLEM - puppet last run on cp2007 is CRITICAL: CRITICAL: puppet fail [09:45:24] PROBLEM - puppet last run on restbase2005 is CRITICAL: CRITICAL: puppet fail [09:45:33] PROBLEM - puppet last run on wtp1004 is CRITICAL: CRITICAL: puppet fail [09:45:43] PROBLEM - puppet last run on mw2170 is CRITICAL: CRITICAL: puppet fail [09:45:43] PROBLEM - puppet last run on mw2179 is CRITICAL: CRITICAL: puppet fail [09:45:43] PROBLEM - puppet last run on mw2197 is CRITICAL: CRITICAL: puppet fail [09:45:43] PROBLEM - puppet last run on mw2183 is CRITICAL: CRITICAL: puppet fail [09:45:43] PROBLEM - puppet last run on db2070 is CRITICAL: CRITICAL: puppet fail [09:45:53] PROBLEM - puppet last run on mw1290 is CRITICAL: CRITICAL: puppet fail [09:45:53] PROBLEM - puppet last run on mw1283 is CRITICAL: CRITICAL: puppet fail [09:45:53] PROBLEM - puppet last run on wtp2007 is CRITICAL: CRITICAL: puppet fail [09:45:54] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: puppet fail [09:46:03] PROBLEM - puppet last run on wtp1015 is CRITICAL: CRITICAL: puppet fail [09:46:03] PROBLEM - puppet last run on rdb1002 is CRITICAL: CRITICAL: puppet fail [09:46:23] PROBLEM - puppet last run on mw1171 is CRITICAL: CRITICAL: puppet fail [09:46:23] PROBLEM - puppet last run on mw2164 is CRITICAL: CRITICAL: puppet fail [09:46:33] PROBLEM - puppet last run on mw1243 is CRITICAL: CRITICAL: puppet fail [09:46:42] moritzm: is it your merge? 
[09:46:44] PROBLEM - puppet last run on mw2064 is CRITICAL: CRITICAL: puppet fail [09:46:46] PROBLEM - puppet last run on rdb1003 is CRITICAL: CRITICAL: puppet fail [09:46:52] PROBLEM - puppet last run on cp2021 is CRITICAL: CRITICAL: puppet fail [09:46:52] PROBLEM - puppet last run on mc2002 is CRITICAL: CRITICAL: puppet fail [09:46:53] PROBLEM - puppet last run on wtp1007 is CRITICAL: CRITICAL: puppet fail [09:46:53] PROBLEM - puppet last run on db2069 is CRITICAL: CRITICAL: puppet fail [09:46:53] PROBLEM - puppet last run on mw2080 is CRITICAL: CRITICAL: puppet fail [09:46:53] PROBLEM - puppet last run on mw2227 is CRITICAL: CRITICAL: puppet fail [09:46:53] PROBLEM - puppet last run on mw2243 is CRITICAL: CRITICAL: puppet fail [09:46:54] PROBLEM - puppet last run on wtp2002 is CRITICAL: CRITICAL: puppet fail [09:46:54] RECOVERY - puppet last run on mw1188 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [09:47:03] PROBLEM - puppet last run on mw1280 is CRITICAL: CRITICAL: puppet fail [09:47:12] PROBLEM - puppet last run on mw1225 is CRITICAL: CRITICAL: puppet fail [09:47:13] PROBLEM - puppet last run on mw1201 is CRITICAL: CRITICAL: puppet fail [09:47:23] PROBLEM - puppet last run on mc1004 is CRITICAL: CRITICAL: puppet fail [09:47:24] RECOVERY - puppet last run on cp2007 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [09:47:28] volans: the argument count of changes and puppet only notices that on the second run unfortunately [09:47:33] PROBLEM - puppet last run on mw1271 is CRITICAL: CRITICAL: puppet fail [09:47:33] PROBLEM - puppet last run on mw1299 is CRITICAL: CRITICAL: puppet fail [09:47:33] PROBLEM - puppet last run on mw2199 is CRITICAL: CRITICAL: puppet fail [09:47:33] PROBLEM - puppet last run on mw2109 is CRITICAL: CRITICAL: puppet fail [09:47:34] PROBLEM - puppet last run on mw1266 is CRITICAL: CRITICAL: puppet fail [09:47:34] PROBLEM - puppet last run on mw2220 is CRITICAL: CRITICAL: 
puppet fail [09:47:47] does anyone know how to temporarily disable the irc bot? [09:47:52] PROBLEM - puppet last run on mw1189 is CRITICAL: CRITICAL: puppet fail [09:47:52] PROBLEM - puppet last run on thumbor1002 is CRITICAL: CRITICAL: puppet fail [09:47:53] PROBLEM - puppet last run on fluorine is CRITICAL: CRITICAL: puppet fail [09:47:54] PROBLEM - puppet last run on mw2221 is CRITICAL: CRITICAL: puppet fail [09:48:02] PROBLEM - puppet last run on cp3010 is CRITICAL: CRITICAL: puppet fail [09:48:03] PROBLEM - puppet last run on kafka1001 is CRITICAL: CRITICAL: puppet fail [09:48:06] kill it? :) [09:48:20] so we'll have ALL puppet alarms on icinga RED for ~1h? [09:48:23] PROBLEM - puppet last run on cp3047 is CRITICAL: CRITICAL: puppet fail [09:48:35] RECOVERY - puppet last run on mw1171 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [09:48:35] PROBLEM - puppet last run on mw1263 is CRITICAL: CRITICAL: puppet fail [09:48:40] btw moritzm the post-merge CI build on your merge failed: https://integration.wikimedia.org/ci/job/operations-puppet-doc/25826/console [09:48:54] PROBLEM - puppet last run on mw2127 is CRITICAL: CRITICAL: puppet fail [09:48:54] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: puppet fail [09:49:11] volans: that seems unrelated (the CI one) [09:49:13] PROBLEM - puppet last run on mw2196 is CRITICAL: CRITICAL: puppet fail [09:49:22] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: puppet fail [09:49:32] PROBLEM - puppet last run on mw2176 is CRITICAL: CRITICAL: puppet fail [09:49:34] PROBLEM - puppet last run on mw2114 is CRITICAL: CRITICAL: puppet fail [09:49:34] PROBLEM - puppet last run on mw2117 is CRITICAL: CRITICAL: puppet fail [09:49:34] PROBLEM - puppet last run on maps2002 is CRITICAL: CRITICAL: puppet fail [09:49:36] volans: is that the tcpircbot.py on neon? 
[09:49:43] PROBLEM - puppet last run on ruthenium is CRITICAL: CRITICAL: puppet fail [09:49:52] PROBLEM - puppet last run on mw2087 is CRITICAL: CRITICAL: puppet fail [09:49:53] PROBLEM - puppet last run on mw1166 is CRITICAL: CRITICAL: puppet fail [09:49:54] PROBLEM - puppet last run on mw2244 is CRITICAL: CRITICAL: puppet fail [09:49:55] moritzm: not sure I was checking in the repo, seems like tough [09:50:21] volans: moritzm: operations-puppet-doc is broken because puppet rdoc is being passed a file such as modules/whatever/bin/bla [09:50:22] PROBLEM - puppet last run on mw1285 is CRITICAL: CRITICAL: puppet fail [09:50:47] T143233 [09:50:47] T143233: post build failures for operations/puppet on operations-puppet-doc - https://phabricator.wikimedia.org/T143233 [09:51:37] moritzm: probably good to log that puppet fails are somehow expected [09:51:50] you got logmsgbot killed too ;) [09:51:58] !log temporarily stop irc bot, until puppet has self-healed [09:52:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [09:52:20] volans, moritzm: https://gerrit.wikimedia.org/r/#/c/298921/ <- this is the naive approach, we would need to avoid thundering herd issues perhaps by sleeping a random amount of seconds between the first and second puppet run [09:53:11] ema: rotfl for the "naive" :) [09:53:35] not on reboot? :-P [09:53:40] :) [09:53:49] volans: the upstart doesn't seem to work properly, when I stopped the service, ircecho was still running [09:54:11] _joe_ didn't agree with that patch though, but I forgot why [09:54:42] ETOOUGLY? [09:54:55] ema: ah, I'll have a look at when puppet is dealt with [09:55:20] ema no sleep in between? 
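The random sleep ema mentions at 09:52:20, meant to keep every host from starting its second puppet run at the same moment, can be sketched as follows. This illustrates the jitter idea only and is not the content of the linked change: the function name, the 300 second ceiling and the choice to re-run only when the first run fails are all assumptions.

```python
import random
import subprocess
import time


def run_twice_with_jitter(command, max_delay=300.0, rng=random, sleep=time.sleep):
    """Run command once; on failure wait a random 0..max_delay seconds, then retry once.

    Spreading the retries over a window avoids a thundering herd of hosts
    hitting the master simultaneously. rng and sleep are injectable for testing.
    """
    first = subprocess.run(command).returncode
    if first == 0:
        return first
    sleep(rng.uniform(0, max_delay))
    return subprocess.run(command).returncode
```

The exit code returned is that of the last run, so a caller still sees a failure if the second attempt does not recover.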
[09:55:48] volans: yeah the sleep needs to be added, but I think there was something more [10:00:26] !log ema@palladium conftool action : set/pooled=yes; selector: cp4005.ulsfo.wmnet (tags: ['dc=ulsfo', 'cluster=cache_upload', 'service=nginx']) [10:00:27] !log ema@palladium conftool action : set/pooled=yes; selector: cp4005.ulsfo.wmnet (tags: ['dc=ulsfo', 'cluster=cache_upload', 'service=varnish-fe']) [10:00:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:00:43] RECOVERY - puppet last run on wtp2007 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [10:01:34] RECOVERY - tcpircbot_service_running on neon is OK: PROCS OK: 1 process with command name python, args tcpircbot.py [10:01:42] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:01:42] RECOVERY - puppet last run on db2069 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:02:03] RECOVERY - puppet last run on mw1201 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:02:13] RECOVERY - puppet last run on mc1004 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [10:02:23] RECOVERY - puppet last run on restbase2005 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [10:02:23] RECOVERY - puppet last run on mw1266 is OK: OK: Puppet is currently enabled, last run 9 seconds ago with 0 failures [10:02:32] RECOVERY - puppet last run on ruthenium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:02:42] RECOVERY - puppet last run on db2070 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:03:14] RECOVERY - puppet last run on mw1263 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:03:33] RECOVERY - puppet last run on mw1269 is OK: OK: Puppet is currently enabled, last 
run 1 minute ago with 0 failures [10:03:43] RECOVERY - puppet last run on rdb1003 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [10:04:02] RECOVERY - puppet last run on mw1280 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [10:04:43] RECOVERY - puppet last run on fluorine is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:04:43] RECOVERY - puppet last run on mw1283 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:05:02] RECOVERY - puppet last run on kafka1001 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures [10:05:03] RECOVERY - puppet last run on mw1285 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:05:22] RECOVERY - puppet last run on cp3047 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:05:52] RECOVERY - puppet last run on mc2002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:06:23] RECOVERY - puppet last run on mw1299 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:06:33] RECOVERY - puppet last run on maps2002 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [10:06:43] RECOVERY - puppet last run on mw1166 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [10:06:52] RECOVERY - puppet last run on mw1290 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:08:52] RECOVERY - puppet last run on mw1189 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:09:42] RECOVERY - puppet last run on mw1243 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [10:09:54] RECOVERY - puppet last run on mw2102 is OK: OK: Puppet is currently enabled, last run 46 seconds ago with 0 failures [10:10:22] RECOVERY - puppet last run on mw1225 is OK: 
OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:10:33] RECOVERY - puppet last run on mw1271 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:10:42] RECOVERY - puppet last run on mw2109 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:10:52] (03PS1) 10Jcrespo: Quote ARGS parameters for trusty compatibility [puppet] - 10https://gerrit.wikimedia.org/r/306645 (https://phabricator.wikimedia.org/T126757) [10:11:02] RECOVERY - puppet last run on mw2244 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [10:11:33] RECOVERY - puppet last run on mw2164 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:12:03] RECOVERY - puppet last run on mw2243 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:12:43] RECOVERY - puppet last run on mw2117 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:12:43] RECOVERY - puppet last run on mw2114 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:12:44] RECOVERY - puppet last run on mw2220 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures [10:12:53] RECOVERY - puppet last run on mw2197 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:12:53] RECOVERY - puppet last run on mw2170 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [10:12:54] RECOVERY - puppet last run on mw2179 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:12:54] RECOVERY - puppet last run on mw2183 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures [10:13:04] RECOVERY - puppet last run on mw2221 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [10:14:04] RECOVERY - puppet last run on mw2227 is OK: OK: Puppet is currently enabled, last run 1 minute ago 
with 0 failures [10:14:42] RECOVERY - puppet last run on mw2176 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [10:14:44] RECOVERY - puppet last run on mw2199 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:16:04] RECOVERY - puppet last run on mw2064 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:16:25] RECOVERY - puppet last run on mw2196 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [10:17:03] RECOVERY - puppet last run on mw2087 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [10:18:01] (03PS1) 10Volans: udpprofile: fix Flake 8 [software] - 10https://gerrit.wikimedia.org/r/306646 (https://phabricator.wikimedia.org/T143559) [10:18:13] RECOVERY - puppet last run on mw2127 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:18:22] hashar: ^^^ [10:19:30] (03PS2) 10Muehlenhoff: Provide a systemd override unit for hhvm [puppet] - 10https://gerrit.wikimedia.org/r/306225 (https://phabricator.wikimedia.org/T143210) [10:22:08] volans: should I land the other tox patches ? [10:22:10] if they make sense [10:22:30] or should I reach out to ops list for more discussion [10:23:02] also I dont even know whether udpprofile is still used [10:23:12] (03PS1) 10Ema: 4.1.3-1wm2: Drop 0003-varnishd-nukelru.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/306647 [10:23:26] it's easier to fix flake8 than know if it's used usually ;) [10:24:14] :D [10:24:24] I think it used to be on noc.wm.org [10:25:09] (03CR) 10Hashar: [C: 031] "udpprofile is probably no more used. 
flake8 fix are sane though" [software] - 10https://gerrit.wikimedia.org/r/306646 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [10:25:21] I have +1 ed, can't +2 / merge on that repo [10:25:30] I can :) [10:26:50] (03CR) 10Muehlenhoff: "PCC for a trusty and jessie host: http://puppet-compiler.wmflabs.org/3836/" [puppet] - 10https://gerrit.wikimedia.org/r/306225 (https://phabricator.wikimedia.org/T143210) (owner: 10Muehlenhoff) [10:27:00] * volans checking your last changes [10:30:50] (03CR) 10Volans: [C: 031] "LGTM" [software] - 10https://gerrit.wikimedia.org/r/306010 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [10:32:24] (03CR) 10Volans: [C: 031] "LGTM" [software] - 10https://gerrit.wikimedia.org/r/306032 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [10:33:04] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM!" [puppet] - 10https://gerrit.wikimedia.org/r/306645 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [10:34:10] hashar: what about swiftrepl? [10:34:30] volans: don't know? what is up with that one ? [10:34:40] has a setup.py [10:34:53] ah [10:34:56] should not have it's own block in tox like clousea? [10:35:00] so need the same trick I did for clouseau [10:35:02] *clouseau [10:35:48] would have to enroll godog in swiftrepl :D [10:36:01] you can pretty much copy paste what I did for clouseau and it would probably just work [10:36:05] delta the flake8 issues [10:37:52] yep [10:59:26] (03CR) 10Jcrespo: [C: 032] Quote ARGS parameters for trusty compatibility [puppet] - 10https://gerrit.wikimedia.org/r/306645 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [11:00:45] (03PS1) 10Volans: salt-misc: fix Flake8 [software] - 10https://gerrit.wikimedia.org/r/306650 (https://phabricator.wikimedia.org/T143559) [11:05:00] (03CR) 10ArielGlenn: [C: 031] "thanks for the cleanup." 
[software] - 10https://gerrit.wikimedia.org/r/306650 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [11:05:56] (03CR) 10ArielGlenn: [C: 031] clouseau: fix setup.py / add tox with flake8 [software] - 10https://gerrit.wikimedia.org/r/306010 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [11:06:13] apergos: volans: land them as needed :} [11:06:20] I can't +2 / merge on that repo [11:07:29] hashar: suggestion on the order? I guess first the flake8 fixes or ignores [11:07:38] I had them chained [11:07:49] yours yes [11:07:55] I was talking those additional ones [11:08:03] for cleanuo [11:08:07] *cleanup [11:08:11] first is to fix clouseau sdist [11:08:11] https://gerrit.wikimedia.org/r/#/c/306010/ [11:08:23] it fails due to README_retention.txt not being included in the source tarball [11:08:30] though it is read() in setup.py [11:08:51] it is a bit awkward with introduction of MANIFEST.in [11:08:58] but I haven't found how to add a file in the source from setup.py [11:09:11] all docs i have found instructed to add a MANIFEST.in [11:09:28] that patch also add the tox.ini file [11:09:38] so one can git-review -d 306010 ; cd clouseau ; tox [11:09:40] and it should pass [11:09:40] ok, I have no context on clouseau, what is where it's used [11:09:54] it is merely for sdist [11:10:06] so has no impact on wherever the package is used as I understand it [11:11:28] checking tox on 306010 locally [11:11:31] it passes [11:12:31] and fails if I change something, so looks good to me [11:13:09] volans: it's the data retention auditing software [11:13:15] you can just merge, no problems [11:13:20] ok [11:13:27] (03CR) 10Volans: [C: 032] clouseau: fix setup.py / add tox with flake8 [software] - 10https://gerrit.wikimedia.org/r/306010 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [11:13:37] the other one can be just merged too, or I can do it if you prefer [11:13:56] sure, go ahead [11:14:48] hashar: then 
https://gerrit.wikimedia.org/r/#/c/306032 right/ [11:14:49] ? [11:15:12] (03CR) 10ArielGlenn: [C: 032] salt-misc: fix Flake8 [software] - 10https://gerrit.wikimedia.org/r/306650 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [11:16:07] volans: yes [11:16:09] (03PS2) 10Volans: Wrapper to invoke clouseau tox from root dir [software] - 10https://gerrit.wikimedia.org/r/306032 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [11:16:10] will turn on ci [11:16:26] rebasing to check the Jenkins run again [11:16:56] (03CR) 10Volans: "check experimental" [software] - 10https://gerrit.wikimedia.org/r/306032 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [11:17:48] turning it on with https://gerrit.wikimedia.org/r/306651 [11:18:19] but this one after we fix all errors right? [11:19:07] hashar: should not Jenkins have run the experimental here? https://gerrit.wikimedia.org/r/#/c/306032 [11:19:30] that one should pass [11:20:19] looks like is not running though [11:20:30] (03CR) 10Hashar: "recheck" [software] - 10https://gerrit.wikimedia.org/r/306032 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [11:20:47] running at https://integration.wikimedia.org/ci/job/tox/289/console [11:21:06] if Jenkins has rights to merge on that repo, I guess one can just CR+2 [11:21:10] and CI will land the patch now [11:22:03] you mean to avoid the submit? [11:22:20] (03CR) 10Volans: [C: 032] Wrapper to invoke clouseau tox from root dir [software] - 10https://gerrit.wikimedia.org/r/306032 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [11:23:41] neat [11:24:54] but I still need to submit manually [11:24:55] volans: I had to tweak gerrit permissions. Can you remove your CR+2 from https://gerrit.wikimedia.org/r/#/c/306032/ and CR+2 again ? 
[11:24:56] it's ok I guess [11:25:02] ok, sure [11:25:09] JenkinsBot lacked the permission to Submit and to change Verified vote [11:25:21] (03CR) 10Volans: "testing Jenkins permissions" [software] - 10https://gerrit.wikimedia.org/r/306032 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [11:25:31] (03PS1) 10ArielGlenn: salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 [11:25:39] (03CR) 10jenkins-bot: [V: 04-1] salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 (owner: 10ArielGlenn) [11:25:44] hahaha nice [11:26:08] apergos: yeah needs https://gerrit.wikimedia.org/r/#/c/306032/ [11:26:11] (03PS2) 10Filippo Giunchedi: nagios_common: add check_prometheus_metric [puppet] - 10https://gerrit.wikimedia.org/r/300863 [11:26:25] (03CR) 10Volans: [C: 032] Wrapper to invoke clouseau tox from root dir [software] - 10https://gerrit.wikimedia.org/r/306032 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [11:26:31] hashar: done [11:26:53] (03Merged) 10jenkins-bot: Wrapper to invoke clouseau tox from root dir [software] - 10https://gerrit.wikimedia.org/r/306032 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [11:27:01] magic [11:27:05] great! [11:27:17] (03CR) 10Hashar: "recheck" [software] - 10https://gerrit.wikimedia.org/r/306653 (owner: 10ArielGlenn) [11:27:23] thanks for the recheck [11:27:33] and since CI always merge the patch against tip of the branch, Ariel patch https://gerrit.wikimedia.org/r/#/c/306653/ does not need to be rebased [11:27:44] oh yay [11:27:46] CI pick the patch and merge it on tip of master (which now has the patch introducing tox.ini) [11:27:51] so it should work just fine ™ [11:27:57] heh heh [11:28:01] hashar: but apergos patch will not be tested right? 
[11:28:11] we need https://gerrit.wikimedia.org/r/#/c/306033/1/tox.ini first [11:28:16] there is no automatic recheck whenever a patch is merged [11:28:20] ah yes please [11:28:23] so have to recheck manually [11:28:24] or rebase [11:28:27] I will wait for tox.ini [11:28:33] that would fail because of other pieces that are failing around [11:28:39] AIUI [11:28:47] ohhhh [11:29:02] meh sigh [11:29:05] so flake8 does not run from root of repo yet [11:29:10] https://gerrit.wikimedia.org/r/#/c/306033/ has the basic logic [11:29:28] but since bunch of scripts are failing, it is not going to pass until errors are ignored or all issues are adressed [11:29:32] well shall I merge my pep8/pylint fixes or wait? not sure what's best [11:29:33] we can tweak that patch to ignore all currently failing errors [11:29:50] (03PS2) 10Hashar: Add flake8 at root of repo [software] - 10https://gerrit.wikimedia.org/r/306033 (https://phabricator.wikimedia.org/T143559) [11:30:19] (03CR) 10jenkins-bot: [V: 04-1] Add flake8 at root of repo [software] - 10https://gerrit.wikimedia.org/r/306033 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [11:32:51] hashar: my patch fix all uppprofile, I'm tempted to merge it [11:33:03] don't anyone even look at 'checkhosts' directory in software, I see it's horrible [11:33:16] I'll do some fixes over the next day, it will be a lot [11:34:01] hashar: what about ignoring directories and then we re-add them one by one if they are used and it's worth? [11:34:08] so all new things will be checked [11:34:20] and legacy one will opt-in fixing the errors and removing the ignore [11:34:28] I am polishing up the patch [11:34:30] so it pass ignoring most errors [11:35:06] yes but it's harder to fix a single error across all files/projects that do a cleanup of a single project don't you think? 
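[editor's note] The exclude-first approach floated at 11:34 (exclude the directories that fail today, then remove each one from the list once it has been cleaned up) could look roughly like the tox.ini sketch below. This is not the content of the actual patch 306033: the directory names are hypothetical placeholders, and skipsdist is included on the assumption that the repository root has no setup.py.

```ini
[tox]
envlist = flake8
# Assumption: there is no setup.py at the repository root, so tox
# should not try to build an sdist before running the environment.
skipsdist = True

[testenv:flake8]
deps = flake8
commands = flake8

[flake8]
# Hypothetical placeholders for the directories that do not pass yet.
# Remove an entry once that directory is clean so it becomes checked.
exclude =
    .tox,
    legacy-dir-one,
    legacy-dir-two
```

With this layout anything new is linted by default, and a legacy directory opts in by deleting its line from exclude in the same change that fixes its errors.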
[11:35:31] (03PS3) 10Hashar: Add flake8 at root of repo [software] - 10https://gerrit.wikimedia.org/r/306033 (https://phabricator.wikimedia.org/T143559) [11:35:38] yeah :( [11:35:50] so probably want to keep PS2 of above patch [11:35:58] fix up the bunch of issues [11:36:28] or if fixed, amend PS3 I did and reduce the list of ignored errors and warnings [11:36:46] I was thinking to add to the exclude list the directories where there are failures [11:36:52] are like 7~10 [11:36:56] yeah that to [11:37:14] so you get flake8 passing on a few dir [11:37:22] then when cleaning one remove the name from the ignore and check that everything is passing in that dir [11:37:28] then when a subdir get fixed, remove it from the exclude ? [11:37:28] before merging [11:37:32] yep [11:37:35] yeah looks smarter [11:37:55] feel free to tweak https://gerrit.wikimedia.org/r/306033 in that direction :} [11:38:21] I like your approach for a single big project, but this is bunch of misc stuff [11:38:24] ok will do [11:38:33] thanks for the precious help! [11:38:42] poke me as needed, will be happy to help [11:50:13] lunch & [12:00:30] !log temporarily disabling puppet on kafka* hosts (to enable ferm changes in controlled stages) [12:00:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:02:37] (03PS3) 10Muehlenhoff: Kafka brokers: Limit access to production and fundraising networks [puppet] - 10https://gerrit.wikimedia.org/r/305969 [12:03:21] 06Operations, 06WMF-Legal, 06WMF-NDA-Requests: ZhouZ needs access to WMF-NDA group - https://phabricator.wikimedia.org/T98722#2582043 (10Aklapper) Folks who want access to file a task under #wmf-nda-requests. I verify the account and check if it was created by Office IT. If it was not, I ask the person to ma... 
[12:04:07] 06Operations, 10Ops-Access-Requests: Requesting access to the statistics host(s) for flemmerich - https://phabricator.wikimedia.org/T143881#2582044 (10flemmerich) [12:04:10] (03CR) 10Muehlenhoff: [C: 032] Kafka brokers: Limit access to production and fundraising networks [puppet] - 10https://gerrit.wikimedia.org/r/305969 (owner: 10Muehlenhoff) [12:14:43] mobrovac: FYI --^ [12:15:15] it should be a no-op and I am watching grafana with Moritz, but if you see anything weird let us know [12:15:44] oh ok [12:16:01] is this going out now? [12:17:02] mobrovac: we're enabling puppet in steps, so far on kafka2* [12:17:33] k, lemme check CP in codfw to see if there are conn provblems [12:18:31] looks good, CP seems to be happy [12:24:02] PROBLEM - puppet last run on mw2203 is CRITICAL: CRITICAL: Puppet has 1 failures [12:25:12] (03CR) 10Filippo Giunchedi: [C: 032] nagios_common: add check_prometheus_metric [puppet] - 10https://gerrit.wikimedia.org/r/300863 (owner: 10Filippo Giunchedi) [12:25:17] (03PS3) 10Filippo Giunchedi: nagios_common: add check_prometheus_metric [puppet] - 10https://gerrit.wikimedia.org/r/300863 [12:26:40] mobrovac: yeah I was watching Eventbus and kafka's grafana dashboards [12:26:49] we are proceeding with kafka100[12] [12:27:28] kk [12:34:43] (03CR) 10BBlack: [C: 031] 4.1.3-1wm2: Drop 0003-varnishd-nukelru.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/306647 (owner: 10Ema) [12:41:57] (03CR) 10Ema: [C: 032] 4.1.3-1wm2: Drop 0003-varnishd-nukelru.patch [debs/varnish4] (debian-wmf) - 10https://gerrit.wikimedia.org/r/306647 (owner: 10Ema) [12:44:53] (03PS4) 10Volans: Add flake8 at root of repo [software] - 10https://gerrit.wikimedia.org/r/306033 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [12:46:01] elukey: sorry Icant find the ticket about zuul package anymore :( [12:46:09] hashar: ^^^ I'm unsure about 2 things: 1) should we remove .pep8? 2) I've used dirname/** to avoid to exclude foo/dirname/... 
and I dind't find a better syntax, flake8 docs are not very detailed on this [12:46:44] hashar: https://phabricator.wikimedia.org/T140894 [12:46:47] volans: the .pep8 can (should?) be dropped and its bit moved to the tox.ini under [flake8] [12:47:07] volans: the exclude, yeah it is not really described. I have no idea how it does the matching :( [12:47:16] volans: if foo/** exclude them properly, I guess it is good [12:47:33] yes it exclude foo/** but not bar/foo/** from my tests [12:49:13] RECOVERY - puppet last run on mw2203 is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures [12:50:06] (03CR) 10Thiemo Mättig (WMDE): [C: 031] "Do we *really* need to announce this? This affects test systems only." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208655 (https://phabricator.wikimedia.org/T94416) (owner: 10Aude) [12:53:43] (03CR) 10Aude: "afaik the pywikibot developers use test.wikidata etc. for some testing, and perhaps also people testing lua" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/208655 (https://phabricator.wikimedia.org/T94416) (owner: 10Aude) [12:53:50] (03CR) 10Thcipriani: [C: 031] "Derp. You're right, All the $(dirname $path) stuff in the exec had me thinking of scap::target." [puppet] - 10https://gerrit.wikimedia.org/r/306429 (owner: 10Giuseppe Lavagetto) [12:57:08] hashar: ready for swat? [12:57:41] 10Blocked-on-Operations, 06Operations, 10Continuous-Integration-Infrastructure, 07Zuul: Upgrade Zuul on scandium.eqiad.wmnet (Jessie zuul-merger) - https://phabricator.wikimedia.org/T140894#2582143 (10hashar) Looks like I have been building it without dpkg-gen-changes -sa to force the inclusion of the orig... [12:58:11] elukey: that is annoying :( stupid .changes I have replied to the task, maybe the fields can be added manually to .changes or the tarball copied at the proper place under the /pool/ [12:58:14] zeljkof: yeah [12:58:16] kart_: ping } [12:59:02] hashar: the usual hangout, or just here? 
[12:59:11] depends on kart_ I guess [12:59:28] it is too hot for me to hangout outside [12:59:30] but I can survive :D [13:00:04] hashar, Dereckson, addshore, and aude: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160825T1300). [13:00:04] kart_: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [13:00:32] around. [13:00:40] hashar: ^ [13:00:59] I can SWAT today! [13:01:05] zeljkof: nice. [13:01:10] kart_: want to join hangout? [13:01:17] zeljkof: link please. [13:01:26] I am new to deployments, so hashar is helping me [13:01:51] kart_, hashar: https://hangouts.google.com/hangouts/_/wikimedia.org/euswat [13:01:53] see you there [13:04:13] PROBLEM - puppet last run on neon is CRITICAL: CRITICAL: puppet fail [13:04:31] neon could be me, taking a look [13:09:04] !log puppet re-enabled on all kafka* hosts [13:09:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:10:12] (03PS1) 10Filippo Giunchedi: nagios_common: use require_package [puppet] - 10https://gerrit.wikimedia.org/r/306658 [13:10:15] +2d https://gerrit.wikimedia.org/r/#/c/306654/, waiting for it to be merged [13:11:10] volans: guess we can land https://gerrit.wikimedia.org/r/#/c/306033/4 as is [13:11:12] it is good enough [13:11:41] hashar: great, if you'd +1 I'll merge :) [13:12:22] (03CR) 10Hashar: [C: 031] "Ignoring not passing scripts is a good idea. Can polish up the rest (such as /.pep8) later on." [software] - 10https://gerrit.wikimedia.org/r/306033 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [13:12:24] volans: done ) [13:12:30] ignoring is a good idea [13:12:32] err [13:12:35] excluding is a good idea [13:12:42] :) [13:12:46] (03CR) 10Muehlenhoff: [C: 031] "Looks good, but we could cover the other packages above as well?" 
[puppet] - 10https://gerrit.wikimedia.org/r/306658 (owner: 10Filippo Giunchedi) [13:12:59] (03CR) 10Volans: [C: 032] Add flake8 at root of repo [software] - 10https://gerrit.wikimedia.org/r/306033 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [13:13:51] (03PS1) 10BBlack: openssl (1.0.2h-1~wmf5) jessie-wikimedia; urgency=medium [debs/openssl] - 10https://gerrit.wikimedia.org/r/306659 [13:14:35] (03Merged) 10jenkins-bot: Add flake8 at root of repo [software] - 10https://gerrit.wikimedia.org/r/306033 (https://phabricator.wikimedia.org/T143559) (owner: 10Hashar) [13:14:41] (03PS2) 10Filippo Giunchedi: nagios_common: use require_package [puppet] - 10https://gerrit.wikimedia.org/r/306658 [13:15:03] moritzm: ^ [13:18:28] 06Operations, 10ops-eqiad, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2582280 (10Cmjohnson) @gehel: that works for me, let me know which nodes you want to remove from row D and... [13:19:55] hashar: last thing... we're missing "executable"-style python scripts without the .py extension... 
:/ [13:20:07] (03CR) 10Muehlenhoff: [C: 031] "Looks good, PCC also agrees: http://puppet-compiler.wmflabs.org/3839/" [puppet] - 10https://gerrit.wikimedia.org/r/306658 (owner: 10Filippo Giunchedi) [13:20:17] volans: yeah so that is taken in account in /.pep8 [13:20:23] it has: filename = *.py,geturls,swiftcleaner*,profiler-to-carbon [13:20:28] exclude=swiftcleaner.conf [13:20:41] yes but still centralized [13:20:41] can be moved from .pep8 to the tox.ini [flake8] section [13:20:50] and then remove .pep8 file [13:21:00] hard to know you have to add something if tests are not failing ;) [13:21:06] yeah [13:21:38] (03PS2) 10Muehlenhoff: ipsec_allow: Restrict to domain networks [puppet] - 10https://gerrit.wikimedia.org/r/303837 [13:22:00] it should check the shebang of files without extension in theory [13:22:30] (03PS2) 10Hashar: udpprofile: fix Flake 8 [software] - 10https://gerrit.wikimedia.org/r/306646 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [13:22:56] (03CR) 10Hashar: "Rebased and adjusted the exclude list in tox.ini for flake8 to take in account /udpprofile/" [software] - 10https://gerrit.wikimedia.org/r/306646 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [13:23:08] hashar: I was doing the same :D [13:23:20] just adding the error on the #noqa to be more explicit [13:23:36] (03CR) 10Hashar: [C: 031] udpprofile: fix Flake 8 [software] - 10https://gerrit.wikimedia.org/r/306646 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [13:27:04] if you don't mind I'll update mine :) [13:28:18] (03CR) 10Filippo Giunchedi: [C: 032] nagios_common: use require_package [puppet] - 10https://gerrit.wikimedia.org/r/306658 (owner: 10Filippo Giunchedi) [13:28:43] volans: yeah do :) [13:33:32] RECOVERY - puppet last run on neon is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [13:33:37] (03PS2) 10BBlack: openssl (1.0.2h-1~wmf5) jessie-wikimedia; urgency=medium [debs/openssl] - 
10https://gerrit.wikimedia.org/r/306659 (https://phabricator.wikimedia.org/T131908) [13:33:56] (03PS3) 10Volans: udpprofile: fix Flake 8 [software] - 10https://gerrit.wikimedia.org/r/306646 (https://phabricator.wikimedia.org/T143559) [13:40:09] !log temporarily disabling puppet on kafka*, mc* and rdb* hosts (to enable ferm ipsec change in controlled stages) [13:40:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:40:18] swat still going on [13:42:55] !log zfilipin@tin Synchronized php-1.28.0-wmf.15/extensions/UniversalLanguageSelector: SWAT: [[gerrit:306654|ext.uls.compactlinks: consistently normalize language codes (T143867)]] (duration: 00m 49s) [13:42:57] T143867: Compact language links broken in https://es.wikipedia.org/wiki/Luna - https://phabricator.wikimedia.org/T143867 [13:43:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:43:26] hashar: https://gerrit.wikimedia.org/r/#/c/306031 is still needed? [13:44:02] !log zfilipin@tin Synchronized php-1.28.0-wmf.16/extensions/UniversalLanguageSelector: SWAT: [[gerrit:306654|ext.uls.compactlinks: consistently normalize language codes (T143867)]] (duration: 00m 47s) [13:44:04] T143867: Compact language links broken in https://es.wikipedia.org/wiki/Luna - https://phabricator.wikimedia.org/T143867 [13:44:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:44:52] apergos: if you rebase now you should get the new thing, you have to remove the salt-misc line in tox.ini to enable tox on that subdir too ;0 [13:45:25] (03PS3) 10Muehlenhoff: ipsec_allow: Restrict to domain networks [puppet] - 10https://gerrit.wikimedia.org/r/303837 [13:45:49] !log European SWAT is done [13:45:53] zeljkof: random stuff :D [13:45:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:45:56] volans: ok cool [13:46:43] (03PS1) 10Volans: txt2yaml: fix and enable Flake 8 [software] - 
10https://gerrit.wikimedia.org/r/306663 (https://phabricator.wikimedia.org/T143559) [13:46:52] Nikerabbit, kart_ the docs https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers [13:46:52] apergos: something like this one ^^^ ;) [13:47:07] 06Operations, 10ops-eqiad, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2582354 (10faidon) Can we find some other rack in C other than C5? We surely must have some other rack to pu... [13:47:21] (03CR) 10Muehlenhoff: [C: 032] ipsec_allow: Restrict to domain networks [puppet] - 10https://gerrit.wikimedia.org/r/303837 (owner: 10Muehlenhoff) [13:47:35] (03PS2) 10Volans: txt2yaml: fix and enable Flake 8 [software] - 10https://gerrit.wikimedia.org/r/306663 (https://phabricator.wikimedia.org/T143559) [13:47:39] sorry, like this one apergos ;) ^^^^^ [13:48:19] (03CR) 10Hashar: [C: 031] txt2yaml: fix and enable Flake 8 [software] - 10https://gerrit.wikimedia.org/r/306663 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [13:49:07] (03PS2) 10ArielGlenn: salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 [13:49:15] (03CR) 10Volans: [C: 032] txt2yaml: fix and enable Flake 8 [software] - 10https://gerrit.wikimedia.org/r/306663 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [13:49:31] yeah I figured it wasn't like the first one since that one didn't touch tox.ini :-D [13:49:31] (03CR) 10jenkins-bot: [V: 04-1] salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 (owner: 10ArielGlenn) [13:49:44] (03Merged) 10jenkins-bot: txt2yaml: fix and enable Flake 8 [software] - 10https://gerrit.wikimedia.org/r/306663 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [13:50:18] 80 line characters. 
not going to happen [13:50:21] * apergos fixes [13:51:20] 06Operations, 10ops-eqiad, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2582362 (10Cmjohnson) I have 3 slots remaining in C4 [13:54:51] (03PS3) 10ArielGlenn: salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 [13:55:02] I wonder how it's going to like that. we shall see [13:55:28] (03CR) 10jenkins-bot: [V: 04-1] salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 (owner: 10ArielGlenn) [13:56:51] apergos: all this just for 2 lines over 80? :) [13:56:58] yep [13:57:05] it's easier to fix them :-P [13:57:13] that code is indevelopment [13:57:23] even if 'development' for some time has been on hold [13:57:25] then put 120, 105 looks strange :D [13:57:28] I hate the 80 chars [13:57:33] nah, 120 is too long :-D [13:57:39] addshore: regarding https://gerrit.wikimedia.org/r/#/c/303863/, would you please log in to tin and write the new public key to your home account someplace? (Just double-checking your identity) [13:57:42] 105 is the sweet spot so that's what I use [13:58:54] (03CR) 10Andrew Bogott: "This looks fine to me. Adam, for a security check, please log in to Tin and write this new key to a file called 'mynewkey' in your home d" [puppet] - 10https://gerrit.wikimedia.org/r/303863 (owner: 10Addshore) [14:02:32] (03PS4) 10ArielGlenn: salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 [14:03:01] (03CR) 10jenkins-bot: [V: 04-1] salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 (owner: 10ArielGlenn) [14:04:27] * apergos snickers [14:05:13] fine fine. 
how do I get it to not look for setup.py, I thought the sdist stuff was it [14:06:29] apergos: when tox runs, it does first a setup.py sdist to craft the source tarball then install the result in the venv (as i understand it) [14:06:58] ok well at first I had it say 'no need for sdist' but it was still looking for setup.py [14:07:00] and salt-misc does not have any setup.py [14:07:13] no it doesn't, and I don't want to add one [14:07:16] so there is no need to introduce a new tox.ini and some custom flake8 rules [14:07:24] you can use the tox.ini and flake8 rules from the root of the repo [14:07:30] so drop https://gerrit.wikimedia.org/r/#/c/306653/4/salt-misc/tox.ini [14:07:35] well I want my line length rule [14:07:40] how do I get that in there [14:07:50] just for that directory? [14:07:53] and in https://gerrit.wikimedia.org/r/#/c/306653/4/tox.ini drop salt-misc from exclude [14:07:59] (03PS1) 10Ema: varnish: fix jemalloc chunk size config option name [puppet] - 10https://gerrit.wikimedia.org/r/306665 (https://phabricator.wikimedia.org/T135384) [14:08:46] apergos: we can relax max-line-length to 120 at the root of the repo [14:08:53] most scripts do not cut at 78 chars [14:09:06] then there is a single error reported apparently: ./salt-misc/parse-minion-output.py:178:80: E501 line too long (84 > 79 characters) [14:09:15] let's suppose for the sake of learning something that I wanted a separate rule for that directory [14:09:20] whether it's alnie length or something else [14:09:23] how would I manage that? 
[14:09:28] *a line length [14:09:50] cant :( [14:09:59] unless you chain to a sub tox [14:10:00] only if I add a setup.py eh [14:10:07] eg what you have done on https://gerrit.wikimedia.org/r/#/c/306653/4/tox.ini [14:10:09] but [14:10:21] since there is no setup.py, you want to skip the sdist part [14:10:25] yep [14:10:29] tox has support for that: skipsdist = True [14:10:36] which can be added to [testenv:salt-misc] [14:10:49] (03PS1) 10BBlack: ssl_ciphersuite: demote all 3DES for SWEET32 [puppet] - 10https://gerrit.wikimedia.org/r/306666 [14:10:53] when tox runs that virtualenv "salt-misc" skipsdist will cause it to not run setup.py sdist [14:10:59] can try that with your current patch [14:12:10] ah, so I have clearly misunderstood the "usedevelop" line [14:12:16] I thought that was to avoid sdist [14:12:19] (03CR) 10Hashar: salt-misc: little bit of pep8/pylint (031 comment) [software] - 10https://gerrit.wikimedia.org/r/306653 (owner: 10ArielGlenn) [14:12:21] commented [14:12:23] what does that do instead? :-D [14:12:39] usedevelop points to the current dir [14:12:47] eg does not install your package in the venv [14:12:56] you can even point me to something to read instead of having to answer all these questions :-D [14:13:08] might still need setup.py sdist. I can't really remember would have to try [14:13:17] ok well I can play, no worries [14:13:17] true! 
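[editor's note] A minimal sketch of the per-directory rule discussed above, assuming a dedicated tox environment that runs flake8 only on salt-misc with its own line length. The environment name and the 105 limit are taken from the conversation; everything else is an assumption rather than the merged configuration. The tox configuration reference lists skipsdist among the global [tox] settings, so the sketch places it there rather than in the per-environment section.

```ini
[tox]
envlist = flake8, salt-misc
# Global setting: skip "setup.py sdist", since salt-misc has no setup.py.
skipsdist = True

[testenv:flake8]
deps = flake8
commands = flake8

[testenv:salt-misc]
deps = flake8
# A command-line option overrides the value from the [flake8] section,
# so only this environment gets the longer limit.
commands = flake8 --max-line-length=105 salt-misc
```

Running `tox -e salt-misc` would then check only that directory. An entry for salt-misc in a root-level exclude list may also hide it from this environment, so the two settings would need checking together.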
[14:13:29] http://tox.readthedocs.io/ [14:13:29] and http://tox.readthedocs.io/en/latest/config.html [14:13:53] volans: for tox doc ^^^ [14:14:39] 06Operations, 10DBA: Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master - https://phabricator.wikimedia.org/T141968#2582454 (10jcrespo) [14:14:53] thank you, bookmarked [14:15:28] 06Operations, 10DBA, 10Monitoring: Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master - https://phabricator.wikimedia.org/T141968#2518246 (10jcrespo) [14:15:36] 06Operations, 05Prometheus-metrics-monitoring: MySQL monitoring with prometheus - https://phabricator.wikimedia.org/T143896#2582458 (10fgiunchedi) [14:16:18] (03PS5) 10ArielGlenn: salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 [14:16:39] 06Operations, 10DBA, 10Monitoring: Display lag on grafana (prometheus) and dbtree from pt-heartbeat instead (or in addition) of Seconds_Behind_Master - https://phabricator.wikimedia.org/T141968#2582477 (10jcrespo) [14:16:41] 06Operations, 05Prometheus-metrics-monitoring: MySQL monitoring with prometheus - https://phabricator.wikimedia.org/T143896#2582476 (10jcrespo) [14:16:48] (03CR) 10jenkins-bot: [V: 04-1] salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 (owner: 10ArielGlenn) [14:17:11] !log re-enabled puppet on kafka*, mc* and rdb* hosts [14:17:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:18:38] (03CR) 10BBlack: [C: 032] ssl_ciphersuite: demote all 3DES for SWEET32 [puppet] - 10https://gerrit.wikimedia.org/r/306666 (owner: 10BBlack) [14:28:19] hashar: yeah is where I was looking for exclude syntax without luck and I realize now that I have to polish it, we used 2 different syntaxes for sub projects and subdirs [14:31:45] (03PS6) 10ArielGlenn: salt-misc: little bit of pep8/pylint [software] - 
10https://gerrit.wikimedia.org/r/306653 [14:34:03] (03PS1) 10BBlack: tlsproxy: drop ssl_session_timeout to 4h [puppet] - 10https://gerrit.wikimedia.org/r/306669 [14:35:06] (03CR) 10BBlack: [C: 032 V: 032] tlsproxy: drop ssl_session_timeout to 4h [puppet] - 10https://gerrit.wikimedia.org/r/306669 (owner: 10BBlack) [14:35:12] finally found the magic combination [14:35:26] it has to go in the [tox] stanza for whatever reason [14:35:42] :(( [14:35:50] (03CR) 10Faidon Liambotis: [C: 032] "LGTM. Note that the role is applied to fluorine (in prod) too." [puppet] - 10https://gerrit.wikimedia.org/r/303140 (owner: 10Muehlenhoff) [14:36:09] (03CR) 10Faidon Liambotis: [C: 032] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/303181 (owner: 10Muehlenhoff) [14:37:04] !log upgrading httpd to 2.4.10-10+deb8u6+wmf2 on mw1269/mw127[01] [14:37:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:38:45] (03PS7) 10ArielGlenn: salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 [14:39:24] do we pylint any repos? 
[14:39:36] I mean via tox etc [14:41:09] no idea [14:41:15] (03CR) 10ArielGlenn: [C: 032] salt-misc: little bit of pep8/pylint [software] - 10https://gerrit.wikimedia.org/r/306653 (owner: 10ArielGlenn) [14:41:46] ah, I thought you might have helped st one up someplace [14:41:50] nm then [14:43:11] flake8 also has a bunch of weird plugins https://pypi.python.org/pypi?%3Aaction=search&term=flake8&submit=search [14:43:55] you can always add pylint to the list of deps and commands [14:44:06] and rename the venv from "flake8" to "lint" or "linters" [14:47:40] that is a hella list of plugins [14:47:46] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: Rack/Setup pay-lvs1001[2] - https://phabricator.wikimedia.org/T143900#2582571 (10Cmjohnson) [14:47:50] (03PS2) 10BBlack: varnish: fix jemalloc chunk size config option name [puppet] - 10https://gerrit.wikimedia.org/r/306665 (https://phabricator.wikimedia.org/T135384) (owner: 10Ema) [14:48:05] yeah I might have to try that as my code gets closer to passing pylint completely in some repos [14:48:10] (03CR) 10BBlack: [C: 032 V: 032] varnish: fix jemalloc chunk size config option name [puppet] - 10https://gerrit.wikimedia.org/r/306665 (https://phabricator.wikimedia.org/T135384) (owner: 10Ema) [14:48:19] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: Rack/Setup pay-lvs1001[2] - https://phabricator.wikimedia.org/T143900#2582587 (10Cmjohnson) [14:50:37] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup berryllium replacment - https://phabricator.wikimedia.org/T143902#2582606 (10Cmjohnson) [14:51:04] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: rack/setup berryllium replacment - https://phabricator.wikimedia.org/T143902#2582619 (10Cmjohnson) [14:54:42] mw1269 and mw127[01] updated (httpd), all good from the logs [14:55:27] will complete the jessie ones tomorrow upgrading mw127[2345] [14:57:37] 06Operations, 10netops: configure port for frdb1001 - https://phabricator.wikimedia.org/T143248#2582634 (10Jgreen) 
a:05Jgreen>03None [15:00:09] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: Rack/Setup pay-lvs1001[2] - https://phabricator.wikimedia.org/T143900#2582637 (10Jgreen) @Cmjohnson let's name this pay-lvs1003 and connected it to pfw1 2/0/4 which is afaik currently connected to eth2 on pay-lvs1001 and inactive [15:01:30] (03Abandoned) 10Volans: udpprofile: fix Flake 8 [software] - 10https://gerrit.wikimedia.org/r/306646 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [15:07:19] apergos: I guess you should add /.tox/ to a new .gitignore in salt-misc like clouseau, or we change the root .gitignore to ignore all the .toxes [15:07:28] hashar: ^^^ [15:07:34] maybe the root in this case [15:08:06] (03PS1) 10Filippo Giunchedi: prometheus: return 204 on / [puppet] - 10https://gerrit.wikimedia.org/r/306671 [15:08:08] (03PS1) 10Filippo Giunchedi: prometheus: add to LVS [puppet] - 10https://gerrit.wikimedia.org/r/306672 (https://phabricator.wikimedia.org/T126785) [15:08:15] I'm fine with both approaches, easier in the root, more portable in the specific ones of at some point gets migrated to a dedicated repo [15:08:42] 06Operations, 10ops-eqiad: Remove and destroy disks from old payments boxes decom server - https://phabricator.wikimedia.org/T140370#2582647 (10Cmjohnson) 05Open>03Resolved Disks have been removed from old payments servers, wiped and then degaussed. The servers have been unracked, removed from racktables.... [15:09:36] 06Operations, 10ops-eqiad, 10media-storage: diagnose failed(?) sda on ms-be1022 - https://phabricator.wikimedia.org/T140597#2582649 (10Cmjohnson) The ssds were swapped, the server needs a re-install. 
[15:10:08] cmjohnson1: nice ^ thanks I'll kick off a reinstall [15:11:03] PROBLEM - puppet last run on restbase2001 is CRITICAL: CRITICAL: Puppet has 1 failures [15:11:08] don't care much either wat [15:11:09] y [15:11:30] (03PS1) 10Volans: thumbstats: fix flake8 [software] - 10https://gerrit.wikimedia.org/r/306673 (https://phabricator.wikimedia.org/T143559) [15:12:45] (03CR) 10Volans: [C: 032] thumbstats: fix flake8 [software] - 10https://gerrit.wikimedia.org/r/306673 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [15:13:05] !log pooling cp4005 backend (varnish 4 cache_upload) T131502 [15:13:07] T131502: Convert upload cluster to Varnish 4 - https://phabricator.wikimedia.org/T131502 [15:13:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:13:19] (03Merged) 10jenkins-bot: thumbstats: fix flake8 [software] - 10https://gerrit.wikimedia.org/r/306673 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [15:17:09] (03PS1) 10KartikMistry: Beta: Fix cxserver restbase_url [puppet] - 10https://gerrit.wikimedia.org/r/306674 (https://phabricator.wikimedia.org/T129284) [15:17:36] (03PS1) 10Jcrespo: Puppetize static configuration for prometheus-mysqld-exporter [puppet] - 10https://gerrit.wikimedia.org/r/306675 (https://phabricator.wikimedia.org/T126757) [15:18:32] (03PS2) 10Jcrespo: Puppetize static configuration for prometheus-mysqld-exporter [puppet] - 10https://gerrit.wikimedia.org/r/306675 (https://phabricator.wikimedia.org/T126757) [15:19:39] (03CR) 10jenkins-bot: [V: 04-1] Puppetize static configuration for prometheus-mysqld-exporter [puppet] - 10https://gerrit.wikimedia.org/r/306675 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [15:20:50] (03PS1) 10Volans: tox: unify exclude syntax [software] - 10https://gerrit.wikimedia.org/r/306676 (https://phabricator.wikimedia.org/T143559) [15:21:42] (03PS3) 10Jcrespo: Puppetize static configuration for prometheus-mysqld-exporter [puppet] - 
10https://gerrit.wikimedia.org/r/306675 (https://phabricator.wikimedia.org/T126757) [15:21:58] apergos: hashar: ^^^ [15:23:40] oh that's good about the syntax [15:23:44] I mean, if it works :-D [15:23:54] I was not feeling cheerful about the different formats [15:24:17] me neither, although I had to leave the ./ for the filenames, if I remove it it doesn't find them :/ [15:24:19] (03PS2) 10Dzahn: tor::relay: move debdeploy grains [puppet] - 10https://gerrit.wikimedia.org/r/306606 [15:27:25] (03CR) 10Florianschmidtwelzow: [C: 04-1] "code style" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [15:28:56] (03PS2) 10Volans: tox: unify exclude syntax [software] - 10https://gerrit.wikimedia.org/r/306676 (https://phabricator.wikimedia.org/T143559) [15:30:05] (03CR) 10Hashar: [C: 031] tox: unify exclude syntax (031 comment) [software] - 10https://gerrit.wikimedia.org/r/306676 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [15:31:43] RECOVERY - Check size of conntrack table on ms-be1022 is OK: OK: nf_conntrack is 0 % full [15:32:02] RECOVERY - configured eth on ms-be1022 is OK: OK - interfaces up [15:32:14] RECOVERY - swift-object-auditor on ms-be1022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-object-auditor [15:32:23] RECOVERY - swift-container-replicator on ms-be1022 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/swift-container-replicator [15:32:33] (03PS1) 10Dzahn: aptrepo: add bacula backups to role [puppet] - 10https://gerrit.wikimedia.org/r/306680 [15:33:13] RECOVERY - swift-container-server on ms-be1022 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-container-server [15:33:24] RECOVERY - swift-account-server on ms-be1022 is OK: PROCS OK: 41 processes with regex args ^/usr/bin/python /usr/bin/swift-account-server [15:33:52] RECOVERY - HP RAID on ms-be1022 is OK: OK: Slot 3: OK: 
2I:4:2, 2I:4:1, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor [15:36:13] (03PS1) 10Dzahn: tftp_server: add bacula backups to role [puppet] - 10https://gerrit.wikimedia.org/r/306682 [15:36:44] RECOVERY - puppet last run on restbase2001 is OK: OK: Puppet is currently enabled, last run 33 seconds ago with 0 failures [15:38:57] (03CR) 10Volans: [C: 032] tox: unify exclude syntax [software] - 10https://gerrit.wikimedia.org/r/306676 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [15:42:48] (03PS1) 10Rush: nova: trigger usage calculation refresh if stale for 30s [puppet] - 10https://gerrit.wikimedia.org/r/306683 [15:45:17] (03PS2) 10Rush: nova: trigger usage calculation refresh if stale for 30s [puppet] - 10https://gerrit.wikimedia.org/r/306683 [15:46:35] halfak: how about https://gerrit.wikimedia.org/r/#/c/302705/ . i see "After the move to ores-redis-02 is this still needed" [15:47:04] (03Merged) 10jenkins-bot: tox: unify exclude syntax [software] - 10https://gerrit.wikimedia.org/r/306676 (https://phabricator.wikimedia.org/T143559) (owner: 10Volans) [15:47:22] \o/ [15:48:38] (03Abandoned) 10Halfak: Lowers ores-redis maxmemory setting to 2.5GB [puppet] - 10https://gerrit.wikimedia.org/r/302705 (owner: 10Halfak) [15:48:41] issues on s1? [15:48:47] mutante, thanks for the ping. I just abandoned. [15:48:55] halfak: welcome :) thx [15:49:27] 80,83 and 89 have soft icinga timeouts [15:50:53] we have a slowdown in traffic [15:51:07] maybe unrelated [15:51:11] 06Operations, 10ops-eqiad, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint: Improve balance of nodes across rows for elasticsearch cluster eqiad - https://phabricator.wikimedia.org/T143685#2582768 (10Gehel) So final arrangement (even better with not all servers in same rack on row C): **A 3:** 6... 
[15:52:12] (03CR) 10Andrew Bogott: [C: 032] nova: trigger usage calculation refresh if stale for 30s [puppet] - 10https://gerrit.wikimedia.org/r/306683 (owner: 10Rush) [15:53:08] I can confirm some network or mediawiki issue ongoing in the last 5-10 minutes [15:53:42] we have a few dbs depooled [15:56:33] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0] [15:57:11] (03Abandoned) 10Ema: Revert "cache_misc: puppetize switch to file storage" [puppet] - 10https://gerrit.wikimedia.org/r/299526 (owner: 10Ema) [15:57:46] 100000 db failures per minute [15:59:02] PROBLEM - puppet last run on labservices1001 is CRITICAL: CRITICAL: Puppet has 1 failures [16:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160825T1600). [16:00:04] Josve05a: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [16:00:19] jynus: Looks like it was a spike but is calming down? [16:00:19] Rodger [16:00:26] (according to logstash) [16:00:33] yes, it is gone now [16:03:22] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [16:17:12] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2576804 (10leila) [16:18:43] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2582818 (10leila) @flemmerich @ph_singer please let us know if you have done steps 2 and 3 as well? 
[16:19:52] 06Operations, 10Traffic: Push gdnsd metrics to graphite and create a grafana dashboard - https://phabricator.wikimedia.org/T141258#2582822 (10elukey) a:03elukey [16:21:58] (03CR) 10Gehel: [C: 031] "Kool! Those typo checks are a great idea!" [puppet] - 10https://gerrit.wikimedia.org/r/306516 (owner: 10Dzahn) [16:24:44] RECOVERY - puppet last run on labservices1001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [16:26:02] (03CR) 10Gehel: [C: 031] "LGTM, provided that the actual script is good (but that is another review)." [puppet] - 10https://gerrit.wikimedia.org/r/305673 (https://phabricator.wikimedia.org/T143048) (owner: 10MaxSem) [16:26:46] 06Operations, 10ops-eqiad: Rack/setup sodium (carbon/mirror server replacement) - https://phabricator.wikimedia.org/T139171#2582840 (10Cmjohnson) Still working on getting the disks replaced w/out any costs to us and possibly a refund. This is the latest message . Chris, Base on your request below return r... [16:27:28] Josve05a: I'm looking at https://gerrit.wikimedia.org/r/#/c/304696/4 for puppet swat, jynus looks good? 
[16:27:50] yes it has my +1 [16:28:10] although I will have to monitor it in the following days [16:28:36] ok I'll merge it [16:28:41] :D [16:28:46] (03PS5) 10Filippo Giunchedi: Monthly update of the "slowest" querypages on the English Wikipedia [puppet] - 10https://gerrit.wikimedia.org/r/304696 (https://phabricator.wikimedia.org/T142936) (owner: 10Nemo bis) [16:29:01] I will need to send a reminder for next month [16:29:43] I want to remember that some of the active ones may not work at all (take >24 hours) [16:29:57] heh for this kind of stuff I use the calendar notifications, works decently for one off [16:30:07] (03CR) 10Filippo Giunchedi: [C: 032] Monthly update of the "slowest" querypages on the English Wikipedia [puppet] - 10https://gerrit.wikimedia.org/r/304696 (https://phabricator.wikimedia.org/T142936) (owner: 10Nemo bis) [16:30:15] I will do that [16:31:32] I can check terbium when deployed [16:32:02] ostriches: https://gerrit.wikimedia.org/r/#/c/301484/1 still for puppet swat, correct? [16:32:47] Sure :) [16:33:02] (03PS2) 10Filippo Giunchedi: Git::clone: rename $default_source to $source and add github [puppet] - 10https://gerrit.wikimedia.org/r/301484 (owner: 10Chad) [16:35:10] (03CR) 10Filippo Giunchedi: [C: 032] Git::clone: rename $default_source to $source and add github [puppet] - 10https://gerrit.wikimedia.org/r/301484 (owner: 10Chad) [16:38:45] (03CR) 10Filippo Giunchedi: [C: 031] "LGTM overall, minor nit" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/304327 (https://phabricator.wikimedia.org/T142784) (owner: 10Thcipriani) [16:41:39] (03CR) 10Filippo Giunchedi: "looks good, though having shard: set even if to "none" generally helps keeping things more obvious. e.g. 
dashboards the shard key will alw" [puppet] - 10https://gerrit.wikimedia.org/r/306675 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [16:43:52] (03CR) 10Jforrester: On public wikis, show "Publish" rather than "Save" on edit pages (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [16:47:07] (03CR) 10Gehel: Adding Icinga checks for Maps (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/291023 (https://phabricator.wikimedia.org/T135647) (owner: 10Gehel) [16:47:42] (03CR) 10Jcrespo: "What about "multi" instead of "none"?" [puppet] - 10https://gerrit.wikimedia.org/r/306675 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [16:49:03] 06Operations, 10netops: configure port for frdb1001 - https://phabricator.wikimedia.org/T143248#2582916 (10faidon) 05Open>03Resolved a:03faidon Done! [16:49:46] (03CR) 10Muehlenhoff: [C: 04-1] "Already done in git" [puppet] - 10https://gerrit.wikimedia.org/r/306606 (owner: 10Dzahn) [16:50:43] (03PS4) 10Jcrespo: Puppetize static configuration for prometheus-mysqld-exporter [puppet] - 10https://gerrit.wikimedia.org/r/306675 (https://phabricator.wikimedia.org/T126757) [16:54:52] (03CR) 10Faidon Liambotis: [C: 04-1] elasticsearch - cleanup roles (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/304067 (owner: 10Gehel) [16:55:24] (03CR) 10Filippo Giunchedi: ""multi" sounds good! one other comment on source=" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/306675 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [16:56:48] (03CR) 10Faidon Liambotis: "Whether this should be blocked on the relforge role or not, I'll leave up to you. 
From my PoV, since relforge is new and provided that we'" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/305519 (https://phabricator.wikimedia.org/T133844) (owner: 10Gehel) [17:00:04] yurik, gwicke, cscott, arlolra, subbu, halfak, and Amir1: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160825T1700). Please do the needful. [17:00:13] no parsoid deploy [17:00:24] no ores [17:00:26] No ORES. [17:00:27] :D [17:00:59] paravoid: thanks for the reviews! [17:01:02] (03Abandoned) 10Faidon Liambotis: Remove www.email.donate.wikimedia.org from DNS [dns] - 10https://gerrit.wikimedia.org/r/223245 (https://phabricator.wikimedia.org/T102827) (owner: 10Chmarkine) [17:01:15] gehel: np. you should nag me more :P [17:01:22] (I wouldn't mind that) [17:01:42] * gehel does not like nagging too much, but will try to do it more... [17:02:30] (03PS3) 10Dzahn: tor::relay: move debdeploy grains [puppet] - 10https://gerrit.wikimedia.org/r/306606 [17:02:35] (03Abandoned) 10Dzahn: tor::relay: move debdeploy grains [puppet] - 10https://gerrit.wikimedia.org/r/306606 (owner: 10Dzahn) [17:02:55] (03CR) 10Faidon Liambotis: [C: 032] Couple of tiny maintain-meta_p.py improvements [software] - 10https://gerrit.wikimedia.org/r/295608 (owner: 10Alex Monk) [17:03:22] PROBLEM - Router interfaces on pfw-eqiad is CRITICAL: CRITICAL: host 208.80.154.218, interfaces up: 109, down: 1, dormant: 0, excluded: 2, unused: 0BRge-2/0/4: down - pay-lvs1001:eth1BR [17:03:36] (03PS2) 10Faidon Liambotis: Couple of tiny maintain-meta_p.py improvements [software] - 10https://gerrit.wikimedia.org/r/295608 (owner: 10Alex Monk) [17:04:33] (03PS4) 10Dzahn: typos/jenkins: add 'wqds' as detectable typo [puppet] - 10https://gerrit.wikimedia.org/r/306516 [17:04:50] (03CR) 10Dzahn: [C: 032] typos/jenkins: add 'wqds' as detectable typo [puppet] - 10https://gerrit.wikimedia.org/r/306516 (owner: 10Dzahn) [17:05:07] (03CR) 
10Platonides: "Wow, I wanted to propose this in the wiki due to a different issue, and didn't remember this at all!" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/239455 (https://phabricator.wikimedia.org/T113096) (owner: 10Platonides) [17:05:21] (03PS4) 10BBlack: Expand CSP report only test to elwiki. [puppet] - 10https://gerrit.wikimedia.org/r/306464 (owner: 10Brian Wolff) [17:05:30] (03CR) 10BBlack: [C: 032 V: 032] Expand CSP report only test to elwiki. [puppet] - 10https://gerrit.wikimedia.org/r/306464 (owner: 10Brian Wolff) [17:06:24] (03PS5) 10Dzahn: typos/jenkins: add 'wqds' as detectable typo [puppet] - 10https://gerrit.wikimedia.org/r/306516 [17:06:36] (03CR) 10Dzahn: [V: 032] typos/jenkins: add 'wqds' as detectable typo [puppet] - 10https://gerrit.wikimedia.org/r/306516 (owner: 10Dzahn) [17:13:41] (03CR) 10Dzahn: base/monitoring: add optional SMART disk check (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/304580 (https://phabricator.wikimedia.org/T86552) (owner: 10Dzahn) [17:14:00] (03PS1) 10BryanDavis: striker: Set nginx variant via hiera [puppet] - 10https://gerrit.wikimedia.org/r/306695 [17:14:02] (03PS1) 10BryanDavis: horizon: use service::uwsgi+nginx [puppet] - 10https://gerrit.wikimedia.org/r/306696 [17:14:04] (03PS1) 10BryanDavis: striker: Move nginx back to port 80 [puppet] - 10https://gerrit.wikimedia.org/r/306697 [17:15:04] (03CR) 10Jcrespo: Puppetize static configuration for prometheus-mysqld-exporter (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/306675 (https://phabricator.wikimedia.org/T126757) (owner: 10Jcrespo) [17:15:40] (03CR) 10jenkins-bot: [V: 04-1] horizon: use service::uwsgi+nginx [puppet] - 10https://gerrit.wikimedia.org/r/306696 (owner: 10BryanDavis) [17:16:39] (03CR) 10jenkins-bot: [V: 04-1] striker: Move nginx back to port 80 [puppet] - 10https://gerrit.wikimedia.org/r/306697 (owner: 10BryanDavis) [17:17:06] (03Abandoned) 10Faidon Liambotis: network: move external_networks to hiera as well [puppet] - 
10https://gerrit.wikimedia.org/r/303176 (owner: 10Faidon Liambotis) [17:17:22] (03CR) 10Faidon Liambotis: [C: 04-1] "Duplicating this value in labs.yaml seems very... un-hierarchical and non-DRY. How can we fix this properly?" [puppet] - 10https://gerrit.wikimedia.org/r/302695 (owner: 10Alexandros Kosiaris) [17:17:24] (03PS2) 10BryanDavis: horizon: use service::uwsgi+nginx [puppet] - 10https://gerrit.wikimedia.org/r/306696 [17:18:27] (03PS2) 10BryanDavis: striker: Move nginx back to port 80 [puppet] - 10https://gerrit.wikimedia.org/r/306697 [17:18:41] (03CR) 10Dzahn: "using the watroles tool it seems that no instances are using this role. am i checking it right? https://tools.wmflabs.org/watroles/role/ro" [puppet] - 10https://gerrit.wikimedia.org/r/298906 (owner: 10Dzahn) [17:20:37] 06Operations, 10netops: Upgrade cr1-esams & cr2-knams to JunOS 13.3 - https://phabricator.wikimedia.org/T143913#2583037 (10faidon) [17:21:53] (03CR) 10Dzahn: "@Filippo do you know about the unused cassandrahosts partman recipes (vs. the ones that are in use) ?" [puppet] - 10https://gerrit.wikimedia.org/r/306501 (owner: 10Dzahn) [17:22:24] 06Operations, 10netops: Upgrade cr1-ulsfo & cr2-ulsfo to JunOS 13.3 - https://phabricator.wikimedia.org/T143914#2583053 (10faidon) [17:22:26] openssl 1.1.0 published [17:22:47] 06Operations, 10MediaWiki-General-or-Unknown, 06Services, 10Traffic: Investigate query parameter normalization for MW/services - https://phabricator.wikimedia.org/T138093#2583073 (10Mholloway) >>! In T138093#2578360, @Jhernandez wrote: > I'm not sure how apps end up serializing parameters, going to ping @M... [17:22:57] 06Operations, 10netops: Network ACL rules to allow traffic from Analytics to Production for port 9160 - https://phabricator.wikimedia.org/T138609#2405243 (10faidon) @elukey, ping? [17:23:27] paravoid: I was about to write in --^ :) [17:24:11] I'll have a chat on Monday with Joseph but I am 99% sure that we could remove those rules. 
I also need to clean up ferm rules in puppet first [17:28:29] (03CR) 10Jcrespo: "The configuration seems solved, I am not doubtful about the firewall configuration- but we seemed to disagree on how to continue. What are" [puppet] - 10https://gerrit.wikimedia.org/r/306174 (https://phabricator.wikimedia.org/T126757) (owner: 10Filippo Giunchedi) [17:29:11] (03CR) 10Jcrespo: "The configuration issues seem solved, I am now doubtful about the firewall configuration- but we seemed to disagree on how to continue." [puppet] - 10https://gerrit.wikimedia.org/r/306174 (https://phabricator.wikimedia.org/T126757) (owner: 10Filippo Giunchedi) [17:31:20] 06Operations, 10Traffic, 10netops: Fix static IP fallbacks to Pybal LVS routes - https://phabricator.wikimedia.org/T143915#2583091 (10faidon) [17:38:30] 06Operations, 10Traffic, 10netops: Fix static IP fallbacks to Pybal LVS routes - https://phabricator.wikimedia.org/T143915#2583135 (10BBlack) Yes - I think in eqiad we only need to reshuffle git-ssh.wikimedia.org, ocg.svc.eqiad.wmnet, and our internal recdns IPs. I think it's likely in the other DCs the sit... 
[17:39:24] (03PS1) 10Giuseppe Lavagetto: puppetmaster: introduce a generic puppetmaster::web_frontend define [puppet] - 10https://gerrit.wikimedia.org/r/306702 (https://phabricator.wikimedia.org/T143869) [17:39:26] (03PS1) 10Giuseppe Lavagetto: puppetmaster: extract the passenger config from the virtualhost [puppet] - 10https://gerrit.wikimedia.org/r/306703 [17:47:29] (03CR) 10Andrew Bogott: [C: 032] striker: Set nginx variant via hiera [puppet] - 10https://gerrit.wikimedia.org/r/306695 (owner: 10BryanDavis) [17:47:35] (03PS2) 10Andrew Bogott: striker: Set nginx variant via hiera [puppet] - 10https://gerrit.wikimedia.org/r/306695 (owner: 10BryanDavis) [17:49:28] (03PS1) 10Mobrovac: Allow service-checker to read YAML-formatted specs [software/service-checker] - 10https://gerrit.wikimedia.org/r/306707 (https://phabricator.wikimedia.org/T136839) [17:52:43] (03PS7) 10Andrew Bogott: WIP: Horizon tab for modifying instance puppet config [puppet] - 10https://gerrit.wikimedia.org/r/294342 (https://phabricator.wikimedia.org/T91990) [17:53:36] (03CR) 10Dzahn: "that discussion was continued on https://phabricator.wikimedia.org/T75997#2582713 it looks" [puppet] - 10https://gerrit.wikimedia.org/r/306413 (https://phabricator.wikimedia.org/T75997) (owner: 10Paladox) [17:58:37] (03PS9) 10BryanDavis: striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) [17:58:51] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2583207 (10ph_singer) Yes, I did complete them. [17:59:01] (03CR) 10jenkins-bot: [V: 04-1] striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) (owner: 10BryanDavis) [18:00:05] anomie, ostriches, thcipriani, hashar, twentyafterfour, and aude: Dear anthropoid, the time has come. 
Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160825T1800). [18:00:05] James_F, dcausse, Amir1, and MatmaRex: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [18:00:13] hi. [18:00:22] o/ [18:00:37] Hey, the patches are super straightforward, can't be tested and only fix some maintenance scripts [18:00:46] I can SWAT today [18:05:31] (03CR) 10Rush: [C: 032] labstore: Change nfs mount removal logic to not declaring it as file resource [puppet] - 10https://gerrit.wikimedia.org/r/306280 (owner: 10Madhuvishy) [18:05:40] (03PS3) 10Rush: labstore: Change nfs mount removal logic to not declaring it as file resource [puppet] - 10https://gerrit.wikimedia.org/r/306280 (owner: 10Madhuvishy) [18:06:29] (03CR) 10Rush: [V: 032] labstore: Change nfs mount removal logic to not declaring it as file resource [puppet] - 10https://gerrit.wikimedia.org/r/306280 (owner: 10Madhuvishy) [18:06:40] lots of non-config stuff. Come on CI... [18:08:09] (03PS10) 10BryanDavis: striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) [18:12:50] (03PS9) 10Madhuvishy: nfs: Modify /data/scratch on nfs clients to point to mount from labstore1003 [puppet] - 10https://gerrit.wikimedia.org/r/306019 (https://phabricator.wikimedia.org/T134896) [18:14:08] (03CR) 10BryanDavis: "Finally got this to work on my testing server." [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) (owner: 10BryanDavis) [18:14:10] (03CR) 10Dzahn: [C: 04-1] "causes duplicate declaration on carbon.. hmmm.. 
http://puppet-compiler.wmflabs.org/3840/carbon.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/306680 (owner: 10Dzahn) [18:15:19] dcausse: https://gerrit.wikimedia.org/r/#/c/306687/ is alive on mw1099, check please [18:15:25] live even [18:15:34] thcipriani: it's a jobqueue issue, not sure how to test :/ [18:16:18] dcausse: ack. there don't seem to be any explosions there, so I'll push it out. [18:16:32] thcipriani: thanks [18:18:17] (03PS2) 10Dzahn: aptrepo: add bacula backups to role [puppet] - 10https://gerrit.wikimedia.org/r/306680 [18:18:52] jynus: did you ban these cirrus jobs queries? [18:19:16] !log thcipriani@tin Synchronized php-1.28.0-wmf.16/extensions/CirrusSearch/includes/Job/CheckerJob.php: SWAT: [[gerrit:306687|Fix a typo in BC code that handles toId => toPageId (T143862)]] (duration: 00m 47s) [18:19:18] T143862: s3 throughput tripled since 24 august - https://phabricator.wikimedia.org/T143862 [18:19:22] ^ dcausse live everywhere [18:19:51] thcipriani: seems to work (according to logs) [18:20:07] dcausse: nice :) [18:20:39] (03PS1) 10Dzahn: installserver: put aptrepo role also on install2001 [puppet] - 10https://gerrit.wikimedia.org/r/306713 [18:22:21] dcausse: https://gerrit.wikimedia.org/r/#/c/306690/ is on mw1099 if there's anything to test there [18:22:42] thcipriani: it's just maint script :) [18:23:05] MatmaRex: https://gerrit.wikimedia.org/r/#/c/306706/ is also live on mw1099 since it came down with the rebase/pull [18:23:19] dcausse: ack, going out everywhere :) [18:23:32] thcipriani: thanks! :) [18:24:13] thcipriani: thanks, it works as expected [18:24:44] (03CR) 10Dzahn: [C: 032] "now no-op on carbon. adds backups on install1001. 
http://puppet-compiler.wmflabs.org/3841/" [puppet] - 10https://gerrit.wikimedia.org/r/306680 (owner: 10Dzahn) [18:24:55] (03PS3) 10Dzahn: aptrepo: add bacula backups to role [puppet] - 10https://gerrit.wikimedia.org/r/306680 [18:25:31] MatmaRex: cool, thanks for checking, I'll get that out after the cirrussearch change [18:25:56] !log thcipriani@tin Synchronized php-1.28.0-wmf.16/extensions/CirrusSearch/includes: SWAT: [[gerrit:306690|Use the UserTesting framework in maint scripts]] (duration: 00m 53s) [18:26:04] ^ dcausse live everywhere [18:28:08] thcipriani: thanks!, I can't really test (I'll run it tomorrow) [18:29:42] !log thcipriani@tin Synchronized php-1.28.0-wmf.16/resources/src/mediawiki/htmlform/hide-if.js: SWAT: [[gerrit:306706|mw.htmlform: Do not refer to OO.ui if it might not be loaded (T143850)]] (duration: 00m 49s) [18:29:43] T143850: Conditional hiding of password fields broken in signup form - https://phabricator.wikimedia.org/T143850 [18:29:48] ^ MatmaRex live everywhere [18:29:56] dcausse: sounds good, thanks [18:30:06] Amir1: fine if yours go out together? [18:30:15] thcipriani: yes [18:30:25] (03PS2) 10Dzahn: tftp_server: add bacula backups to role [puppet] - 10https://gerrit.wikimedia.org/r/306682 [18:31:44] thcipriani: thanks! [18:33:23] !log thcipriani@tin Synchronized php-1.28.0-wmf.16/extensions/ORES: SWAT: [[gerrit:306689|Fix CheckModelVersions by changing order of actions (T143799)]] [[gerrit:306691|Fix for purging scores (T143798)]] (duration: 00m 51s) [18:33:25] T143799: Update model versions is badly broken in ORES extension - https://phabricator.wikimedia.org/T143799 [18:33:25] T143798: Update model versions is badly broken in ORES extension - https://phabricator.wikimedia.org/T143798 [18:33:40] ^ Amir1 changes live [18:33:49] Can't be tested [18:33:54] thanks ! 
[18:33:58] yw :) [18:36:31] (03PS3) 10Dzahn: tftp_server: add bacula backups to role [puppet] - 10https://gerrit.wikimedia.org/r/306682 [18:37:48] (03Abandoned) 10BryanDavis: horizon: use service::uwsgi+nginx [puppet] - 10https://gerrit.wikimedia.org/r/306696 (owner: 10BryanDavis) [18:38:17] (03Abandoned) 10BryanDavis: striker: Move nginx back to port 80 [puppet] - 10https://gerrit.wikimedia.org/r/306697 (owner: 10BryanDavis) [18:39:40] (03CR) 10BryanDavis: "The uwsgi+nginx POC patch has been abandoned due to the co-location of labtestwikitech on labtestweb2001." [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) (owner: 10BryanDavis) [18:40:12] (03CR) 10Dzahn: [C: 032] "no-op on carbon, adds backups on install1001/2001 like carbon has them" [puppet] - 10https://gerrit.wikimedia.org/r/306682 (owner: 10Dzahn) [18:49:21] (03PS2) 10Andrew Bogott: The labs puppet backend now requires python3-yaml. [puppet] - 10https://gerrit.wikimedia.org/r/306289 [18:50:45] (03PS1) 10Dzahn: installserver: remove duplicated TFTP part [puppet] - 10https://gerrit.wikimedia.org/r/306718 [18:51:03] (03CR) 10Andrew Bogott: [C: 032] The labs puppet backend now requires python3-yaml. [puppet] - 10https://gerrit.wikimedia.org/r/306289 (owner: 10Andrew Bogott) [18:55:07] (03PS2) 10Dzahn: installserver: remove duplicated TFTP part [puppet] - 10https://gerrit.wikimedia.org/r/306718 [18:58:20] (03CR) 10Dzahn: [C: 032] "no-op, the only diff on carbon is the motd" [puppet] - 10https://gerrit.wikimedia.org/r/306718 (owner: 10Dzahn) [18:58:33] (03PS3) 10Dzahn: installserver: remove duplicated TFTP part [puppet] - 10https://gerrit.wikimedia.org/r/306718 [18:59:10] jouncebot: next [18:59:11] In 0 hour(s) and 0 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160825T1900) [18:59:39] hashar: *ding* *ding* *ding* .. 
coins falling out [19:00:04] hashar: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160825T1900). Please do the needful. [19:00:41] hmm, my server time is off 6 minutes [19:01:16] https://gerrit.wikimedia.org/r/306719 all wikis to 1.28.0-wmf.16 [19:02:10] !log hashar@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.28.0-wmf.16 [19:07:07] (03PS2) 10Dzahn: installserver: put aptrepo role also on install2001 [puppet] - 10https://gerrit.wikimedia.org/r/306713 [19:09:56] (03PS1) 10DCausse: CirrusSearch BM25 A/B test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306721 (https://phabricator.wikimedia.org/T143586) [19:10:24] (03CR) 10Dzahn: [C: 032] "apt.wm.org is still an alias for carbon, this is just preparing it for later" [puppet] - 10https://gerrit.wikimedia.org/r/306713 (owner: 10Dzahn) [19:10:39] (03CR) 10jenkins-bot: [V: 04-1] CirrusSearch BM25 A/B test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306721 (https://phabricator.wikimedia.org/T143586) (owner: 10DCausse) [19:10:40] (03PS3) 10Dzahn: installserver: put aptrepo role also on install2001 [puppet] - 10https://gerrit.wikimedia.org/r/306713 [19:15:08] wmf.16 looks almost fine [19:15:25] we have some spikes of "Error connecting to db_server: error" though :- [19:17:27] all being on "enwiki" :/ [19:18:38] Ugh again? [19:18:42] That happened this morning. 
[18:18:51] (with wmf.15, so not exactly 16/15 related) [19:19:19] (03PS2) 10DCausse: CirrusSearch BM25 A/B test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306721 (https://phabricator.wikimedia.org/T143586) [19:20:07] (03Abandoned) 10Dduvall: ci: Role for running Raita [puppet] - 10https://gerrit.wikimedia.org/r/208024 (owner: 10Dduvall) [19:21:31] (03PS3) 10DCausse: CirrusSearch BM25 A/B test config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306721 (https://phabricator.wikimedia.org/T143586) [19:29:10] (03PS1) 10Gehel: maps - grant privileges on sequences to all known users [puppet] - 10https://gerrit.wikimedia.org/r/306728 [19:31:00] (03PS2) 10Gehel: maps - grant privileges on sequences to all known users [puppet] - 10https://gerrit.wikimedia.org/r/306728 [19:42:29] (03PS5) 10Phedenskog: Enable PerformanceInspector extension for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304992 [19:49:09] 06Operations: Update ICU version to 55.1 - https://phabricator.wikimedia.org/T143931#2583636 (10Ladsgroup) [19:52:53] (03PS1) 10Brian Wolff: Include favicon.ico in image CSP report-only header [puppet] - 10https://gerrit.wikimedia.org/r/306733 [20:05:29] Hi, what's with labs replicating? SQL query "select page_title from page where page_namespace=14 and page_is_redirect=1;" on cswiki_p returns no rows even though pages which should match this query exist, see https://cs.wikipedia.org/w/index.php?title=Kategorie:Obce_ve_Francii_podle_departementu&redirect=no [20:07:07] page_is_redirect can occasionally be messed up in db [20:07:23] as in, I've heard of that happening before [20:07:33] although it's pretty rare [20:07:42] so may or may not be a replication problem [20:07:50] And is there any way to refill it?
[20:08:21] If it's messed up on the mediawiki side (not the replication side), a null edit would probably fix it [20:08:59] Urbanecm_: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Database/Replica_drift [20:09:09] assuming it's on the replication side. Please try the null edit first. [20:09:58] Also query "select count(*) from revision where rev_user_text="UrbanecmBot" and rev_timestamp like "20160825%";" returns 18 even though my bot must have saved more than 18 edits on this date. See https://cs.wikipedia.org/wiki/Speci%C3%A1ln%C3%AD:P%C5%99%C3%ADsp%C4%9Bvky/UrbanecmBot . [20:10:14] Urbanecm_: https://tools.wmflabs.org/replag/ [20:11:37] valhallasw`cloud: I made a null edit on https://cs.wikipedia.org/w/index.php?title=Kategorie:Obce_ve_Francii_podle_departementu&redirect=no several times, nothing changes... [20:12:09] Thanks for the link. [20:12:09] s [20:12:15] Urbanecm_: it's just some replag. [20:12:19] <_joe_> valhallasw`cloud: I merged your patch, but still not deployed it (puppet-compiler) [20:12:33] _joe_: ah, hey. [20:12:43] when would be convenient for you? [20:12:48] I didn't know that, thanks for the explanation ;) [20:13:23] <_joe_> valhallasw`cloud: actually, I'm leaving for a week tomorrow evening [20:13:44] <_joe_> let's see if I make it tomorrow, but the list of things I should finish tomorrow is getting worryingly long [20:14:28] sounds good [20:15:02] I'll probably be online from ~10.30 CEST [20:17:17] <_joe_> heh no promises but it's easy enough that I should be able to do it tomorrow morning [20:18:13] I think the biggest effort might be to actually load facts for tools hosts. I created a dump on tools-puppetmaster, and that seemed to work OK, so I have hopes it'll just work [20:22:27] 06Operations, 10MediaWiki-General-or-Unknown, 06Services, 10Traffic: Investigate query parameter normalization for MW/services - https://phabricator.wikimedia.org/T138093#2583763 (10Fjalapeno) On iOS parameters are constructed using a dictionary which is unordered. 
(There is no concept of an ordered dictio... [20:29:37] 06Operations, 10Cassandra, 10procurement: SSDs for repurposed AMS nodes - https://phabricator.wikimedia.org/T143935#2583786 (10Eevans) [20:30:00] 06Operations, 06WMF-Legal, 06WMF-NDA-Requests: ZhouZ needs access to WMF-NDA group - https://phabricator.wikimedia.org/T98722#2583800 (10ZhouZ) I see - I have done more investigation into this process. Will update the other task then and close this one. [20:35:05] jouncebot: refresh [20:35:07] I refreshed my knowledge about deployments. [20:35:15] jouncebot: next [20:35:15] In 0 hour(s) and 24 minute(s): Tool Labs admin console ("Striker") (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160825T2100) [20:38:21] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 3 others: Look into encrypting Elasticsearch traffic - https://phabricator.wikimedia.org/T124444#2583852 (10ksmith) [20:38:29] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 3 others: Activate SSL + connection pooling for CirrusSearch on PROD - https://phabricator.wikimedia.org/T131839#2583855 (10ksmith) [20:39:01] 06Operations, 10CirrusSearch, 06Discovery, 03Discovery-Search-Sprint, 13Patch-For-Review: Only use newer (elastic10{16..47}) servers as master capable elasticsearch nodes - https://phabricator.wikimedia.org/T112556#2583861 (10ksmith) [20:39:55] 06Operations, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint: Restart elasticsearch clusters for Java update - https://phabricator.wikimedia.org/T135499#2583867 (10ksmith) [20:39:55] 06Operations, 10CirrusSearch, 06Discovery, 03Discovery-Search-Sprint, and 3 others: "Elastica: missing curl_init_pooled method" due to mwscript job running with PHP 5 on terbium - https://phabricator.wikimedia.org/T132751#2583868 (10ksmith) [20:40:51] 06Operations, 06Discovery, 10Elasticsearch, 10MediaWiki-Vendor, and 3 others: Upgrade ruflin/elastica to 2.3.1 - https://phabricator.wikimedia.org/T127831#2583880 (10ksmith) 
[20:41:15] 06Operations, 06Discovery, 10Elasticsearch, 10Wikimedia-Logstash, and 2 others: Upgrade ElasticSearch to 1.7.5 - https://phabricator.wikimedia.org/T122697#2583884 (10ksmith) [20:42:54] (03PS6) 10Phedenskog: Enable PerformanceInspector extension for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304992 [20:48:47] (03CR) 10Ori.livneh: [C: 032] Enable PerformanceInspector extension for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304992 (owner: 10Phedenskog) [20:49:12] (03Merged) 10jenkins-bot: Enable PerformanceInspector extension for labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/304992 (owner: 10Phedenskog) [20:50:50] I'm going to deploy a labs-only change [20:50:55] (03CR) 10BBlack: [C: 032] Include favicon.ico in image CSP report-only header [puppet] - 10https://gerrit.wikimedia.org/r/306733 (owner: 10Brian Wolff) [20:53:10] !log ori@tin Synchronized wmf-config/extension-list-labs: I02ae5be610: Enable PerformanceInspector extension for labs [1/3] (duration: 00m 46s) [20:53:59] !log ori@tin Synchronized wmf-config/InitialiseSettings-labs.php: I02ae5be610: Enable PerformanceInspector extension for labs [2/3] (duration: 00m 48s) [20:54:46] !log ori@tin Synchronized wmf-config/CommonSettings-labs.php: I02ae5be610: Enable PerformanceInspector extension for labs [3/3] (duration: 00m 46s) [21:00:04] bd808 and yuvipanda: Dear anthropoid, the time has come. Please deploy Tool Labs admin console ("Striker") (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160825T2100). 
[21:00:15] o/ [21:00:30] * bd808 looks for a yuvipanda [21:01:39] (03PS2) 10Giuseppe Lavagetto: puppetmaster: add puppetmaster::web_frontend [puppet] - 10https://gerrit.wikimedia.org/r/306702 (https://phabricator.wikimedia.org/T143869) [21:01:41] (03PS2) 10Giuseppe Lavagetto: puppetmaster: extract the passenger config from the virtualhost [puppet] - 10https://gerrit.wikimedia.org/r/306703 [21:01:42] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0] [21:03:00] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster: add puppetmaster::web_frontend [puppet] - 10https://gerrit.wikimedia.org/r/306702 (https://phabricator.wikimedia.org/T143869) (owner: 10Giuseppe Lavagetto) [21:03:24] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster: extract the passenger config from the virtualhost [puppet] - 10https://gerrit.wikimedia.org/r/306703 (owner: 10Giuseppe Lavagetto) [21:08:40] 06Operations, 10Ops-Access-Requests: Requesting access to the statistics host(s) for flemmerich - https://phabricator.wikimedia.org/T143881#2584007 (10AlexMonk-WMF) [21:08:42] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2584008 (10AlexMonk-WMF) [21:10:14] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2576804 (10AlexMonk-WMF) > bastiononly This group was deleted recently. It's no longer necessary. 
[21:14:57] /home/papaul/.local/share/bijiben/dd88100b-2c4f-41bd-b180-3e9216fe782a.note [21:16:14] 06Operations, 10DBA, 06Labs, 07Tracking: Database replication services (tracking) - https://phabricator.wikimedia.org/T50930#2584027 (10AlexMonk-WMF) [21:23:44] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [21:26:15] 06Operations, 10MediaWiki-General-or-Unknown, 06Services, 10Traffic: Investigate query parameter normalization for MW/services - https://phabricator.wikimedia.org/T138093#2584068 (10dr0ptp4kt) I haven't come across a case where the name-value pair ordering in the URL or form data is material. @Anomie @tgr... [21:36:53] (03CR) 10Alex Monk: "This is a legitimate error and should not be silenced, we should be dealing with the actual problem instead, rather than trying to hide it" [puppet] - 10https://gerrit.wikimedia.org/r/298785 (https://phabricator.wikimedia.org/T132324) (owner: 10Elukey) [21:38:33] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 622 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4894953 keys - replication_delay is 622 [21:40:34] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 4841253 keys - replication_delay is 4 [21:43:49] hi bd808 [21:44:07] * yuvipanda apologises profusely and offers bd808 beer [21:44:13] a wild yuvipanda! [21:44:30] * bd808 tosses a raspberry so he won't run away [21:45:01] I think I still haven't had any [21:45:03] no beer required (I've got a fridge full) [21:45:12] use a Great Ball [21:45:18] just to be on the safe side [21:45:37] * yuvipanda offers bd808 some home made brownies [21:45:43] I think this yuvipanda's CP rates an Ultra Ball [21:45:49] soon I can make my own brownie icecream sandwich [21:46:15] bacon, omelettes and browines [21:46:30] you are set for the major food groups [21:46:39] that's where I'm at now. 
Next plans include bread pudding and french toast [21:47:03] and pancakes, crepes. no waffles though, don't have the thing [21:47:06] ooh, and milkshake [21:47:39] bd808 do you still wanna deploy now? or too late for you? [21:47:51] no. lets doooo it [21:47:53] https://gerrit.wikimedia.org/r/#/c/306604/ [21:47:54] ok [21:48:19] (03PS11) 10Yuvipanda: striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) (owner: 10BryanDavis) [21:49:50] (03CR) 10Yuvipanda: [C: 032 V: 032] striker: Replace nginx with apache [puppet] - 10https://gerrit.wikimedia.org/r/306604 (https://phabricator.wikimedia.org/T136256) (owner: 10BryanDavis) [21:50:32] at this point I don't know if I should laugh or cry when I go to palladium, type puppet merge, and wonder why my change isn't there [21:50:38] (spoiler - it's because I didn't press 'submit') [21:52:23] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [50.0] [22:02:43] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 60.00% of data above the critical threshold [50.0] [22:04:34] 06Operations, 10MediaWiki-General-or-Unknown, 06Services, 10Traffic: Investigate query parameter normalization for MW/services - https://phabricator.wikimedia.org/T138093#2584228 (10Tgr) The action API is not cached in Varnish unless the client sets the `smaxage` query parameter (in which case they probabl... [22:06:06] ok yuvipanda. 
I think we are ready for https://gerrit.wikimedia.org/r/#/c/305142/ [22:06:22] heh (yuvi said the same thing in another channel) [22:06:34] ok, back to here [22:07:28] (03PS6) 10Yuvipanda: Add toolsadmin.wikimedia.org to misc varnish [puppet] - 10https://gerrit.wikimedia.org/r/305142 (https://phabricator.wikimedia.org/T136256) (owner: 10BryanDavis) [22:07:41] (03CR) 10Yuvipanda: [C: 032 V: 032] Add toolsadmin.wikimedia.org to misc varnish [puppet] - 10https://gerrit.wikimedia.org/r/305142 (https://phabricator.wikimedia.org/T136256) (owner: 10BryanDavis) [22:08:01] OH [22:08:02] MY [22:08:03] GOD [22:08:07] I FORGOT TO HIT SUBMIT AGAIN [22:08:08] ... [22:08:13] push the button yuvipanda [22:08:18] I'm trying. [22:08:30] push it real good [22:09:00] https://www.youtube.com/watch?v=vCadcBR95oU [22:09:14] ahhhhhhhhhhhhhhh, push it [22:09:16] bd808 done. [22:09:48] Error: 404, Domain not served here [22:10:05] bd808 oh no, you hit it! now it'll be cached [22:10:12] at least on that one machine. [22:10:19] I did a ?... [22:10:37] nah, it's ok, just wait it out. I'm trying to figure out how to run salt on just the misc varnishes [22:10:41] I usually just wait for the 20mins [22:10:44] puppet timeperiod [22:11:16] bd808 if you hit it before that, it caches a 404 on that host for a while [22:12:02] cp1061 was the one that responded to me [22:12:39] 10 minutes is the cap for 404s [22:12:50] thanks bblack [22:13:10] (and this is why you said to wait and not add the dns until after the varnish bits) [22:13:23] PROBLEM - puppet last run on palladium is CRITICAL: CRITICAL: puppet fail [22:13:27] well 10 minutes isn't right either [22:14:07] salt -v -t 5 -b 17 -C 'G@cluster:cache_maps' cmd.run 'puppet agent -t' [22:14:13] does that look ok to you, bblack? [22:14:23] there's a 10 minute cap on 4xx TTLs in any given cache layer. if you're hitting a ulsfo machine the corner case could be up to 40 minutes. 
but that would be rare, and usually only on a URL that's very heavily spammed to hit timing races [22:14:27] the palladium failure is me, I ran puppt on it accidentally rather than puppetmerge [22:14:49] yuvipanda: yes it looks right, except I think you want misc not maps? [22:15:04] I'm guessing that came from me from root's bash history, because I pick numbers like 17 a lot :) [22:15:16] yes :D [22:15:35] salt -v -t 5 -b 17 -C 'G@cluster:cache_misc' cmd.run 'puppet agent -t' [22:15:50] !log forcing a puppet run on misc varnish hosts [22:15:52] 06Operations, 06Discovery, 06Discovery-Search-Backlog, 10Elasticsearch: Check that elasticsearch actually uses shard allocation awareness - https://phabricator.wikimedia.org/T143571#2584303 (10debt) 05Open>03Resolved p:05Triage>03Normal a:03debt This looks like this was already done - closing for... [22:16:17] it has become an ingrained thoughtless rule of thumb for me that if I'm picking an arbitrary parameter for something in systems work (especially a timeout or something parallelism related) to try to pick a prime, or at least a strange/odd number. [22:16:40] less chance of horrific multiplicative coincidences with other things of various natures [22:16:52] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [22:16:59] the clush fanout is 8 [22:17:02] maybe it should be 7 :) [22:17:19] :) [22:18:04] bd808 is live for me now! [22:18:14] omg omg omg [22:18:41] 8719 tool maintainers! [22:18:47] 1498 hosted tools [22:18:52] such numbers! 
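Editorial note on the "pick a prime" rule of thumb above: a minimal Python sketch, assuming the "multiplicative coincidences" mean two periodic or batched processes lining up with each other. Two cycles of length a and b coincide every lcm(a, b) steps, so a value that shares no factor with its neighbours lines up with them far less often. The helper name and the comparison period of 4 are illustrative assumptions, not something taken from the channel.

```python
from math import gcd


def coincidence_interval(a: int, b: int) -> int:
    """Return the least common multiple of a and b.

    Two cycles of length a and b start together again every lcm(a, b) steps.
    """
    return a * b // gcd(a, b)


if __name__ == "__main__":
    other_period = 4  # hypothetical period of some unrelated process
    # 8 and 17 appear in the discussion (clush fanout, salt batch size);
    # 7 is the suggested alternative to 8.
    for batch in (8, 7, 17):
        interval = coincidence_interval(batch, other_period)
        print(f"batch={batch:2d} lines up with period {other_period} every {interval} steps")
```

With a fanout of 8 the two cycles align on every pass of the larger one, while 7 or 17 push the alignment out to the full product of the two values.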
[22:19:23] RECOVERY - puppet last run on palladium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:19:32] argh config error [22:19:34] looking [22:19:55] yeah just saw [22:20:12] it's ldaps/tls stuff [22:20:30] also I need to make a pretty error page obviously [22:21:54] (03PS1) 10BryanDavis: striker: remove TLS flag for LDAP [puppet] - 10https://gerrit.wikimedia.org/r/306826 [22:22:00] yuvipanda: ^ [22:22:11] (03PS1) 10Giuseppe Lavagetto: puppetmaster: add ca and ca_server settings to frontend [puppet] - 10https://gerrit.wikimedia.org/r/306827 (https://phabricator.wikimedia.org/T143869) [22:22:13] 06Operations, 06Discovery, 06Discovery-Search, 10Elasticsearch: Make elasticsearch configuration more robust to loss of network connectivity - https://phabricator.wikimedia.org/T143552#2584348 (10debt) p:05Triage>03Normal This looks like something we'll need to chat about that can make this run more ef... [22:22:13] (03PS1) 10Giuseppe Lavagetto: puppetmaster: split backend and frontend vhosts [puppet] - 10https://gerrit.wikimedia.org/r/306828 (https://phabricator.wikimedia.org/T143869) [22:22:15] (03PS1) 10Giuseppe Lavagetto: puppetmaster: move vhost from passenger class [puppet] - 10https://gerrit.wikimedia.org/r/306829 (https://phabricator.wikimedia.org/T143869) [22:22:17] (03PS1) 10Giuseppe Lavagetto: puppetmaster::frontend: move vhost to role [puppet] - 10https://gerrit.wikimedia.org/r/306830 (https://phabricator.wikimedia.org/T143869) [22:22:19] (03PS1) 10Giuseppe Lavagetto: puppetmaster::frontend: add vhost for FQDN [puppet] - 10https://gerrit.wikimedia.org/r/306831 (https://phabricator.wikimedia.org/T143869) [22:22:21] (03PS1) 10Giuseppe Lavagetto: puppetmaster::frontend: get workers from hiera [puppet] - 10https://gerrit.wikimedia.org/r/306832 (https://phabricator.wikimedia.org/T143869) [22:22:21] I only need one of ldaps:// or TLS [22:22:23] (03PS1) 10Giuseppe Lavagetto: [WiP] puppetmaster::gitclone: support primary/secundary 
masters [puppet] - 10https://gerrit.wikimedia.org/r/306833 [22:22:42] (03CR) 10Yuvipanda: [C: 032 V: 032] striker: remove TLS flag for LDAP [puppet] - 10https://gerrit.wikimedia.org/r/306826 (owner: 10BryanDavis) [22:22:57] I tabbed out, but remembered to tab back in and press the damn button [22:23:09] progress [22:23:46] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster: add ca and ca_server settings to frontend [puppet] - 10https://gerrit.wikimedia.org/r/306827 (https://phabricator.wikimedia.org/T143869) (owner: 10Giuseppe Lavagetto) [22:24:14] <_joe_> grrr [22:24:39] <_joe_> trailing whitespace, meh [22:25:05] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster: split backend and frontend vhosts [puppet] - 10https://gerrit.wikimedia.org/r/306828 (https://phabricator.wikimedia.org/T143869) (owner: 10Giuseppe Lavagetto) [22:26:19] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster: move vhost from passenger class [puppet] - 10https://gerrit.wikimedia.org/r/306829 (https://phabricator.wikimedia.org/T143869) (owner: 10Giuseppe Lavagetto) [22:28:07] bd808 you have access to the logs right? [22:28:07] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster::frontend: move vhost to role [puppet] - 10https://gerrit.wikimedia.org/r/306830 (https://phabricator.wikimedia.org/T143869) (owner: 10Giuseppe Lavagetto) [22:28:30] yuvipanda: in logstash, yes [22:28:36] on the host, I don't think so [22:29:00] I can get there via the deploy-service user but that is very low privledge [22:29:10] ok. [22:29:22] let me know what I can do to help? [22:29:25] (03CR) 10MZMcBride: On public wikis, show "Publish" rather than "Save" on edit pages (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [22:30:09] doh. I needed to set that flag to false. 
True is the default [22:30:35] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster::frontend: add vhost for FQDN [puppet] - 10https://gerrit.wikimedia.org/r/306831 (https://phabricator.wikimedia.org/T143869) (owner: 10Giuseppe Lavagetto) [22:32:19] (03PS1) 10BryanDavis: striker: set TLS to false [puppet] - 10https://gerrit.wikimedia.org/r/306834 [22:32:48] yuvipanda: ^ try again plz [22:32:58] bd808 I manually set it to false, reloaded just now [22:33:38] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster::frontend: get workers from hiera [puppet] - 10https://gerrit.wikimedia.org/r/306832 (https://phabricator.wikimedia.org/T143869) (owner: 10Giuseppe Lavagetto) [22:33:59] yuvipanda: mysql connection is failing [22:34:14] (1045, "Access denied for user 'striker'@'208.80.154.147' (using password: NO)") [22:34:31] did the password not get set in the private hiera? [22:34:50] yeah, it's blank [22:35:28] looking [22:35:30] I did set it, maybe in the worng place [22:35:32] did you see the "copy from passwords::striker::application_db_password" note [22:35:46] yeah, I did do it [22:36:06] oh... yeah I suppose we might be able to get this far without any of the secrets [22:36:08] bd808 it's set to [22:36:10] PASSWORD: "%{::passwords::striker::application_db_password}" [22:36:29] oh. just literally copy and paste it [22:36:48] nothing is loading the passwords module so hiera won't do that [22:37:06] actually that wouldn't work even if it was loaded [22:37:14] that would only copy another hiera var [22:37:21] (03CR) 10jenkins-bot: [V: 04-1] [WiP] puppetmaster::gitclone: support primary/secundary masters [puppet] - 10https://gerrit.wikimedia.org/r/306833 (owner: 10Giuseppe Lavagetto) [22:37:32] bd808 it's being used in other places, like ocg [22:38:00] they must load the class in their role then. [22:38:06] I could add that I guess [22:40:57] bd808 yeah, can we do that? 
i think it'll be a surprise to jy.nus otherwise [22:41:02] sorry I didn't catch that [22:41:13] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2584448 (10leila) [22:41:36] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2576804 (10leila) @AlexMonk-WMF thanks for letting us know. I removed that item from the description. [22:42:08] (03PS1) 10BryanDavis: striker: require passwords::striker [puppet] - 10https://gerrit.wikimedia.org/r/306836 [22:42:51] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 10Research-management: Request access to data for reader research - https://phabricator.wikimedia.org/T143718#2584453 (10AlexMonk-WMF) [22:43:28] (03CR) 10Yuvipanda: [C: 032] striker: set TLS to false [puppet] - 10https://gerrit.wikimedia.org/r/306834 (owner: 10BryanDavis) [22:43:35] (03CR) 10Yuvipanda: [C: 032 V: 032] striker: require passwords::striker [puppet] - 10https://gerrit.wikimedia.org/r/306836 (owner: 10BryanDavis) [22:49:32] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0] [22:53:34] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [22:54:32] bd808 running puppet [22:55:50] bd808 the password is in place now. still 500 tho [22:56:30] (1193, "Unknown system variable 'TRANSACTION'") [22:56:33] wtf is that [22:57:08] hmmm [22:57:30] that's from my default connection setup [22:57:40] waht version of mysql are we running on m5-master? 
[22:58:16] let me look [22:59:01] mariadb Ver 15.1 Distrib 10.0.16-MariaDB, for Linux (x86_64) using readline 5.1 [22:59:19] * bd808 scratches head [22:59:42] you can look this up on tenril [22:59:44] tendril* [22:59:44] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 100.00% of data above the critical threshold [50.0] [23:00:00] m5-master is db1009, tendril says it's Release 10.0.16 [23:00:05] RoanKattouw, ostriches, MaxSem, awight, and Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160825T2300). [23:00:05] MaxSem, ebernhardson, and bawolff: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:33] Woo! [23:01:06] weird that I didn't see this on my test system. I see what it doesn't like but I'm not 100% sure why [23:01:43] https://stackoverflow.com/questions/16946938/django-unknown-system-variable-transaction-on-syncdb ? [23:02:38] * MaxSem will deploy [23:02:59] Krenair: yeah. that accepted answer isn't quite right but that's the problem [23:03:21] can't mix the foo=bar sets with the transaction level apaprently [23:03:46] yuvipanda: good news for you is this is on my side, no techops +2 needed [23:04:49] :D [23:05:54] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0] [23:09:44] we kinda have a ton of DB connection errors ^ but it seems to be calming down [23:10:23] did someone accidentally a mysqld? 
:P [23:11:18] (03PS1) 10Ladsgroup: ores: Define extra config for ores [puppet] - 10https://gerrit.wikimedia.org/r/306839 (https://phabricator.wikimedia.org/T143567) [23:12:39] no, those connection errors are a bad query plan mysql generated, it thinks scanning a few tens of millions of rows is a better idea than 30 key lookups in an index [23:12:47] but the other patch in swat kinda/mostly fixes that [23:13:54] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0] [23:14:28] a better fix will have to wait for jcrespo ... my best guess is something is wrong with index stats because those indexes think the integer namespace field has a cardinality of ~3 million, which seems wrong... [23:14:51] that's a lot of namespaces [23:14:56] yea :) [23:15:12] I might believe ~300 [23:15:46] !log maxsem@tin Synchronized php-1.28.0-wmf.16/extensions/CirrusSearch: (no message) (duration: 00m 57s) [23:16:44] indeed :) [23:18:02] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [50.0] [23:20:50] (03PS1) 10Ppchelko: Change-Prop: Remove unused request templates from the config. [puppet] - 10https://gerrit.wikimedia.org/r/306842 [23:23:12] bawolff, pulled on mw1099 [23:23:33] ok [23:23:56] (03CR) 10Ppchelko: "Puppet compiler: https://puppet-compiler.wmflabs.org/3846/scb1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/306842 (owner: 10Ppchelko) [23:24:03] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0] [23:25:26] bd808 I am going to step away for a bit, that ok? [23:25:42] yuvipanda: yup. thanks! 
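Editorial note on the (1193, "Unknown system variable 'TRANSACTION'") error above: a hedged Python sketch of one way to avoid mixing "foo=bar" assignments with a transaction-level clause in a single SET statement. It assumes the connection setup is a Django-style `init_command` option and uses the `tx_isolation` session variable, which MariaDB 10.0 accepts as an ordinary assignment. The log does not show the real Striker settings or what the actual fix was, so the database name, the sql_mode value, and the helper function are all hypothetical.

```python
def build_init_command(sql_mode: str, isolation: str) -> str:
    """Build one SET statement made only of variable assignments.

    Using the tx_isolation variable keeps the isolation level as a plain
    "name=value" pair, so the statement contains no TRANSACTION clause.
    """
    # tx_isolation takes hyphenated values such as READ-COMMITTED.
    level = isolation.strip().upper().replace(" ", "-")
    return f"SET sql_mode='{sql_mode}', tx_isolation='{level}'"


# Hypothetical Django-style settings fragment; names are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "striker_example",  # assumption, not the real database name
        "OPTIONS": {
            "init_command": build_init_command(
                "STRICT_TRANS_TABLES", "read committed"
            ),
        },
    }
}

if __name__ == "__main__":
    print(DATABASES["default"]["OPTIONS"]["init_command"])
```

The design choice here is to keep everything in a single assignment-only SET, which sidesteps the question of whether the driver allows several statements in one init command.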
[23:27:07] (03CR) 10Chad: On public wikis, show "Publish" rather than "Save" on edit pages (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306303 (https://phabricator.wikimedia.org/T131132) (owner: 10Jforrester) [23:27:19] MaxSem: Tested. Confirmed it worked [23:27:29] yuvipanda: \o/ logged in! [23:27:44] !log Updated striker to 0b6ef02 [23:27:50] \o/ me too! [23:28:40] omg it all works! [23:29:06] !log maxsem@tin Synchronized php-1.28.0-wmf.16/includes/DefaultSettings.php: https://gerrit.wikimedia.org/r/#/c/306837/ (duration: 00m 47s) [23:29:23] \o/ [23:29:32] bd808 do I need to have already connected my phab account with SUL? [23:29:59] yuvipanda: it should work with either SUL or ldap [23:30:39] !log maxsem@tin Synchronized php-1.28.0-wmf.16/includes/api/ApiCSPReport.php: https://gerrit.wikimedia.org/r/#/c/306837/ (duration: 00m 46s) [23:30:44] bawolff, ^ [23:30:52] Woo! :) [23:30:54] Thanks [23:31:18] bd808 in home, if I click 'Connect now' for phabricator, I get ' No related Phabricator accounts found. ' [23:31:29] bd808 but otherwise, Awesome so far [23:31:29] hmmm [23:32:17] yuvipanda: "Unknown or missing ldap names: yuvipanda" [23:32:18] (03PS3) 10MaxSem: Disable on most Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306587 [23:32:26] (03CR) 10MaxSem: [C: 032] Disable on most Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306587 (owner: 10MaxSem) [23:32:45] is it using the wrong name from ldap? [23:32:54] (03Merged) 10jenkins-bot: Disable on most Wikipedias [mediawiki-config] - 10https://gerrit.wikimedia.org/r/306587 (owner: 10MaxSem) [23:32:57] not sure. [23:33:00] that looks like a shell name [23:35:00] yuvipanda: filed a bug T143956 [23:35:01] T143956: Yuvipanda can't connect his phab account - https://phabricator.wikimedia.org/T143956 [23:35:15] thanks bd808! [23:35:21] ooo I get a bug with my name on it! [23:38:20] hmmm.. 
so phab thinks your name is Yuvipanda and ldap told me it's yuvipanda [23:40:30] !log maxsem@tin Synchronized wmf-config/InitialiseSettings.php: https://gerrit.wikimedia.org/r/#/c/306587/ (duration: 00m 47s) [23:43:03] who was azariv? [23:49:06] Platonides: me [23:50:38] but you no longer hold that email [23:50:56] @wikimedia.org? [23:51:03] yes [23:51:10] deleted ages ago yeah [23:51:23] it's still listed as owner for comproj ml [23:51:34] odd [23:51:38] I should fix that [23:51:54] and mailman has been spamming about a subscription request for a month [23:54:20] ah [23:55:26] The listadmin password prolly got reset and sent to my non-functional email, will have to create a ticket to get it sent to my current one. [23:55:58] Though for the most part that list is dead, like really dead.