[00:00:04] addshore, hashar, anomie, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, thcipriani, Niharika, and zeljkof: Dear deployers, time to do the Evening SWAT (Max 8 patches) deploy. Dont look at me like that. You signed up for it. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180209T0000). [00:00:04] tgr: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be around during the process. Note: If you break AND fix the wikis, you will be rewarded with a sticker. [00:02:51] (03CR) 10Chad: [C: 032] group1 back to wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409128 (owner: 10Chad) [00:04:32] tgr: chad's working on the train catch up, sorry :/ [00:05:04] (03Merged) 10jenkins-bot: group1 back to wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409128 (owner: 10Chad) [00:05:13] didn't have anything important, I'll just move it to Monday [00:05:44] 10Operations, 10Icinga, 10monitoring, 10Patch-For-Review: icinga ACK shows as CRIT when delivered via SMS - https://phabricator.wikimedia.org/T185862#3957084 (10Dzahn) a:03Dzahn [00:06:00] PROBLEM - Router interfaces on cr1-eqsin is CRITICAL: CRITICAL: host 103.102.166.129, interfaces up: 63, down: 2, dormant: 0, excluded: 0, unused: 0 [00:06:17] greg-g: or I could sync it today evening - it's a -labs.php only change [00:07:14] oh, gotcha, then yeah, either or, up to you [00:07:18] (03CR) 10jenkins-bot: group1 back to wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409128 (owner: 10Chad) [00:09:23] 10Operations, 10MediaWiki-Platform-Team, 10Performance-Team, 10HHVM: Convert Wikimedia production HHVM instances to have hhvm.php7.all set true - https://phabricator.wikimedia.org/T173786#3957090 (10Krinkle) [00:09:36] 10Operations, 10MediaWiki-Platform-Team, 10Performance-Team, 10HHVM: Convert Wikimedia production HHVM instances to have hhvm.php7.all set true - https://phabricator.wikimedia.org/T173786#3539734 (10Krinkle) 
[00:09:55] 10Operations, 10MediaWiki-Platform-Team, 10Performance-Team, 10HHVM: Convert Wikimedia production HHVM instances to have hhvm.php7.all set true - https://phabricator.wikimedia.org/T173786#3539734 (10Krinkle) [00:11:29] (03PS1) 10Dzahn: icinga: set contact group for paging test host [puppet] - 10https://gerrit.wikimedia.org/r/409188 (https://phabricator.wikimedia.org/T185862) [00:13:11] (03PS2) 10Dzahn: icinga: set contact group for paging test host [puppet] - 10https://gerrit.wikimedia.org/r/409188 (https://phabricator.wikimedia.org/T185862) [00:15:19] tgr: Gimme a few mins, then you can do it [00:16:18] (03CR) 10Dzahn: [C: 032] icinga: set contact group for paging test host [puppet] - 10https://gerrit.wikimedia.org/r/409188 (https://phabricator.wikimedia.org/T185862) (owner: 10Dzahn) [00:16:40] !log demon@tin rebuilt and synchronized wikiversions files: group1 to wmf.20 *duck and cover* [00:16:41] hmm I have not gotten more cronspam from stat1005 so maybe it has fixed itself, we shall see [00:16:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:17:10] apergos: i saw high load and then it went down and that "process stats" command was gone [00:17:32] well that's not the command that breaks, though that might be why something else ooms [00:17:51] anyhoo, if i see spam tomorrow I'll follow up [00:17:52] Nope, wmf.20 sucks. [00:17:58] bah still? [00:18:30] linksupdate / jobqueue [00:18:31] !log demon@tin rebuilt and synchronized wikiversions files: surprise, it broke. 
revert group1 back to wmf.20 [00:18:41] Immediate spike in db lag from mw's side [00:18:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [00:18:51] I have stacktraces this time [00:19:18] (03PS1) 10Chad: Revert "group1 back to wmf.20" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409189 [00:19:28] (03CR) 10Chad: [C: 032] Revert "group1 back to wmf.20" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409189 (owner: 10Chad) [00:19:39] (03Abandoned) 10Chad: Group2 to wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409129 (owner: 10Chad) [00:20:55] (03Merged) 10jenkins-bot: Revert "group1 back to wmf.20" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409189 (owner: 10Chad) [00:21:02] (03CR) 10jenkins-bot: Revert "group1 back to wmf.20" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409189 (owner: 10Chad) [00:35:44] 10Operations, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Remove cloud-admin rights from YuviPanda - https://phabricator.wikimedia.org/T186289#3957131 (10bd808) p:05Triage>03Normal a:03bd808 [00:38:28] no_justification: well bummer [00:39:55] 10Operations, 10wikitech.wikimedia.org, 10cloud-services-team (Kanban): Remove cloud-admin rights from YuviPanda - https://phabricator.wikimedia.org/T186289#3957151 (10bd808) ``` 2018-02-09T00:38:45 BryanDavis (talk | contribs | block) changed group membership for Yuvipanda from cloud administrator, OAuth ad... [00:40:18] lol more cronspam from stat1005 just arrived... [00:40:52] i am trying to debug the icinga SMS stuff.. and it's a mystery :p [00:41:12] i copied the exact same command except changing the content.. and it will just not send me SMS [00:41:40] and there is no magic involved, it's just an email to that mail2sms gateway [00:42:21] mutante: how long is the content? Perhaps it exceeds some sms limit? [00:46:10] Platonides: that could indeed be it, i just tested by doing a more manual echo .. 
| mail and i get that [00:46:43] and this is why it was good to not just change the existing command for all, heh [00:46:46] sms had a limit of about 120 characters or so [00:47:06] plus an unusual encoding when compared with computers [00:47:52] 140 isn't it? [00:47:58] no_justification: I'm on to something, but can't find where the ball is dropped atm [00:48:00] and that's what v.olans said on review "The NOTIFICATIONTYPE variable has pretty long values, and given that SMS have a limited number of chars we should try to keep it short, to just few significant bits. " haha [00:48:02] probably [00:48:17] AaronSchulz: tyvm [00:48:21] Just lemme know <3 [00:48:42] i'll try to shorten it while also adding the $NOTIFICATIONTYPE variable, because currently the output is kind of broken [00:48:53] "payload length is limited by the constraints of the signaling protocol to precisely 140 bytes" [00:48:55] if you send an ACK it doesn't arrive as ACK, it looks like another CRIT.. duh [00:49:14] https://en.wikipedia.org/wiki/SMS#Message_size [00:49:18] 'k, thanks! [00:50:04] is the output an email module or a pipe command? 
[00:50:41] in that case it would be simple to add a cut -c 1-135 to the mail [00:50:57] (I am leaving a few bytes of margin "just in case" :P) [00:52:45] echo "$NOTIFICATIONTYPE$: $HOSTNAME$ is $HOSTSTATE$\n$HOSTOUTPUT$\n$HOSTACKAUTHOR$: $HOSTACKCOMMENT$" | /usr/bin/mail -s "$NOTIFICATIONTYPE$ $HOSTNAME" $CONTACTADDRESS1$ [00:53:40] yea, it would cut off the comments if any, that's an option [00:55:28] better to receive a truncated sms than no sms at all imho [01:13:46] AaronSchulz: If it helps you debug, I left wmf.20 on group0 wikis [01:27:20] (03PS3) 10BryanDavis: Remove access for myself [puppet] - 10https://gerrit.wikimedia.org/r/407577 (https://phabricator.wikimedia.org/T186289) (owner: 10Yuvipanda) [01:27:38] (03CR) 10jerkins-bot: [V: 04-1] Remove access for myself [puppet] - 10https://gerrit.wikimedia.org/r/407577 (https://phabricator.wikimedia.org/T186289) (owner: 10Yuvipanda) [01:32:23] i guess that's why Twitter picked 140 chars.. to be like an SMS [01:33:15] 10Operations, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Remove cloud-admin rights from YuviPanda - https://phabricator.wikimedia.org/T186289#3957262 (10bd808) @MoritzMuehlenhoff could you take care of removing @yuvipanda from the `ops` LDAP group? This ldif should do it,... [01:35:31] no_justification: pretty sure it is https://gerrit.wikimedia.org/r/c/404056/9/includes/libs/rdbms/loadbalancer/LoadBalancer.php interacting with the server array config template (uses local domain by default). It *should* override the value to null to trigger line 162 of DatabaseMysqlBase::open. 
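A minimal sketch of the truncation idea floated above (the `cut -c 1-135` suggestion), written in Python rather than shell. The function name `truncate_for_sms` and the 135-byte default are illustrative assumptions; the 140-byte figure is the SMS payload size quoted in the discussion, and a real mail-to-SMS gateway may count characters differently (for example with a 7-bit alphabet).

```python
def truncate_for_sms(text: str, limit: int = 135) -> str:
    """Trim text so that its UTF-8 encoding fits in `limit` bytes.

    Hypothetical helper: the name and the default limit are assumptions,
    chosen to leave a small margin below the 140-byte SMS payload.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # Slicing bytes can cut a multi-byte character in half;
    # errors="ignore" drops that incomplete trailing fragment.
    return encoded[:limit].decode("utf-8", errors="ignore")
```

Truncating on encoded bytes rather than on characters keeps the result within the byte budget even when the alert text contains non-ASCII characters, at the cost of losing whatever comes last (here, the acknowledgement comment).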
[01:45:57] (03PS1) 10Dzahn: icinga: fix variable name inside paging test command [puppet] - 10https://gerrit.wikimedia.org/r/409195 (https://phabricator.wikimedia.org/T185862) [01:46:42] (03CR) 10Dzahn: [C: 032] icinga: fix variable name inside paging test command [puppet] - 10https://gerrit.wikimedia.org/r/409195 (https://phabricator.wikimedia.org/T185862) (owner: 10Dzahn) [01:51:13] it wasn't the length, it was a missing $ at the _end_ of a variable name [01:51:41] for Nagios this is $RIGHT$ and $RIGHT is an unknown macro :P [01:51:41] $HOSTNAME" [01:51:51] good grief [01:52:23] Warning: Error grabbing macro 'HOSTNAME" ' value ''! Maybe used in the wrong scope? Check the docs. [01:52:26] :p [01:52:35] gotta find it in the right logfile [01:52:43] or it seems just silent [01:55:32] also, we send that email with "-s subject" and that doesn't even do anything for SMS. all we get is the literal string "subject: foo" inside the SMS content [01:55:37] how wonderful [01:55:55] well, it's written in bold [01:56:29] hm you know what else is wonderful? [01:56:43] it's 4 am and I'm still in here. see ya tomorrow [01:56:52] :o ! run! [01:56:53] good night [01:56:58] night! [02:02:28] a host "foobar" will show up soon as DOWN.. it's a test :) [02:05:40] 10Operations, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Remove cloud-admin rights from YuviPanda - https://phabricator.wikimedia.org/T186289#3957292 (10bd808) @yuvipanda, you are the only admin of the [[https://tools.wmflabs.org/openstack-browser/project/matrix|matrix]]... [02:33:34] 10Operations, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Remove cloud-admin rights from YuviPanda - https://phabricator.wikimedia.org/T186289#3939839 (10Tgr) The matrix project might be useful for {T186061} (although I'm not sure if taking over / upgrading an existing ins... 
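The root cause found above (a macro written as `$HOSTNAME` instead of `$HOSTNAME$`) is the kind of slip a small lint pass could catch before a command definition is deployed. The following is only a rough sketch under stated assumptions: `unterminated_macros` is a hypothetical helper, not part of Nagios or Icinga, and it assumes macros are upper-case names wrapped in `$...$`. It does not handle `$$` escapes or on-demand macros containing colons.

```python
import re

# Assumption: a macro is "$", an upper-case name, then a closing "$".
# The second group is empty when the closing "$" is missing.
MACRO = re.compile(r"\$([A-Z][A-Z0-9_]*)(\$?)")


def unterminated_macros(command_line: str) -> list[str]:
    """Return the names of macros that lack their closing '$'."""
    return [
        match.group(1)
        for match in MACRO.finditer(command_line)
        if not match.group(2)
    ]
```

Run against the notification command pasted earlier in the log, this would report `HOSTNAME` from the `-s "$NOTIFICATIONTYPE$ $HOSTNAME"` subject argument, which is the macro that the "Error grabbing macro" warning complained about.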
[02:38:24] !log andrew@tin Started deploy [horizon/deploy@60cac8e]: updating with designate dashboard [02:38:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:40:32] (03PS7) 10Dzahn: phabricator: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/408947 [02:41:06] !log andrew@tin Finished deploy [horizon/deploy@60cac8e]: updating with designate dashboard (duration: 02m 42s) [02:41:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:42:08] (03PS8) 10Dzahn: phabricator: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/408947 [02:44:58] (03CR) 10Dzahn: [C: 032] "compiler shows it should be no-op on phab1001: http://puppet-compiler.wmflabs.org/9893/phab1001.eqiad.wmnet/" [puppet] - 10https://gerrit.wikimedia.org/r/408947 (owner: 10Dzahn) [02:45:11] (03PS9) 10Dzahn: phabricator: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/408947 [02:49:45] PROBLEM - puppet last run on phab1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:50:00] yes, icinga-wm , one sec :) [02:50:34] PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [02:51:27] (03PS1) 10Dzahn: phabricator: remove requires for libapache2-mod-php5 [puppet] - 10https://gerrit.wikimedia.org/r/409198 [02:51:44] (03CR) 10jerkins-bot: [V: 04-1] phabricator: remove requires for libapache2-mod-php5 [puppet] - 10https://gerrit.wikimedia.org/r/409198 (owner: 10Dzahn) [02:53:43] (03PS2) 10Dzahn: phabricator: remove requires for libapache2-mod-php5 [puppet] - 10https://gerrit.wikimedia.org/r/409198 [02:55:34] (03CR) 10Dzahn: [C: 032] "also independently we don't want to hardcode relying on mod_php anyways" [puppet] - 10https://gerrit.wikimedia.org/r/409198 (owner: 10Dzahn) [02:55:50] (03PS3) 10Dzahn: phabricator: remove requires for libapache2-mod-php5 [puppet] - 10https://gerrit.wikimedia.org/r/409198 [02:58:41] (03CR) 10Dzahn: [C: 032] "the only _actual_ change this caused is that the default for logrotate changed" [puppet] - 10https://gerrit.wikimedia.org/r/408947 (owner: 10Dzahn) [02:59:07] (03CR) 10Dzahn: [C: 032] "all no-op after https://gerrit.wikimedia.org/r/#/c/409198/" [puppet] - 10https://gerrit.wikimedia.org/r/408947 (owner: 10Dzahn) [02:59:45] RECOVERY - puppet last run on phab1001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [03:00:25] RECOVERY - puppet last run on phab2001 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [03:06:56] (03PS1) 10Dzahn: performance::site: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/409200 [03:23:07] (03PS2) 10Dzahn: graphite/performance::site: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/409200 [03:23:35] (03CR) 10jerkins-bot: [V: 04-1] graphite/performance::site: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/409200 (owner: 10Dzahn) [03:28:13] (03PS3) 10Dzahn: graphite/performance::site: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/409200 [03:28:45] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 
is CRITICAL: CRITICAL slave_sql_lag Replication lag: 876.51 seconds [03:32:52] (03CR) 10Dzahn: [C: 031] "wmf-style: total violations delta -9" [puppet] - 10https://gerrit.wikimedia.org/r/409200 (owner: 10Dzahn) [03:36:37] (03PS1) 10BBlack: dns5001 macaddr [puppet] - 10https://gerrit.wikimedia.org/r/409202 (https://phabricator.wikimedia.org/T156027) [03:36:39] (03PS1) 10BBlack: eqsin: use local NTP, define peers [puppet] - 10https://gerrit.wikimedia.org/r/409203 (https://phabricator.wikimedia.org/T156027) [03:37:20] (03CR) 10BBlack: [C: 032] dns5001 macaddr [puppet] - 10https://gerrit.wikimedia.org/r/409202 (https://phabricator.wikimedia.org/T156027) (owner: 10BBlack) [03:37:37] (03CR) 10BBlack: [C: 032] eqsin: use local NTP, define peers [puppet] - 10https://gerrit.wikimedia.org/r/409203 (https://phabricator.wikimedia.org/T156027) (owner: 10BBlack) [03:42:18] (03PS1) 10Dzahn: icinga: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/409204 [03:44:14] PROBLEM - NTP peers on hydrogen is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [03:44:32] (03PS2) 10Dzahn: icinga: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/409204 [03:45:14] RECOVERY - NTP peers on hydrogen is OK: NTP OK: Offset 0.000366 secs [03:46:35] PROBLEM - NTP peers on acamar is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [03:48:35] RECOVERY - NTP peers on acamar is OK: NTP OK: Offset 0.000882 secs [03:50:59] (03CR) 10Dzahn: "wmf-style: total violations delta -7" [puppet] - 10https://gerrit.wikimedia.org/r/409204 (owner: 10Dzahn) [03:51:44] PROBLEM - NTP peers on chromium is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [03:52:44] RECOVERY - NTP peers on chromium is OK: NTP OK: Offset -9.6e-05 secs [03:52:55] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 204.75 seconds [03:54:17] (03CR) 10Dzahn: "Chad, how is this nowadays? 
i saw you have already installed gitiles now" [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [03:54:38] (03PS4) 10Dzahn: Gerrit: Set gitiles configuation to be used as the repo viewer [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [03:58:04] PROBLEM - NTP peers on achernar is CRITICAL: NTP CRITICAL: Server not synchronized, Offset unknown [04:00:04] RECOVERY - NTP peers on achernar is OK: NTP OK: Offset 0.000225 secs [04:06:09] (03CR) 10Chad: "We don't need it to be an erb file (so we can drop the addition to jetty.pp), and in fact we already committed the linkname change. I'll r" [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [04:07:22] (03PS1) 10Dzahn: netmon/netbox/smokeping/librenms: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/409210 [04:10:19] (03CR) 10Chad: "Actually I hadn't committed that. Either way, patch incoming!" [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [04:12:08] (03PS5) 10Chad: Gerrit: Set gitiles configuation to be used as the repo viewer [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [04:12:52] (03CR) 10Chad: [C: 031] Gerrit: Set gitiles configuation to be used as the repo viewer [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [04:13:36] (03CR) 10Chad: [C: 031] "Also: if we want to bring Phab linking back, it should be redone as a plugin. Wouldn't be hard, only a couple of files." 
[puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [04:28:58] (03PS1) 10Chad: Gerrit: Proxy gitiles through gerrit.wikimedia.org/g/ [puppet] - 10https://gerrit.wikimedia.org/r/409211 (https://phabricator.wikimedia.org/T184116) [04:29:47] https://gerrit.wikimedia.org/g/- works, but URLs are generated based on the config so it heads back to the ugly urls [04:29:55] (also incoming links aren't the pretty url) [04:29:57] We can do this! [04:32:08] hmm, https://gerrit.wikimedia.org/g/ says I'm not logged in [04:33:11] the GerritAccount cookie has a path of /r set [04:58:06] (03PS1) 10BBlack: ntp servers/peers option tweaks [puppet] - 10https://gerrit.wikimedia.org/r/409214 [05:03:22] (03CR) 10BBlack: [C: 032] ntp servers/peers option tweaks [puppet] - 10https://gerrit.wikimedia.org/r/409214 (owner: 10BBlack) [05:12:11] legoktm: Derppppppp [05:12:17] That's annoying! [05:13:29] There's config for this! [05:13:31] :) [05:17:14] (03PS2) 10Chad: Gerrit: Proxy gitiles through gerrit.wikimedia.org/g/ [puppet] - 10https://gerrit.wikimedia.org/r/409211 (https://phabricator.wikimedia.org/T184116) [05:17:16] (03PS1) 10Chad: Gerrit: Set cookie path to / [puppet] - 10https://gerrit.wikimedia.org/r/409216 [05:18:10] (03Abandoned) 10Chad: Gerrit: Rename link to gitiles [puppet] - 10https://gerrit.wikimedia.org/r/408936 (owner: 10Chad) [05:19:07] legoktm: Defaults to install path, so /r/ in our case [05:19:11] Swapped to just "/" [05:19:18] Since we don't plan to install non-gerrit stuff on gerrit.wm.o [05:19:19] :) [05:19:22] :D [05:19:32] Good catch! [05:19:57] I found some cool settings today though [05:20:44] https://gerrit.wikimedia.org/r/Documentation/dev-plugins.html#included-in - we could write an addition to the "Included In" area to have a like "Deployed Yes/No" bit [05:21:18] The WMF branches aren't a giveaway? 
:p [05:22:28] A "is:wikimedia-deployed" search option would be really nice though when looking for patches to review (plus the inverse) [05:24:57] Well, we prune old branches [05:25:48] if it's deployed it should always have at least one WMF branch right? [05:26:07] Eh, true. But doesn't tell you which of the 2 (or 3 or 4) :) [05:26:11] Your wiki is on :) [05:26:36] Actually, easier/more useful...I was thinking of an "is it fixed?" toolforge thingie. [05:26:51] Throw in a T# or a gerrit change ID and it'll try to guess if it's fixed & live [05:26:54] Idk [05:26:55] I have ideas [05:27:00] that would be pretty neat [05:27:21] Then divide it by group/wiki ID [05:27:34] originally we hoped releasetaggerbot would do that but I don't think it's been as useful as it could be [05:27:36] Sooooooo.... "Is T123 fixed on enwiki?" -> No [05:27:36] T123: Turn on "diffusion.allow-http-auth" - https://phabricator.wikimedia.org/T123 [05:27:40] .... on testwiki -> yes [05:27:44] Etc [05:27:50] I think releasetaggerbot is annoying :) [05:28:20] it is if the one patch that has been merged did not fix the issue [05:29:18] Gitiles has a *lot* of undocumented config [05:29:31] There's a whole section on cache.* [06:04:42] legoktm: https://gerrit-review.googlesource.com/c/plugins/gitiles/+/158630 :) [06:04:54] I haven't pushed upstream in awhile :) [06:05:05] sweeet [06:06:22] Live hacked it but it didn't seem to work :\ [06:06:26] I might've gotten it wrong [06:07:35] no_justification: the other upstream request I'd like is for a way to view raw files, a few of my tools depend upon that ability [06:08:53] https://bugs.chromium.org/p/gerrit/issues/list?can=2&q=gitiles :\ [06:10:11] Eh, that's the plugin. Here's the core product: https://github.com/google/gitiles/issues [06:10:31] https://github.com/google/gitiles/issues/7 [06:10:33] There's your bug [06:11:27] There's a patch! 
https://gerrit-review.googlesource.com/c/gitiles/+/78140 [06:13:27] woot :D [06:13:36] I will wait patiently then [06:13:57] You might have to wait awhile....Shawn passed away like 2 weeks ago :( [06:14:19] It was mentioned on repo-discuss :\ [06:23:08] (03PS1) 10BBlack: ntp server config tweaks [puppet] - 10https://gerrit.wikimedia.org/r/409219 [06:23:30] (03CR) 10jerkins-bot: [V: 04-1] ntp server config tweaks [puppet] - 10https://gerrit.wikimedia.org/r/409219 (owner: 10BBlack) [06:24:32] 06:23:24 + git pull --quiet zuul production [06:24:33] 06:23:26 fatal: write error: Connection reset by peer [06:25:14] (03CR) 10BBlack: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/409219 (owner: 10BBlack) [06:26:23] (03CR) 10BBlack: [C: 032] ntp server config tweaks [puppet] - 10https://gerrit.wikimedia.org/r/409219 (owner: 10BBlack) [06:27:24] PROBLEM - ores on scb1002 is CRITICAL: connect to address 10.64.16.21 and port 8081: Connection refused [06:27:31] legoktm: +993 -56 is a big diff :\ [06:28:13] I think the security part is the complicated thing [06:28:24] with the redirect plus auth token on a separate domain... [06:28:45] Yeah [06:29:03] Hmm, might not need my urlBase patch to the plugin [06:29:08] gerrit.urlAlias.* might handle this [06:29:23] Er, prolly not it's in a different root [06:29:30] /r/ vs /g/ [06:29:37] (why oh why did we use /r/? 
[06:30:18] Eh, urls could conflict (they dropped # at the start of all of them) [06:33:24] RECOVERY - ores on scb1002 is OK: HTTP OK: HTTP/1.0 200 OK - 3691 bytes in 0.011 second response time [06:36:52] (03CR) 10Marostegui: "Overall it all makes sense to me (as you said, pending checking what PC says..), see minor comments inline" (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/409008 (https://phabricator.wikimedia.org/T184697) (owner: 10Jcrespo) [06:39:02] (03PS1) 10Marostegui: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409223 (https://phabricator.wikimedia.org/T162807) [06:41:03] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409223 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [06:41:40] (03PS1) 10Marostegui: db1080: Update socket path [puppet] - 10https://gerrit.wikimedia.org/r/409225 [06:42:48] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409223 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [06:43:04] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409223 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [06:44:15] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1080 - T162807 (duration: 01m 12s) [06:44:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:44:30] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [06:47:36] (03CR) 10Marostegui: [C: 032] db1080: Update socket path [puppet] - 10https://gerrit.wikimedia.org/r/409225 (owner: 10Marostegui) [06:47:53] !log Fix data drifts, upgrade kernel, mariadb and socket path on db1080 - T162807 [06:48:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:52:13] !log Fix replication on labsdb1010 - T186579 
[06:52:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:52:27] T186579: labsdb1010 crashed - https://phabricator.wikimedia.org/T186579 [07:02:01] (03PS1) 10BryanDavis: toolforge: add user requested packages [puppet] - 10https://gerrit.wikimedia.org/r/409226 (https://phabricator.wikimedia.org/T179343) [07:33:31] (03PS1) 10Marostegui: db-eqiad.php: Slowly repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409232 (https://phabricator.wikimedia.org/T162807) [07:39:44] !log forced remount of /mnt/hdfs on stat1005 [07:39:49] apergos: --^ :) [07:39:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:47:40] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Slowly repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409232 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [07:49:20] (03Merged) 10jenkins-bot: db-eqiad.php: Slowly repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409232 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [07:49:30] (03CR) 10jenkins-bot: db-eqiad.php: Slowly repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409232 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [07:50:57] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Slowly repool db1080 - T162807 (duration: 01m 11s) [07:51:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:51:09] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [07:57:20] !log Stop replication on labsdb1004 to fix replication issues [07:57:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:04:26] 10Operations, 10wikitech.wikimedia.org, 10Patch-For-Review, 10cloud-services-team (Kanban): Remove cloud-admin rights from YuviPanda - https://phabricator.wikimedia.org/T186289#3957601 (10MoritzMuehlenhoff) >>! 
In T186289#3957262, @bd808 wrote: > @MoritzMuehlenhoff could you take care of removing @yuvipand... [08:22:29] (03PS4) 10Muehlenhoff: Remove access for myself [puppet] - 10https://gerrit.wikimedia.org/r/407577 (https://phabricator.wikimedia.org/T186289) (owner: 10Yuvipanda) [08:27:14] PROBLEM - Check systemd state on krypton is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [08:29:43] legoktm no_justification I could try and get upstream to merge that change. Jrn seems to be working on reviewing gitilea changes. [08:30:14] RECOVERY - Check systemd state on krypton is OK: OK - running: The system is fully operational [08:30:29] (03CR) 10Muehlenhoff: "@yuvipanda: I also added your user to the absented group in PS4. There's still one open issue which prevents merging, though: You are the " [puppet] - 10https://gerrit.wikimedia.org/r/407577 (https://phabricator.wikimedia.org/T186289) (owner: 10Yuvipanda) [08:31:44] no_justification: about your change, I am in the process of backporting a change that should fix polygerrit but will break your change [08:31:59] So needs to be updated above my change. 
[08:32:04] k [08:32:10] Ie a gitilea baseUrl config [08:32:19] Gitilea [08:32:23] Uh auto spell [08:33:07] no_justification: https://gerrit-review.googlesource.com/#/c/plugins/gitiles/+/157731/ [08:33:29] krypton is me, I am testing the new prometheus-burrow-exporter [08:34:42] no_justification: I think that’s the fix your looking for :) [08:37:27] I saw that [08:37:31] After I pushed mine [08:37:32] Heh [08:39:23] Heh [08:40:40] (03PS1) 10Chad: Remove CleanChanges from wmf deploys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409246 (https://phabricator.wikimedia.org/T186859) [08:40:42] (03CR) 10Chad: [C: 032] Remove CleanChanges from wmf deploys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409246 (https://phabricator.wikimedia.org/T186859) (owner: 10Chad) [08:42:20] (03Merged) 10jenkins-bot: Remove CleanChanges from wmf deploys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409246 (https://phabricator.wikimedia.org/T186859) (owner: 10Chad) [08:42:35] (03CR) 10jenkins-bot: Remove CleanChanges from wmf deploys [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409246 (https://phabricator.wikimedia.org/T186859) (owner: 10Chad) [08:44:11] !log demon@tin Synchronized multiversion/submodules.json: rm CleanChanges (duration: 01m 13s) [08:44:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:45:37] !log demon@tin Synchronized wmf-config/: rm cleanchanges (duration: 01m 14s) [08:45:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:48:14] no_justification: left you a comment on there :) [08:52:11] Responded [08:53:02] no_justification: thanks, I guess I will try this tonight when I get back home (currently at college) :) [08:54:06] I couldn't get it to work earlier when I live hacked it into gerrit.wm.o so it might be wrong still [08:54:07] :) [08:54:20] I think that will break polygerrit [08:54:56] Which relys on baseUrl (/me was the one that added that into poly) [08:55:32] It 
didn't have any response at all :p [08:55:41] heh [08:56:10] I can add David that seems to be reviewing a lot of plugins [08:58:27] Added him :). Though I will try to see if this works tonight. If not we can reword your change to use gitiles.baseUrl (new config that we would be adding) [09:06:00] !log Fix data drifts on db1067 - T162807 [09:06:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:06:13] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [09:06:41] (03PS1) 10Chad: Rewrite old Special:Code urls to Phabricator SVN clones [puppet] - 10https://gerrit.wikimedia.org/r/409290 (https://phabricator.wikimedia.org/T116948) [09:10:01] elukey: thanks fo the fix! [09:16:50] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409293 [09:26:32] (03CR) 10Filippo Giunchedi: [C: 031] hiera/wmflib/pybal: rename ganglia_clusters to wikimedia_clusters [puppet] - 10https://gerrit.wikimedia.org/r/406794 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [09:27:00] 10Operations: Etherpad 1.6.3 security release - https://phabricator.wikimedia.org/T186866#3957743 (10MoritzMuehlenhoff) [09:27:31] (03CR) 10Filippo Giunchedi: [C: 04-1] "> Patch Set 3:" [puppet] - 10https://gerrit.wikimedia.org/r/382931 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [09:30:44] PROBLEM - Router interfaces on cr2-ulsfo is CRITICAL: CRITICAL: host 198.35.26.193, interfaces up: 76, down: 1, dormant: 0, excluded: 0, unused: 0 [09:31:05] PROBLEM - Router interfaces on cr1-codfw is CRITICAL: CRITICAL: host 208.80.153.192, interfaces up: 122, down: 1, dormant: 0, excluded: 0, unused: 0 [09:31:44] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 78, down: 0, dormant: 0, excluded: 0, unused: 0 [09:32:05] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 124, down: 0, 
dormant: 0, excluded: 0, unused: 0 [09:32:39] (03CR) 10Filippo Giunchedi: "Looks good to me overall, see inline." (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/406997 (owner: 10Matthias Mullie) [09:34:57] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409293 (owner: 10Marostegui) [09:36:29] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409293 (owner: 10Marostegui) [09:37:12] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409293 (owner: 10Marostegui) [09:37:56] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1080 - T162807 (duration: 01m 11s) [09:38:06] (03CR) 10Filippo Giunchedi: "Not needed anymore AFAIUI" [puppet] - 10https://gerrit.wikimedia.org/r/402095 (https://phabricator.wikimedia.org/T181627) (owner: 10Gehel) [09:38:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:38:09] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [09:39:05] (03Abandoned) 10Gehel: elasticsearch / prometheus: enable prometheus jmx_exporter [puppet] - 10https://gerrit.wikimedia.org/r/402095 (https://phabricator.wikimedia.org/T181627) (owner: 10Gehel) [09:40:15] (03CR) 10Filippo Giunchedi: icinga: add notification type to SMS content (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/406535 (https://phabricator.wikimedia.org/T185862) (owner: 10Dzahn) [09:45:22] (03PS10) 10Jcrespo: mariadb: Redo mariadb::backup class into role/profile style [puppet] - 10https://gerrit.wikimedia.org/r/409008 (https://phabricator.wikimedia.org/T184697) [09:45:36] (03CR) 10Jcrespo: "Fixed all but the third comment." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/409008 (https://phabricator.wikimedia.org/T184697) (owner: 10Jcrespo) [09:49:18] 10Operations, 10ops-eqiad: Missing servers in racktables - https://phabricator.wikimedia.org/T186814#3957825 (10faidon) 05Resolved>03Open The original inquiry was for kafka1023, which is a different box than analytics1023 (confusing!). kafka1023 is active and online, but does not exist in Racktables. Ther... [09:50:08] (03CR) 10Marostegui: [C: 031] "So pending to see what puppet compiler says, I am fine with all these changes." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/409008 (https://phabricator.wikimedia.org/T184697) (owner: 10Jcrespo) [09:53:19] (03CR) 10Jcrespo: "So apparently on core_multiinstance hosts, we do not open a second port yet- something (maybe) to review in the future, but not part of th" [puppet] - 10https://gerrit.wikimedia.org/r/409008 (https://phabricator.wikimedia.org/T184697) (owner: 10Jcrespo) [09:54:05] (03CR) 10Jcrespo: [C: 04-1] "We got some errors: https://puppet-compiler.wmflabs.org/compiler02/9913/" [puppet] - 10https://gerrit.wikimedia.org/r/409008 (https://phabricator.wikimedia.org/T184697) (owner: 10Jcrespo) [09:54:53] 10Operations, 10ops-eqiad: Missing servers in racktables - https://phabricator.wikimedia.org/T186814#3956080 (10MoritzMuehlenhoff) The Jupyterhub spare (notebook1002) was repurposed as kafka1023 in https://phabricator.wikimedia.org/T181518 [09:56:08] (03PS1) 10Marostegui: db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409304 [09:59:08] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409304 (owner: 10Marostegui) [10:00:52] (03CR) 10Hashar: [C: 04-1] "We need the old code review comments to be reacheable. That really helps figuring out design decisions from the past!" 
[puppet] - 10https://gerrit.wikimedia.org/r/409290 (https://phabricator.wikimedia.org/T116948) (owner: 10Chad) [10:00:54] (03Merged) 10jenkins-bot: db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409304 (owner: 10Marostegui) [10:00:56] (03CR) 10jenkins-bot: db-eqiad.php: Increase traffic for db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409304 (owner: 10Marostegui) [10:02:35] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Increase traffic for db1080 - T162807 (duration: 01m 12s) [10:02:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:02:49] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [10:08:32] (03PS1) 10Marostegui: db-eqiad.php: Fully repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409305 [10:12:26] (03PS11) 10Jcrespo: mariadb: Redo mariadb::backup class into role/profile style [puppet] - 10https://gerrit.wikimedia.org/r/409008 (https://phabricator.wikimedia.org/T184697) [10:12:49] (03PS12) 10Jcrespo: mariadb: Redo mariadb::backup class into role/profile style [puppet] - 10https://gerrit.wikimedia.org/r/409008 (https://phabricator.wikimedia.org/T184697) [10:22:49] (03PS13) 10Jcrespo: mariadb: Redo mariadb::backup class into role/profile style [puppet] - 10https://gerrit.wikimedia.org/r/409008 (https://phabricator.wikimedia.org/T184697) [10:24:24] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Fully repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409305 (owner: 10Marostegui) [10:26:51] (03Merged) 10jenkins-bot: db-eqiad.php: Fully repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409305 (owner: 10Marostegui) [10:27:02] (03CR) 10jenkins-bot: db-eqiad.php: Fully repool db1080 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409305 (owner: 10Marostegui) [10:27:59] 10Operations, 10Page-Previews, 10RESTBase, 10Traffic, and 2 others: Cached page previews not 
shown when refreshed - https://phabricator.wikimedia.org/T184534#3957874 (10phuedx) >>! In T184534#3954704, @BBlack wrote: > I think to really comprehend the right fix here, I'd need to rewind a little and figure o... [10:29:43] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fully repool db1080 - T162807 (duration: 01m 11s) [10:29:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:29:57] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [10:32:06] (03CR) 10Arturo Borrero Gonzalez: [C: 032] toolforge: add user requested packages [puppet] - 10https://gerrit.wikimedia.org/r/409226 (https://phabricator.wikimedia.org/T179343) (owner: 10BryanDavis) [10:32:14] (03PS2) 10Arturo Borrero Gonzalez: toolforge: add user requested packages [puppet] - 10https://gerrit.wikimedia.org/r/409226 (https://phabricator.wikimedia.org/T179343) (owner: 10BryanDavis) [10:36:34] !log uploaded php-luasandbox 2.0.14~stretch2 for stretch-wikimedia to apt.wikimedia.org (this removes the php-luasandbox binary from our internal luasandbox build in favour of the php-luasandbox package maintained by legoktm from stretch-backports). 
As such the php-luasandbox source package we build internally now only provides the HHVM extension (and we can retire it entirely when migrating to PHP7) [10:36:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:42:17] (03PS1) 10Marostegui: db-eqiad.php: Depool db1089 and db1067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409307 (https://phabricator.wikimedia.org/T162807) [10:45:07] (03CR) 10Jcrespo: [C: 031] "This looks ok: https://puppet-compiler.wmflabs.org/compiler02/9915/" [puppet] - 10https://gerrit.wikimedia.org/r/409008 (https://phabricator.wikimedia.org/T184697) (owner: 10Jcrespo) [10:46:10] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Depool db1089 and db1067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409307 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [10:47:52] (03Merged) 10jenkins-bot: db-eqiad.php: Depool db1089 and db1067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409307 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [10:48:03] (03CR) 10jenkins-bot: db-eqiad.php: Depool db1089 and db1067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409307 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [10:50:18] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Depool db1089 and db1067 for data checksumming - T162807 (duration: 01m 11s) [10:50:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:50:31] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [10:51:21] !log Stop replication in sync on db1067 and db1089 - T162807 [10:51:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:24] PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds [10:58:34] PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds [10:58:35] PROBLEM - configured eth on stat1005 is CRITICAL: 
Return code of 255 is out of bounds [10:58:44] PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds [10:58:54] PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds [10:58:55] PROBLEM - Check systemd state on stat1005 is CRITICAL: Return code of 255 is out of bounds [11:00:44] (03PS14) 10Jcrespo: mariadb: Redo mariadb::backup class into role/profile style [puppet] - 10https://gerrit.wikimedia.org/r/409008 (https://phabricator.wikimedia.org/T184697) [11:04:49] (03PS1) 10Elukey: Initial packaging of version 0.0.4 [debs/prometheus-burrow-exporter] - 10https://gerrit.wikimedia.org/r/409308 (https://phabricator.wikimedia.org/T180442) [11:05:17] (03Abandoned) 10Elukey: Initial packaging of version 0.0.4 [debs/prometheus-burrow-exporter] - 10https://gerrit.wikimedia.org/r/409308 (https://phabricator.wikimedia.org/T180442) (owner: 10Elukey) [11:08:14] PROBLEM - Check the NTP synchronisation status of timesyncd on stat1005 is CRITICAL: Return code of 255 is out of bounds [11:10:25] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient [11:10:35] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up [11:10:44] RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [11:10:54] RECOVERY - DPKG on stat1005 is OK: All packages OK [11:13:34] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [11:15:15] (03PS1) 10Elukey: Initial packaging [debs/prometheus-burrow-exporter] (debian) - 10https://gerrit.wikimedia.org/r/409310 (https://phabricator.wikimedia.org/T180442) [11:16:47] (03PS2) 10Elukey: Initial packaging [debs/prometheus-burrow-exporter] (debian) - 10https://gerrit.wikimedia.org/r/409310 (https://phabricator.wikimedia.org/T180442) [11:28:45] (03CR) 10Elukey: "elukey@boron:~$ lintian prometheus-burrow-exporter_0.0.4-1_amd64.changes" [debs/prometheus-burrow-exporter] (debian) - 
10https://gerrit.wikimedia.org/r/409310 (https://phabricator.wikimedia.org/T180442) (owner: 10Elukey) [11:35:42] 10Operations, 10MediaWiki-Vagrant, 10MediaWiki-extensions-Scribunto, 10Patch-For-Review: php-luasandbox in Wikimedia's Stretch apt repo depends on php5 - https://phabricator.wikimedia.org/T183888#3957977 (10MoritzMuehlenhoff) Our internal php-luasandbox package has been rebuilt to only provide the hhvm-lua... [11:38:14] RECOVERY - Check the NTP synchronisation status of timesyncd on stat1005 is OK: OK: synced at Fri 2018-02-09 11:38:08 UTC. [11:40:41] (03PS1) 10Gilles: Upgrade to 1.12 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/409314 (https://phabricator.wikimedia.org/T186492) [11:42:45] (03CR) 10Gilles: "Did something change in the Debian build integration test?" [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/409314 (https://phabricator.wikimedia.org/T186492) (owner: 10Gilles) [11:45:24] (03CR) 10Gilles: "Ah, it's non-voting, I probably forgot that it always did that, nevermind." 
[debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/409314 (https://phabricator.wikimedia.org/T186492) (owner: 10Gilles) [11:50:36] (03Abandoned) 10Chad: Rewrite old Special:Code urls to Phabricator SVN clones [puppet] - 10https://gerrit.wikimedia.org/r/409290 (https://phabricator.wikimedia.org/T116948) (owner: 10Chad) [12:00:45] (03PS1) 10Chad: gitiles: 2.14.6-1 [software/gerrit] - 10https://gerrit.wikimedia.org/r/409318 [12:01:53] (03PS1) 10Gilles: Improve Thumbor error logging [puppet] - 10https://gerrit.wikimedia.org/r/409319 (https://phabricator.wikimedia.org/T186492) [12:02:21] (03CR) 10jerkins-bot: [V: 04-1] Improve Thumbor error logging [puppet] - 10https://gerrit.wikimedia.org/r/409319 (https://phabricator.wikimedia.org/T186492) (owner: 10Gilles) [12:04:24] (03PS2) 10Gilles: Improve Thumbor error logging [puppet] - 10https://gerrit.wikimedia.org/r/409319 (https://phabricator.wikimedia.org/T186492) [12:10:34] paladox: 409318 was the build with your fix @0f31748 (I called it 2.14.6-1 just cuz it seemed easy) [12:13:41] (03CR) 10Filippo Giunchedi: "> Patch Set 1:" [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/409314 (https://phabricator.wikimedia.org/T186492) (owner: 10Gilles) [12:21:04] (03PS1) 10Arturo Borrero Gonzalez: apt: apt-upgrade: sort output of the list operation [puppet] - 10https://gerrit.wikimedia.org/r/409322 (https://phabricator.wikimedia.org/T181647) [12:34:43] no_justification: thanks :) [12:35:34] I will try to work on the gitiles thing on a different base url. Later today (after 4pm UK time) that’s when college finishes for me :). [12:37:14] (03CR) 10Paladox: [C: 031] "Thanks :)" [software/gerrit] - 10https://gerrit.wikimedia.org/r/409318 (owner: 10Chad) [12:40:38] no_justification: heh did you see that other David reply to your change? [12:40:46] Not yet, no [12:41:04] no_justification: I think he must have got confused [12:41:17] Or doesn't realise we proxy from apache to jetty. 
[12:43:30] (03PS3) 10Ema: wmf-upgrade-varnish: initial release [puppet] - 10https://gerrit.wikimedia.org/r/409047 (https://phabricator.wikimedia.org/T168529) [12:46:01] <_joe_> ema: heh, that could've used the switchdc spinoff :P [12:47:31] _joe_: indeed! ENOTFOUND though :) [12:47:50] <_joe_> :/ [12:49:22] I am really interested in what is best to use, eventually it would be great to have *something* to handle reboots of a cluster like Hadoop/ES/etc.. [12:53:43] 10Operations, 10Gerrit, 10Phabricator, 10Traffic, 10periodic-update: Phabricator and Gerrit: Improve the way that maintenance downtime is communicated to users. - https://phabricator.wikimedia.org/T180655#3958098 (10demon) https://gerrit.googlesource.com/plugins/motd/+/master could be useful on gerrit's... [12:57:56] <_joe_> elukey: whenever we clone volans, that could be done [12:58:06] <_joe_> for now, ETOOMANUTHINGSTODO [13:00:29] yep yep [13:03:46] (03PS2) 10Arturo Borrero Gonzalez: apt: apt-upgrade: sort output of the list operation [puppet] - 10https://gerrit.wikimedia.org/r/409322 (https://phabricator.wikimedia.org/T181647) [13:04:38] (03CR) 10Arturo Borrero Gonzalez: [C: 032] apt: apt-upgrade: sort output of the list operation [puppet] - 10https://gerrit.wikimedia.org/r/409322 (https://phabricator.wikimedia.org/T181647) (owner: 10Arturo Borrero Gonzalez) [13:11:58] (03PS1) 10Arturo Borrero Gonzalez: apt: apt-upgrade: add switch for the node name output [puppet] - 10https://gerrit.wikimedia.org/r/409323 (https://phabricator.wikimedia.org/T181647) [13:28:37] (03PS1) 10Marostegui: db-eqiad.php: Repool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409325 (https://phabricator.wikimedia.org/T162807) [13:30:19] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409325 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [13:31:06] (03PS4) 10Ema: wmf-upgrade-varnish: initial release [puppet] - 
10https://gerrit.wikimedia.org/r/409047 (https://phabricator.wikimedia.org/T168529) [13:31:55] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409325 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [13:32:11] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1089 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409325 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [13:32:29] (03CR) 10Chad: [V: 032 C: 032] gitiles: 2.14.6-1 [software/gerrit] - 10https://gerrit.wikimedia.org/r/409318 (owner: 10Chad) [13:33:01] !log demon@tin Started deploy [gerrit/gerrit@9c0acf6]: updating gitiles plugin [13:33:02] * volans|off off trying to clone himself [13:33:11] !log demon@tin Finished deploy [gerrit/gerrit@9c0acf6]: updating gitiles plugin (duration: 00m 10s) [13:33:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:33:23] volans|off: When you figure out how plz share I've been stuck on that for years [13:33:25] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1089 - T162807 (duration: 01m 12s) [13:33:27] #1 blocker! [13:33:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:33:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:33:40] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [13:34:17] (03CR) 10Chad: "In addition to a service restart, this is gonna log everyone out I think :\" [puppet] - 10https://gerrit.wikimedia.org/r/409216 (owner: 10Chad) [13:34:26] (03CR) 10Ema: wmf-upgrade-varnish: initial release (038 comments) [puppet] - 10https://gerrit.wikimedia.org/r/409047 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [13:34:57] no_justification: sure, will do! 
[13:48:47] (03PS1) 10Ema: cache::canary: test varnish downgrade on pinkunicorn [puppet] - 10https://gerrit.wikimedia.org/r/409326 [13:51:24] (03CR) 10Ema: [C: 032] cache::canary: test varnish downgrade on pinkunicorn [puppet] - 10https://gerrit.wikimedia.org/r/409326 (owner: 10Ema) [13:56:41] (03PS1) 10Ema: Revert "cache::canary: test varnish downgrade on pinkunicorn" [puppet] - 10https://gerrit.wikimedia.org/r/409329 [13:57:45] (03CR) 10Ema: [C: 032] Revert "cache::canary: test varnish downgrade on pinkunicorn" [puppet] - 10https://gerrit.wikimedia.org/r/409329 (owner: 10Ema) [14:05:51] (03PS5) 10Ema: wmf-upgrade-varnish: initial release [puppet] - 10https://gerrit.wikimedia.org/r/409047 (https://phabricator.wikimedia.org/T168529) [14:06:35] (03CR) 10Ema: wmf-upgrade-varnish: initial release (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/409047 (https://phabricator.wikimedia.org/T168529) (owner: 10Ema) [14:07:57] (03CR) 10Faidon Liambotis: [C: 04-1] "This is getting less pretty by every commit that touches this (global variables!). 
Should probably reconsider going for what was proposed " [puppet] - 10https://gerrit.wikimedia.org/r/409323 (https://phabricator.wikimedia.org/T181647) (owner: 10Arturo Borrero Gonzalez) [14:20:05] (03CR) 10Filippo Giunchedi: [C: 032] Upgrade to 1.12 [debs/python-thumbor-wikimedia] - 10https://gerrit.wikimedia.org/r/409314 (https://phabricator.wikimedia.org/T186492) (owner: 10Gilles) [14:24:57] !log demon@tin Synchronized php-1.31.0-wmf.20/tests/phpunit/includes/db/LBFactoryTest.php: no-op to prior (duration: 01m 12s) [14:25:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:26:46] (03PS1) 10Ema: lvs: don't bind prometheus-node-exporter on INADDR_ANY [puppet] - 10https://gerrit.wikimedia.org/r/409338 (https://phabricator.wikimedia.org/T176182) [14:41:55] 10Operations, 10ops-eqiad: check americium eth1 cabling and link - https://phabricator.wikimedia.org/T185219#3958401 (10Jgreen) Thanks, it's behaving now. [14:42:05] 10Operations, 10fundraising-tech-ops, 10netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3958405 (10Jgreen) [14:42:07] 10Operations, 10ops-eqiad: check americium eth1 cabling and link - https://phabricator.wikimedia.org/T185219#3958403 (10Jgreen) 05Open>03Resolved a:03Jgreen [14:42:17] (03CR) 10Ema: "https://puppet-compiler.wmflabs.org/compiler03/9919/" [puppet] - 10https://gerrit.wikimedia.org/r/409338 (https://phabricator.wikimedia.org/T176182) (owner: 10Ema) [14:44:07] 10Operations, 10fundraising-tech-ops, 10netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3958412 (10Jgreen) We've done all hosts but civi1001, frdb1001, and frdb1001 which require fundraising downtime. 
[14:44:16] 10Operations, 10fundraising-tech-ops, 10netops: bonded/redundant network connections for fundraising hosts - https://phabricator.wikimedia.org/T171962#3958413 (10Jgreen) 05Open>03Resolved [14:44:19] 10Operations, 10ops-eqiad, 10netops, 10Patch-For-Review: eqiad: rack frack refresh equipment - https://phabricator.wikimedia.org/T169644#3958414 (10Jgreen) [14:51:10] 10Operations: Integrate stretch 9.3 point update - https://phabricator.wikimedia.org/T182655#3958422 (10MoritzMuehlenhoff) These are fully rolled out: linux icu [14:52:34] 10Operations: Integrate jessie 8.10 point release - https://phabricator.wikimedia.org/T182656#3958423 (10MoritzMuehlenhoff) These are fully rolled out: libxtst icu libio-socket-ssl-perl [14:57:18] (03PS1) 10Chad: group1 to wmf.20 again again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409345 [15:00:26] !log upgraded mailman on fermium for security updates [15:00:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:02:37] 10Operations, 10Cassandra, 10RESTBase-Cassandra, 10Services (doing), and 2 others: Upload cassandra package(s) to wikimedia apt repository - https://phabricator.wikimedia.org/T186619#3958463 (10MoritzMuehlenhoff) >> @Eevans wrote: > Even as of right now we have versions 2.1.13 and 2.2.6, (in addition to 3.... 
[15:15:22] (03CR) 10Elukey: "ping :) After https://phabricator.wikimedia.org/T186510 it would be really great to prioritize this work to avoid eventlog1001's memory to" [puppet] - 10https://gerrit.wikimedia.org/r/403560 (https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [15:17:52] (03PS6) 10Muehlenhoff: Add support for selective automatic restarts of stateless services after library upgrades (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) [15:18:33] (03CR) 10jerkins-bot: [V: 04-1] Add support for selective automatic restarts of stateless services after library upgrades (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [15:22:16] (03PS7) 10Muehlenhoff: Add support for selective automatic restarts of stateless services after library upgrades (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) [15:22:47] (03CR) 10jerkins-bot: [V: 04-1] Add support for selective automatic restarts of stateless services after library upgrades (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [15:24:22] nice! --^ [15:29:24] (03PS8) 10Muehlenhoff: Add support for selective automatic restarts of stateless services after library upgrades (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) [15:30:20] (03CR) 10jerkins-bot: [V: 04-1] Add support for selective automatic restarts of stateless services after library upgrades (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [15:31:42] 10Operations, 10ops-eqiad: Missing servers in racktables - https://phabricator.wikimedia.org/T186814#3958546 (10Cmjohnson) That makes sense! A racktables task was never set. Updating racktables now with that information. Thanks! 
[15:32:54] (03PS1) 10Ottomata: EventLogging: emit X-Client-IP and parse as `ip` field [puppet] - 10https://gerrit.wikimedia.org/r/409354 (https://phabricator.wikimedia.org/T186833) [15:32:58] 10Operations, 10ops-eqiad: Add label to kafka1023 - https://phabricator.wikimedia.org/T186895#3958570 (10Cmjohnson) p:05Triage>03High [15:34:54] (03PS3) 10Giuseppe Lavagetto: Safely load yaml files [software/conftool] - 10https://gerrit.wikimedia.org/r/408290 [15:34:56] (03PS4) 10Giuseppe Lavagetto: Add support for jsonschema-based entities [software/conftool] - 10https://gerrit.wikimedia.org/r/408585 [15:34:58] (03PS9) 10Muehlenhoff: Add support for selective automatic restarts of stateless services (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) [15:36:21] * moritzm shakes fist at "Line 1: First line should be <=80 characters" commit message test [15:36:37] (03CR) 10Chad: Add support for selective automatic restarts of stateless services (WIP) (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/399618 (https://phabricator.wikimedia.org/T135991) (owner: 10Muehlenhoff) [15:37:00] PROBLEM - Host dns5001.mgmt is DOWN: PING CRITICAL - Packet loss = 100% [15:37:11] ^ Very cool indeed [15:37:59] (03CR) 10Chad: [C: 031] "Will +2 if you want, but dunno what your merge plans are :)" [software/conftool] - 10https://gerrit.wikimedia.org/r/408290 (owner: 10Giuseppe Lavagetto) [15:42:10] (03CR) 10Mforns: "LGTM! 
But see comment, just in case I'm right :]" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/409354 (https://phabricator.wikimedia.org/T186833) (owner: 10Ottomata) [15:43:42] (03CR) 10Ottomata: EventLogging: emit X-Client-IP and parse as `ip` field (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/409354 (https://phabricator.wikimedia.org/T186833) (owner: 10Ottomata) [15:44:27] (03CR) 10Ottomata: "https://gerrit.wikimedia.org/r/#/c/409350/ must be deployed first" [puppet] - 10https://gerrit.wikimedia.org/r/409354 (https://phabricator.wikimedia.org/T186833) (owner: 10Ottomata) [15:45:49] (03PS2) 10Filippo Giunchedi: WIP: check prometheus metric [puppet] - 10https://gerrit.wikimedia.org/r/409054 (https://phabricator.wikimedia.org/T181410) [15:46:13] (03CR) 10jerkins-bot: [V: 04-1] WIP: check prometheus metric [puppet] - 10https://gerrit.wikimedia.org/r/409054 (https://phabricator.wikimedia.org/T181410) (owner: 10Filippo Giunchedi) [15:46:55] (03CR) 10Mforns: [C: 031] "LGTM!" 
[puppet] - 10https://gerrit.wikimedia.org/r/409354 (https://phabricator.wikimedia.org/T186833) (owner: 10Ottomata) [15:47:28] !log upload etherpad-lite 1.6.3-1 to apt.wikimedia.org/jessie-wikimedia/main T186866 [15:47:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:47:41] !log upgrade etherpad.wikimedia.org to 1.6.3-1 [15:47:41] T186866: Etherpad 1.6.3 security release - https://phabricator.wikimedia.org/T186866 [15:47:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:49:14] !log upgrade etherpad.wikimedia.org to 1.6.3-1 T186866 [15:49:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:52:54] (03PS1) 10BBlack: ntp servers/peers option tweaking [puppet] - 10https://gerrit.wikimedia.org/r/409357 [15:55:30] (03PS3) 10Filippo Giunchedi: prometheus: add check prometheus metric script [puppet] - 10https://gerrit.wikimedia.org/r/409054 (https://phabricator.wikimedia.org/T181410) [15:55:59] (03CR) 10jerkins-bot: [V: 04-1] prometheus: add check prometheus metric script [puppet] - 10https://gerrit.wikimedia.org/r/409054 (https://phabricator.wikimedia.org/T181410) (owner: 10Filippo Giunchedi) [15:56:47] 10Operations, 10ops-eqsin: dns5002 mgmt console unreachable - https://phabricator.wikimedia.org/T186902#3958682 (10BBlack) p:05Triage>03High [15:56:49] (03CR) 10Filippo Giunchedi: [C: 031] lvs: don't bind prometheus-node-exporter on INADDR_ANY [puppet] - 10https://gerrit.wikimedia.org/r/409338 (https://phabricator.wikimedia.org/T176182) (owner: 10Ema) [15:58:52] 10Operations: Etherpad 1.6.3 security release - https://phabricator.wikimedia.org/T186866#3958695 (10akosiaris) 05Open>03Resolved a:03akosiaris etherpad.wikimedia.org has been updated. We should now be safe from these vulns, resolving. 
[16:08:40] (03CR) 10Filippo Giunchedi: [C: 031] Initial packaging [debs/prometheus-burrow-exporter] (debian) - 10https://gerrit.wikimedia.org/r/409310 (https://phabricator.wikimedia.org/T180442) (owner: 10Elukey) [16:09:42] (03PS4) 10Filippo Giunchedi: prometheus: add check prometheus metric script [puppet] - 10https://gerrit.wikimedia.org/r/409054 (https://phabricator.wikimedia.org/T181410) [16:09:48] (03CR) 10BBlack: [C: 032] ntp servers/peers option tweaking [puppet] - 10https://gerrit.wikimedia.org/r/409357 (owner: 10BBlack) [16:10:13] (03CR) 10jerkins-bot: [V: 04-1] prometheus: add check prometheus metric script [puppet] - 10https://gerrit.wikimedia.org/r/409054 (https://phabricator.wikimedia.org/T181410) (owner: 10Filippo Giunchedi) [16:13:15] (03PS1) 10Chad: Adding reviewers plugin [software/gerrit] - 10https://gerrit.wikimedia.org/r/409363 [16:13:28] (03PS1) 10Chad: Adding webhooks plugin [software/gerrit] - 10https://gerrit.wikimedia.org/r/409364 [16:20:03] (03PS1) 10Marostegui: db-eqiad.php: Repool db1067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409366 (https://phabricator.wikimedia.org/T162807) [16:20:56] (03CR) 10Marostegui: [C: 04-2] "Server still catching up - wait for the lag to be gone" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409366 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [16:24:10] (03PS4) 10Ottomata: [WIP] point eventlogging processes at Kafka jumbo [puppet] - 10https://gerrit.wikimedia.org/r/404773 (https://phabricator.wikimedia.org/T183297) [16:25:27] (03PS11) 10Ottomata: [WIP] Refactor cache::kafka::eventlogging into profile and enable TLS [puppet] - 10https://gerrit.wikimedia.org/r/403067 (https://phabricator.wikimedia.org/T183297) [16:26:47] (03PS5) 10Ottomata: [WIP] point eventlogging processes at Kafka jumbo [puppet] - 10https://gerrit.wikimedia.org/r/404773 (https://phabricator.wikimedia.org/T183297) [16:28:00] (03CR) 10Chad: [C: 031] "Can (and should) land before 2.15. 
Safe but needs service restart" [puppet] - 10https://gerrit.wikimedia.org/r/409052 (owner: 10Paladox) [16:29:02] (03CR) 10Chad: [V: 032 C: 032] Fix gerrit support for latest scap version [software/gerrit] - 10https://gerrit.wikimedia.org/r/404221 (https://phabricator.wikimedia.org/T184882) (owner: 10Paladox) [16:29:34] Thanks :) [16:29:36] (03PS6) 10Dzahn: Gerrit: Set gitiles configuation to be used as the repo viewer [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [16:29:42] !log demon@tin Started deploy [gerrit/gerrit@7ca3b02]: no-op to gerrit: deploying scap config change [16:29:52] !log demon@tin Finished deploy [gerrit/gerrit@7ca3b02]: no-op to gerrit: deploying scap config change (duration: 00m 10s) [16:29:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:30:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:32:07] (03CR) 10Dzahn: [C: 032] Gerrit: Set gitiles configuation to be used as the repo viewer [puppet] - 10https://gerrit.wikimedia.org/r/401799 (https://phabricator.wikimedia.org/T184116) (owner: 10Paladox) [16:32:18] (03PS2) 10Chad: scap_source: also execute scap deploy --init [puppet] - 10https://gerrit.wikimedia.org/r/389473 (owner: 10Giuseppe Lavagetto) [16:32:24] Thanks :) [16:32:49] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Repool db1067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409366 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [16:32:52] away with gitweb ;) [16:33:07] mutante we still use gitweb [16:33:14] Just the gitiles plugin sets it now [16:33:53] but not the entire custom section in config [16:34:04] no_justification: on tin: Your branch is ahead of 'origin/master' by 1 commit and looks like your commit [16:34:13] I would like to deploy https://gerrit.wikimedia.org/r/#/c/409366/ [16:34:21] (03Merged) 10jenkins-bot: db-eqiad.php: Repool db1067 [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/409366 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [16:34:22] paladox: so glad it has nothing to do with gitblit anymore [16:36:25] marostegui: Whoops, yeah I was gonna deploy but got sidetracked [16:36:34] ok, I will wait for you :) [16:36:45] Nope go on ahead [16:36:47] Tossed mine [16:36:50] ah oki [16:36:59] thanks .) [16:37:01] :) [16:37:05] yw [16:37:11] (03CR) 10jenkins-bot: db-eqiad.php: Repool db1067 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409366 (https://phabricator.wikimedia.org/T162807) (owner: 10Marostegui) [16:37:37] Heh [16:38:29] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1067 - T162807 (duration: 01m 12s) [16:38:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:38:42] T162807: Run pt-table-checksum on s1 (enwiki) - https://phabricator.wikimedia.org/T162807 [16:38:43] I am done no_justification :-) [16:43:02] (03PS3) 10Dzahn: hiera/wmflib/pybal: rename ganglia_clusters to wikimedia_clusters [puppet] - 10https://gerrit.wikimedia.org/r/406794 (https://phabricator.wikimedia.org/T177225) [16:43:39] <_joe_> mutante: there is a race condition with that change [16:43:54] <_joe_> if I'm not wrong [16:44:19] _joe_: i compiled that on '*' http://puppet-compiler.wmflabs.org/9860/ [16:44:49] <_joe_> mutante: yeah the problem might be with collected resources, let me check [16:44:58] oh! thank you [16:46:58] <_joe_> mutante: no, it's actually ok, my memory of the code was wrong, I already fixed the wtf we had there [16:47:03] <_joe_> so it's all good [16:47:50] (03PS1) 10Alexandros Kosiaris: Add runtime dependency to pkg_resources [software/service-checker] - 10https://gerrit.wikimedia.org/r/409374 [16:48:12] yay! so.. i'll go ahead! 
last night i almost did it and then decided to at least wait until more people around, heh [16:48:38] (03CR) 10Dzahn: [C: 032] hiera/wmflib/pybal: rename ganglia_clusters to wikimedia_clusters [puppet] - 10https://gerrit.wikimedia.org/r/406794 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [16:50:02] (03CR) 10Alexandros Kosiaris: "Aha, I have to wonder how I missed that despite your mentiond. Anyway https://gerrit.wikimedia.org/r/409374" [docker-images/production-images] - 10https://gerrit.wikimedia.org/r/405205 (https://phabricator.wikimedia.org/T184220) (owner: 10Dduvall) [16:50:15] all those fails on the "*" compiler output are unrelated things btw.. i even fixed some of them but not all [16:50:35] no issue on lvs1001 [16:51:15] !log andrew@tin Started deploy [horizon/deploy@de72527]: Rolling out pyldap wheel [16:51:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:52:16] this was the last remnant of Ganglia, hooray [16:53:13] (03Abandoned) 10Dzahn: hiera/wmflib: drop ganglia_clusters variable entirely? 
[puppet] - 10https://gerrit.wikimedia.org/r/382931 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [16:53:40] !log andrew@tin Finished deploy [horizon/deploy@de72527]: Rolling out pyldap wheel (duration: 02m 26s) [16:53:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:54:20] (03Abandoned) 10Dzahn: pybal: use lvs::config not ganglia_clusters to determine if appserver [puppet] - 10https://gerrit.wikimedia.org/r/382930 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [17:01:05] (03CR) 10Paladox: [C: 031] Gerrit: Set cookie path to / [puppet] - 10https://gerrit.wikimedia.org/r/409216 (owner: 10Chad) [17:03:30] godog: currently wondering about /usr/local/bin/prometheus-ganglia-gen afaict we use that and it would just be another candidate to rename because of the word "ganglia" in it (nowadays) [17:11:46] no_justification your cookie thingy works [17:11:53] at least it worked for me on https://gerrit.git.wmflabs.org/g/ [17:12:05] (03CR) 10Chad: [C: 032] group1 to wmf.20 again again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409345 (owner: 10Chad) [17:13:12] paladox: I think your patch version of gitiles is broken :\ [17:13:21] no_justification oh, i am using your patch [17:13:23] from upstream [17:13:27] I mean in prod ;-) [17:13:38] (03Merged) 10jenkins-bot: group1 to wmf.20 again again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409345 (owner: 10Chad) [17:14:12] See, missing gitiles but there's an almost unclickable link where it should be https://usercontent.irccloud-cdn.com/file/wIS4qOv3/gerrit_bad_links.png [17:14:51] gitiles 0f31748 Enabled [17:15:42] no_justification oh [17:15:56] * paladox goes hunting for fixes [17:16:22] no_justification aha [17:16:27] your looking in the wrong place [17:16:36] with gitiles, they do it for the change number [17:16:45] (yeh this is confusing) :) [17:17:12] no_justification something like https://phabricator.wikimedia.org/F13389253 [17:17:12] (03CR) 
10jenkins-bot: group1 to wmf.20 again again [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409345 (owner: 10Chad) [17:17:30] Ahhh, I see it now [17:17:45] If I swap back to gwtui I see it in the old spot, but I see the new spot in polygerrit no [17:17:50] Just different, but makes more sense tbh [17:18:18] yeh [17:18:43] no_justification also, i think we should try and force everyone to be logged out before merging your cookie change [17:19:03] !log demon@tin rebuilt and synchronized wikiversions files: group1 to wmf.20 [17:19:14] kind of broke it for me, ie when i tryed to sign in on one browser that was signed in, it broke it for me. So i had to remove it from the cookie store. [17:19:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:20:13] no_justification woo hoo [17:20:15] your change works [17:20:19] https://gerrit.example.com/gitiles//test/+/5bbe440622b309e553ac73a8d4db8ac23bcd41f4 [17:20:57] AaronSchulz: https://phabricator.wikimedia.org/P6675 ugh [17:21:05] Not nearly as bad as before, but still ugh [17:21:48] yay, works correctly now. [17:21:57] And i have +2 rights on that repo no_justification [17:21:59] (gitiles) [17:22:02] plugins/gitiles [17:22:51] no_justification left a comment on your change, to fix the doc [17:23:10] (03PS1) 10Dzahn: wmflib/prometheus: get_clusters, update Ganglia related comments [puppet] - 10https://gerrit.wikimedia.org/r/409384 (https://phabricator.wikimedia.org/T177225) [17:24:42] no_justification see https://gerrit.git.wmflabs.org/r/#/c/58/ [17:26:10] Yay, my plans all worked! 
[17:28:57] (03Draft1) 10Paladox: Gerrit: Set gerrit.baseUrl in gitiles.config [puppet] - 10https://gerrit.wikimedia.org/r/409385 [17:29:00] (03PS2) 10Paladox: Gerrit: Set gerrit.baseUrl in gitiles.config [puppet] - 10https://gerrit.wikimedia.org/r/409385 [17:29:01] no_justification ^^ [17:29:12] (03CR) 10Dzahn: [C: 032] "comments only" [puppet] - 10https://gerrit.wikimedia.org/r/409384 (https://phabricator.wikimedia.org/T177225) (owner: 10Dzahn) [17:31:32] no_justification just needs you to update the doc here https://gerrit-review.googlesource.com/#/c/plugins/gitiles/+/158630/ and i can merge. [17:34:23] paladox: re: https://gerrit.wikimedia.org/r/#/c/409052/ did you make upstream add that config option? heh.. just recently it didn't exist, right [17:35:04] mutante i backported the change, and said it was a wmf feature request as some users from wmf did not want private changes in gerrit. [17:35:06] no_justification: odd [17:36:33] (03CR) 10Chad: "Should be rebased on top of I7d4a83f, requires that." [puppet] - 10https://gerrit.wikimedia.org/r/409385 (owner: 10Paladox) [17:36:53] (03PS3) 10Paladox: Gerrit: Set gerrit.baseUrl in gitiles.config [puppet] - 10https://gerrit.wikimedia.org/r/409385 [17:36:57] paladox: very nice! thanks [17:37:01] Yep :) [17:37:25] AaronSchulz: Yeah, seems to have passed though? [17:37:27] Idk [17:37:29] I'm over wmf.20 [17:37:31] It's jinxed [17:38:12] no_justification lol, i think that fix uri thing doesn't work [17:38:13] heh [17:38:14] https://gerrit.git.wmflabs.org/r//r/plugins/gitiles/test/+/62f81e43c3949cca541b1a0d951d93a7d36c7d56 [17:38:22] (03CR) 10Dzahn: [C: 031] Gerrit: Set change.disablePrivateChanges to true [puppet] - 10https://gerrit.wikimedia.org/r/409052 (owner: 10Paladox) [17:38:36] anyways your change works. 
[17:41:20] no_justification merged [17:42:11] (03PS1) 10Dzahn: prometheus: ganglia-gen outdated resource names (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/409390 [17:47:17] (03PS5) 10BryanDavis: Remove access for myself [puppet] - 10https://gerrit.wikimedia.org/r/407577 (https://phabricator.wikimedia.org/T186289) (owner: 10Yuvipanda) [17:47:19] (03PS1) 10BryanDavis: nagios: Add WMCS team to team-paws contactgroup [puppet] - 10https://gerrit.wikimedia.org/r/409393 [17:50:42] no_justification: __destruct() only triggers those methods for sanity...normally shutdown() should have triggered them before the request objects start getting destroyed. I see a callback triggered there, which is kind of late to be calling wfGetDB() and friends. I've seen this kind of thing in job traces before, though I never found out what makes that sometimes happen...maybe some exception being caught poorly by something. [17:51:12] (03CR) 10BryanDavis: "> @yuvipanda: I also added your user to the absented group in PS4." [puppet] - 10https://gerrit.wikimedia.org/r/407577 (https://phabricator.wikimedia.org/T186289) (owner: 10Yuvipanda) [17:51:42] (03PS2) 10BryanDavis: nagios: Add WMCS team to team-paws contactgroup [puppet] - 10https://gerrit.wikimedia.org/r/409393 [17:51:44] (03PS6) 10BryanDavis: Remove access for myself [puppet] - 10https://gerrit.wikimedia.org/r/407577 (https://phabricator.wikimedia.org/T186289) (owner: 10Yuvipanda) [17:53:57] (03CR) 10Dzahn: "@bd808 i checked if all the contacts exist in private repo and they do with one exception, there isn't a bd808 there yet it seems" [puppet] - 10https://gerrit.wikimedia.org/r/409393 (owner: 10BryanDavis) [17:55:37] AaronSchulz: Fair 'nuff. This release has been fun anyway [17:55:42] I think it's gone. Maybe transient. 
[17:56:43] (03PS1) 10Chad: group2 to wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409394 [17:56:57] (03CR) 10Chad: [C: 04-2] "Gonna let group1 sit a little longer first" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409394 (owner: 10Chad) [17:59:01] (03CR) 10jerkins-bot: [V: 04-1] group2 to wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409394 (owner: 10Chad) [18:00:13] (03CR) 10Paladox: [C: 031] "We can merge this. Im hopping this will be safe." [puppet] - 10https://gerrit.wikimedia.org/r/409216 (owner: 10Chad) [18:00:24] (03CR) 10Dzahn: [C: 032] "added an Icinga contact bd808 in private repo" [puppet] - 10https://gerrit.wikimedia.org/r/409393 (owner: 10BryanDavis) [18:01:09] RECOVERY - IPMI Sensor Status on helium is OK: Sensor Type(s) Temperature, Power_Supply Status: OK [18:08:44] PROBLEM - configured eth on stat1005 is CRITICAL: Return code of 255 is out of bounds [18:08:45] (03CR) 10Chad: "Well everyone's probably gonna get logged out :\" [puppet] - 10https://gerrit.wikimedia.org/r/409216 (owner: 10Chad) [18:09:12] PROBLEM - MD RAID on stat1005 is CRITICAL: Return code of 255 is out of bounds [18:09:22] PROBLEM - dhclient process on stat1005 is CRITICAL: Return code of 255 is out of bounds [18:09:32] PROBLEM - DPKG on stat1005 is CRITICAL: Return code of 255 is out of bounds [18:10:13] PROBLEM - Check the NTP synchronisation status of timesyncd on stat1005 is CRITICAL: Return code of 255 is out of bounds [18:10:42] PROBLEM - puppet last run on stat1005 is CRITICAL: Return code of 255 is out of bounds [18:10:50] (03Draft1) 10Paladox: Gerrit: Add reviewers.config to replace git/review wiki page [puppet] - 10https://gerrit.wikimedia.org/r/409399 [18:10:55] (03PS2) 10Paladox: Gerrit: Add reviewers.config to replace git/review wiki page [puppet] - 10https://gerrit.wikimedia.org/r/409399 [18:10:59] no_justification ^^ [18:12:42] !log demon@tin Synchronized php-1.31.0-wmf.20/includes/filerepo/file/LocalFile.php: Fix 
CommentStore->createComment() call in LocalFile.php (duration: 01m 12s) [18:12:50] (03CR) 10Paladox: [C: 031] Adding reviewers plugin [software/gerrit] - 10https://gerrit.wikimedia.org/r/409363 (owner: 10Chad) [18:12:57] anomie: Fix live, thx for the quick patch ^ [18:12:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:17:08] (03CR) 10Chad: [C: 04-2] "recheck" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409394 (owner: 10Chad) [18:20:38] (03PS1) 10Dzahn: icinga: optimize new notification command SMS content [puppet] - 10https://gerrit.wikimedia.org/r/409400 (https://phabricator.wikimedia.org/T185862) [18:24:54] (03CR) 10Chad: [C: 032] group2 to wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409394 (owner: 10Chad) [18:25:52] RECOVERY - configured eth on stat1005 is OK: OK - interfaces up [18:26:12] RECOVERY - MD RAID on stat1005 is OK: OK: Active: 8, Working: 8, Failed: 0, Spare: 0 [18:26:18] (03Merged) 10jenkins-bot: group2 to wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409394 (owner: 10Chad) [18:26:22] RECOVERY - dhclient process on stat1005 is OK: PROCS OK: 0 processes with command name dhclient [18:26:32] RECOVERY - DPKG on stat1005 is OK: All packages OK [18:30:42] RECOVERY - puppet last run on stat1005 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [18:31:08] !log demon@tin rebuilt and synchronized wikiversions files: group2 to wmf.20 [18:31:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:31:30] (03PS1) 10Herron: change wikipedia.com zone from symlink to file [dns] - 10https://gerrit.wikimedia.org/r/409405 (https://phabricator.wikimedia.org/T184230) [18:31:32] (03PS1) 10Herron: change wikipedia.com SPF record to fail all (-all) [dns] - 10https://gerrit.wikimedia.org/r/409406 (https://phabricator.wikimedia.org/T184230) [18:31:34] (03PS1) 10Herron: change wikipedia.com DMARC domain and subdomain policies to reject [dns] - 
10https://gerrit.wikimedia.org/r/409407 (https://phabricator.wikimedia.org/T184230) [18:32:42] (03CR) 10jenkins-bot: group2 to wmf.20 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409394 (owner: 10Chad) [18:38:16] no_justification: judging from what I suspected and Wn3hQgpAMFcAACvokewAAACD on logstash, I think the CommentStore error was related [18:40:13] RECOVERY - Check the NTP synchronisation status of timesyncd on stat1005 is OK: OK: synced at Fri 2018-02-09 18:40:09 UTC. [18:41:24] AaronSchulz: Gotcha [18:41:51] still shouldn't happen ideally though...something needs some cleanup in the deletion code [18:43:58] (03PS1) 10Ayounsi: Add new asw2-a/b/c-eqiad mgmt IPs [dns] - 10https://gerrit.wikimedia.org/r/409410 [18:49:02] (03CR) 10Ayounsi: [C: 032] Add new asw2-a/b/c-eqiad mgmt IPs [dns] - 10https://gerrit.wikimedia.org/r/409410 (owner: 10Ayounsi) [18:49:19] There's also a fair number of "Expectation (writes <= 0) by MediaWiki::restInPeace not met (actual: 3):" [18:49:29] They're bubbling up on this one graph now that I cleared out something out [18:49:32] *else [18:50:03] echo, echo, echo page_touched, site_stats, watchlist, page_links [18:50:05] *sigh* [18:58:26] (03CR) 10Krinkle: "The main thing I'd like feedback on is how to combine the Kafka-way of consuming with the poll loop that is going on in coal. The presence" [puppet] - 10https://gerrit.wikimedia.org/r/403560 (https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [19:02:43] 10Operations, 10Datasets-General-or-Unknown, 10hardware-requests: Replace snapshot1001 with a proper testbed host (new hardware) - https://phabricator.wikimedia.org/T184616#3959144 (10RobH) [19:02:46] 10Operations, 10Datasets-General-or-Unknown, 10hardware-requests: Replace snapshot1001 with a proper testbed host (new hardware) - https://phabricator.wikimedia.org/T184616#3959145 (10ArielGlenn) Well it's annual planning time so let's see if we can sneak this into the budget. Here's what I need: this shoul... 
[19:03:56] (03CR) 10Dzahn: [C: 032] "so far just affects me and my test host" [puppet] - 10https://gerrit.wikimedia.org/r/409400 (https://phabricator.wikimedia.org/T185862) (owner: 10Dzahn) [19:06:47] (03PS1) 10Dzahn: icinga: fix typo in new notification command [puppet] - 10https://gerrit.wikimedia.org/r/409413 [19:14:46] mutante wondering if you could review https://gerrit.wikimedia.org/r/409363 please? :) [19:15:55] (03CR) 10Dzahn: [C: 032] icinga: fix typo in new notification command [puppet] - 10https://gerrit.wikimedia.org/r/409413 (owner: 10Dzahn) [19:16:10] !log demon@tin Synchronized php-1.31.0-wmf.20/extensions/Scribunto/common/Hooks.php: silence divide by zero / no such index 0 errors (duration: 00m 56s) [19:16:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:16:53] PROBLEM - HHVM jobrunner on mw1338 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:19:52] RECOVERY - HHVM jobrunner on mw1338 is OK: HTTP OK: HTTP/1.1 200 OK - 206 bytes in 0.002 second response time [19:20:00] no_justification: https://gerrit.wikimedia.org/r/#/c/409418/ should help with those occasional atomic section errors in general [19:21:02] paladox: did you really mean the one in operations/software/gerrit repo [19:21:22] mutante uh woops [19:21:25] wrong one [19:21:43] mutante https://gerrit.wikimedia.org/r/c/409399/ [19:22:13] 10Operations, 10Mail, 10Patch-For-Review: Disavow emails from wikipedia.com - https://phabricator.wikimedia.org/T184230#3876973 (10Krinkle) @herron As follow-up, we should probably remove DNS entries for subdomains that don't have redirects configured. I checked the ones listed under "Other" as starting poi... [19:25:56] (03CR) 10Jforrester: "Yay, less cruft." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/409246 (https://phabricator.wikimedia.org/T186859) (owner: 10Chad) [19:26:43] no_justification: hmm, I can kill the clearSharedCache() warnings pretty easily [19:28:25] paladox: can i have some more context please.. like a ticket link or more commit message what it is for or upstream link explaining the option [19:29:54] (03PS3) 10Paladox: Gerrit: Add reviewers.config to replace git/review wiki page [puppet] - 10https://gerrit.wikimedia.org/r/409399 [19:29:58] mutante done [19:30:24] (03CR) 10jerkins-bot: [V: 04-1] Gerrit: Add reviewers.config to replace git/review wiki page [puppet] - 10https://gerrit.wikimedia.org/r/409399 (owner: 10Paladox) [19:30:45] (03PS4) 10Paladox: Gerrit: Add reviewers.config to replace git/review wiki page [puppet] - 10https://gerrit.wikimedia.org/r/409399 [19:30:51] (03PS5) 10Paladox: Gerrit: Add reviewers.config to replace git/review wiki page [puppet] - 10https://gerrit.wikimedia.org/r/409399 [19:34:02] no_justification: https://gerrit.wikimedia.org/r/#/c/409421/ [19:40:07] (03CR) 10Dzahn: "thanks for adding more detail. 
i see Chad is installing the plugin in https://gerrit.wikimedia.org/r/#/c/409363/ seems like they shoul" [puppet] - 10https://gerrit.wikimedia.org/r/409399 (owner: 10Paladox) [19:42:06] 10Operations, 10Goal, 10Patch-For-Review, 10User-fgiunchedi, 10cloud-services-team (Kanban): Port non-deprecated Diamond collectors to Prometheus - https://phabricator.wikimedia.org/T177196#3959213 (10debt) [19:42:12] 10Operations, 10Discovery-Search (Current work), 10Goal, 10Patch-For-Review, and 2 others: Port elasticsearch metrics to Prometheus - https://phabricator.wikimedia.org/T181627#3959212 (10debt) 05Open>03Resolved [19:42:23] PROBLEM - Apache HTTP on mw2123 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [19:43:14] RECOVERY - Apache HTTP on mw2123 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 616 bytes in 0.123 second response time [19:45:47] (03CR) 10Chad: [C: 031] "I'd suggest adding `ignoreDrafts = true` as well. Otherwise lgtm and can land without a restart" [puppet] - 10https://gerrit.wikimedia.org/r/409399 (owner: 10Paladox) [19:46:25] no_justification to reviewers.config? [19:46:45] I mean we don't have to, but we discourage drafts anyway :p [19:46:53] That's why I put a +1, tis optional for now, idc [19:47:03] no_justification ok, i can add ignoreDrafts = true [19:47:05] 10Operations, 10ops-eqsin: dns5002 mgmt console unreachable - https://phabricator.wikimedia.org/T186902#3959221 (10RobH) Draft email to equinix singapore smarthands directions: > Support, > > We're unable to access one of our systems remotely, named dns5002, in rack 06:040020:0604, U 29. This means that e... 
[19:47:09] to reviewers.config [19:47:31] (03PS6) 10Paladox: Gerrit: Add reviewers.config to replace git/review wiki page [puppet] - 10https://gerrit.wikimedia.org/r/409399 [19:47:31] done [19:48:59] no_justification ^^ :) [19:53:43] (03CR) 10Dzahn: "if those are not set it shows up in the SMS content as a single "."" [puppet] - 10https://gerrit.wikimedia.org/r/406535 (https://phabricator.wikimedia.org/T185862) (owner: 10Dzahn) [19:56:05] (03CR) 10Dzahn: [C: 032] Gerrit: Add reviewers.config to replace git/review wiki page [puppet] - 10https://gerrit.wikimedia.org/r/409399 (owner: 10Paladox) [19:56:14] (03PS7) 10Dzahn: Gerrit: Add reviewers.config to replace git/review wiki page [puppet] - 10https://gerrit.wikimedia.org/r/409399 (owner: 10Paladox) [19:56:21] thanks :) [19:56:52] discourages drafts, heh :) [19:57:09] heh [19:57:21] Hi ops [19:58:09] there's a CentralNotice bug that's showing random rows from comment_rc_comment in a couple log pages on meta: T186905 [19:58:09] T186905: Weird entries in CN Banner content log - https://phabricator.wikimedia.org/T186905 [19:58:28] potentially including suppressed stuff, I guess [19:58:44] (03PS4) 10Dzahn: graphite/performance::site: apache -> httpd module [puppet] - 10https://gerrit.wikimedia.org/r/409200 [19:58:50] anomie has already patched the bug [19:59:37] but I'm a bit leery to deploy CentralNotice updates on a Friday [19:59:59] and wonder if there's a simpler settings change like blacklisting that special page [20:00:02] just for the weekend [20:00:13] What would you recommend? [20:00:38] I'd recommend deploying [20:00:43] (I've been deploying stuff all day) [20:00:51] ejegg: There's no setting change that'll fix it, unless maybe it was to disable that special page entirely. [20:01:09] I can deploy it if you're feeling apprehensive [20:01:18] heh, cool, I need to dust off my deployment skills anyway. [20:01:21] Just go for the backport, IMO, it's not a complicated change. 
[20:01:32] OK, cool [20:01:59] cherry-picked to wmf.20 for you [20:02:05] https://gerrit.wikimedia.org/r/c/409420/ [20:02:07] * greg-g looks around for the "no deploys on Friday police" [20:02:10] * greg-g sees himself in the mirror [20:02:15] (03CR) 10Dzahn: [C: 032] "compiled on graphite hosts, no errors" [puppet] - 10https://gerrit.wikimedia.org/r/409200 (owner: 10Dzahn) [20:03:31] Um, or did it not? [20:03:43] ah, CentralNotice is wierd [20:03:45] Oh yeah, there isn't a wmf. branch [20:03:54] I should know this [20:03:57] it's this wmf_deploy branch [20:03:58] (03CR) 10Dzahn: [C: 032] "noop" [puppet] - 10https://gerrit.wikimedia.org/r/409200 (owner: 10Dzahn) [20:04:09] since we have to coordinate across wiki families [20:04:12] greg-g: Hey, train's done at least! :D [20:04:37] you know, I'm happy about that, unhappy it took so long, but, that's our job [20:04:39] just cherry-picked it there [20:04:59] ejegg: Ok, can you wait to merge for a few minutes? I've already got two merges landing right now [20:05:08] yeah, no worries [20:07:06] !log demon@tin Synchronized php-1.31.0-wmf.20/includes/MediaWiki.php: Catch Error exceptions in MediaWiki::run() (duration: 00m 57s) [20:07:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:07:23] PROBLEM - puppet last run on graphite2002 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [20:08:15] !log demon@tin Synchronized php-1.31.0-wmf.20/includes/libs/rdbms/loadbalancer/LoadBalancer.php: Catch Error exceptions in MediaWiki::run() (duration: 00m 55s) [20:08:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:09:04] 10Operations, 10Wikimedia-Logstash, 10hardware-requests: decommission logstash100[1-3] - https://phabricator.wikimedia.org/T175830#3959326 (10RobH) [20:12:30] 10Operations: prometheus: ganglia-gen and outdated Ganglia:cluster resource name - https://phabricator.wikimedia.org/T186918#3959330 (10Dzahn) [20:12:43] !log demon@tin Synchronized php-1.31.0-wmf.20/includes/user/User.php: Avoid pointless DB_MASTER connections in User::clearSharedCache() (duration: 00m 55s) [20:12:47] ejegg: Scap's all yours, I'm outta the way for awhile [20:12:51] (I'm gonna grab lunch) [20:12:54] thanks! [20:12:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [20:13:08] quiddity: re: your comment earlier...sticking with wmf.20, so rolled forward [20:13:24] (03PS1) 10RobH: logstash100[1-3] decommission [puppet] - 10https://gerrit.wikimedia.org/r/409433 (https://phabricator.wikimedia.org/T175830) [20:13:36] There's a crapton of follow-up tasks (wmf.20 has a lot of minor but noisy bugs), but not enough for me to roll back and lose a week [20:13:43] (03PS2) 10Dzahn: prometheus: ganglia-gen outdated resource names (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/409390 (https://phabricator.wikimedia.org/T186918) [20:13:50] k [20:14:20] (03CR) 10RobH: [C: 032] logstash100[1-3] decommission [puppet] - 10https://gerrit.wikimedia.org/r/409433 (https://phabricator.wikimedia.org/T175830) (owner: 10RobH) [20:14:23] 10Operations, 10Patch-For-Review, 10Prometheus-metrics-monitoring: prometheus: ganglia-gen and outdated Ganglia:cluster resource name - https://phabricator.wikimedia.org/T186918#3959349 (10Dzahn) [20:14:29] ty again :) [20:16:32] (03PS1) 10RobH: 
decom logstash100[1-3] prod dns [dns] - 10https://gerrit.wikimedia.org/r/409434 (https://phabricator.wikimedia.org/T175830) [20:16:43] 10Operations, 10Analytics-Kanban, 10User-Elukey: Expand meitnerium's root partition to 100G - https://phabricator.wikimedia.org/T186020#3959374 (10Dzahn) 05Open>03Resolved [20:17:17] (03CR) 10RobH: [C: 032] decom logstash100[1-3] prod dns [dns] - 10https://gerrit.wikimedia.org/r/409434 (https://phabricator.wikimedia.org/T175830) (owner: 10RobH) [20:20:20] 10Operations, 10Wikimedia-Logstash, 10hardware-requests: decommission logstash100[1-3] - https://phabricator.wikimedia.org/T175830#3959378 (10RobH) [20:20:30] 10Operations, 10ops-eqiad, 10Wikimedia-Logstash, 10hardware-requests: decommission logstash100[1-3] - https://phabricator.wikimedia.org/T175830#3604520 (10RobH) [20:21:19] 10Operations, 10ops-eqiad, 10Wikimedia-Logstash, 10hardware-requests: decommission logstash100[1-3] - https://phabricator.wikimedia.org/T175830#3604520 (10RobH) a:05RobH>03Cmjohnson ready for on-site wipe and unracking steps [20:25:06] ejegg: deploying the fix? [20:25:21] yeah [20:39:43] PROBLEM - puppet last run on graphite1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:43:18] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom spare server caesium - https://phabricator.wikimedia.org/T182805#3959455 (10RobH) a:05RobH>03Cmjohnson [20:43:32] 10Operations, 10ops-eqiad, 10DC-Ops, 10hardware-requests: decom spare server caesium - https://phabricator.wikimedia.org/T182805#3834960 (10RobH) [20:44:26] ejegg: How's it going? [20:44:54] submodule pointer update just merged, about to hop onto deploy server [20:46:47] kk! 
[20:58:58] (03PS1) 10Chad: Updating gitiles to stable-2.14's head [software/gerrit] - 10https://gerrit.wikimedia.org/r/409442 [21:00:32] (03PS1) 10Herron: WIP: puppet-facts-export.py: support puppetdb version 4 [puppet] - 10https://gerrit.wikimedia.org/r/409443 [21:01:08] (03CR) 10Paladox: [C: 031] "thank you :)" [software/gerrit] - 10https://gerrit.wikimedia.org/r/409442 (owner: 10Chad) [21:01:40] (03CR) 10jerkins-bot: [V: 04-1] WIP: puppet-facts-export.py: support puppetdb version 4 [puppet] - 10https://gerrit.wikimedia.org/r/409443 (owner: 10Herron) [21:02:50] no_justification: are the deploys done for today? [21:03:10] ejegg is finishing his, I had 2 more lined up [21:03:14] Whatsup? [21:04:51] (03PS2) 10Herron: WIP: puppet-facts-export.py: support puppetdb version 4 [puppet] - 10https://gerrit.wikimedia.org/r/409443 [21:04:53] some mobile apis are broken, want to do some live debugging [21:06:15] (03PS1) 10MarcoAurelio: Log accessing private abusefilter details [mediawiki-config] - 10https://gerrit.wikimedia.org/r/409445 (https://phabricator.wikimedia.org/T160357) [21:10:29] tgr: Lemme deploy these last 2 things [21:10:45] scap's just waiting for canary traffic [21:10:50] !log ejegg@tin Synchronized php-1.31.0-wmf.20/extensions/CentralNotice/CentralNoticePageLogPager.php: Sync CentralNotice for banner content log fix (duration: 00m 56s) [21:10:57] ok, done! [21:11:02] no_justification ^^^ [21:11:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:16:52] (03CR) 10Ottomata: [C: 031] "Perhaps you can use kafka-python consumer poll() method?" 
[puppet] - 10https://gerrit.wikimedia.org/r/403560 (https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [21:20:49] !log demon@tin Synchronized php-1.31.0-wmf.20/extensions/Flow/includes/Block/TopicList.php: T186911 (duration: 00m 55s) [21:21:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:21:03] T186911: Undefined index: content (in TopicList) - https://phabricator.wikimedia.org/T186911 [21:21:12] James_F: live ^ [21:28:56] !log demon@tin Synchronized php-1.31.0-wmf.20/extensions/AbuseFilter/includes/api/ApiQueryAbuseLog.php: T186914 (duration: 00m 54s) [21:29:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:29:13] T186914: ApiQueryAbuseLog: Undefined index wiki - https://phabricator.wikimedia.org/T186914 [21:29:27] (03CR) 10Imarlier: "> Perhaps you can use kafka-python consumer poll() method?" [puppet] - 10https://gerrit.wikimedia.org/r/403560 (https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [21:29:58] legoktm: You're live too ^ [21:30:04] :D [21:30:35] Numbers going down. I'm afk for awhile [21:30:37] I need a beer [21:34:14] (03CR) 10Ottomata: [C: 031] "Cool! don't know much about it, but also consider confluent-kafka-python. It uses librdkafka, and we've got it available via a .deb alre" [puppet] - 10https://gerrit.wikimedia.org/r/403560 (https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [21:39:57] I'll do some live debugging [21:40:12] !log andrew@tin Started deploy [horizon/deploy@de72527]: At this point I'm just hoping scap will really deploy the wheels on my second try [21:40:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:40:26] !log andrew@tin Finished deploy [horizon/deploy@de72527]: At this point I'm just hoping scap will really deploy the wheels on my second try (duration: 00m 14s) [21:40:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:41:41] (03CR) 10Imarlier: "> Cool! 
don't know much about it, but also consider" [puppet] - 10https://gerrit.wikimedia.org/r/403560 (https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [21:42:10] no_justification: andrewbogott: I'm doing some live-editing of mw-staging on tin [21:42:22] (I hope andrew@tin is andrewbogott) [21:42:54] tgr: yeah, that's me. I don't think what you're doing affects me though, unless I misunderstand [21:45:08] andrewbogott: I don't think either, just in case [21:58:24] 10Operations, 10ORES, 10Patch-For-Review, 10Scoring-platform-team (Current): Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3959677 (10Halfak) I found this in our deploy repo. {P6677} Not sure what is going on as this change was not submitted to gerrit AFAICT [22:08:35] !log andrew@tin Started deploy [horizon/deploy@de72527]: Doing this while halfaker watches [22:08:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:08:53] !log andrew@tin Finished deploy [horizon/deploy@de72527]: Doing this while halfaker watches (duration: 00m 17s) [22:09:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:10:32] !log andrew@tin Started deploy [horizon/deploy@de72527]: Doing this while halfaker watches, again [22:10:36] !log andrew@tin Finished deploy [horizon/deploy@de72527]: Doing this while halfaker watches, again (duration: 00m 03s) [22:10:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:11:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:12:33] 10Operations, 10ops-eqiad, 10DBA, 10hardware-requests, 10Patch-For-Review: Decommission db1030 - https://phabricator.wikimedia.org/T184397#3959698 (10RobH) a:05Cmjohnson>03RobH [22:15:29] (03PS1) 10RobH: db1030 decom [dns] - 10https://gerrit.wikimedia.org/r/409451 (https://phabricator.wikimedia.org/T184397) [22:15:41] (03CR) 10jerkins-bot: [V: 04-1] db1030 decom [dns] - 
10https://gerrit.wikimedia.org/r/409451 (https://phabricator.wikimedia.org/T184397) (owner: 10RobH) [22:16:25] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10hardware-requests: eqiad: (2) systems for labstore expansion (labstore1008 & labstore1009) - https://phabricator.wikimedia.org/T186931#3959717 (10chasemp) p:05Triage>03Normal [22:17:39] (03PS2) 10RobH: db1030 decom [dns] - 10https://gerrit.wikimedia.org/r/409451 (https://phabricator.wikimedia.org/T184397) [22:18:28] (03CR) 10RobH: [C: 032] db1030 decom [dns] - 10https://gerrit.wikimedia.org/r/409451 (https://phabricator.wikimedia.org/T184397) (owner: 10RobH) [22:19:30] 10Operations, 10Cloud-VPS, 10cloud-services-team, 10hardware-requests: eqiad: (2) systems for labstore expansion (labstore1008 & labstore1009) - https://phabricator.wikimedia.org/T186931#3959733 (10chasemp) [22:20:25] (03PS1) 10RobH: decom db1030 [puppet] - 10https://gerrit.wikimedia.org/r/409454 (https://phabricator.wikimedia.org/T184397) [22:20:55] (03CR) 10RobH: [C: 032] decom db1030 [puppet] - 10https://gerrit.wikimedia.org/r/409454 (https://phabricator.wikimedia.org/T184397) (owner: 10RobH) [22:21:49] (03CR) 10Ottomata: [C: 031] "There's a pretty hard and fast rule about not deploying from Pip. But, you can build a wheels based deployable with all dependencies prep" [puppet] - 10https://gerrit.wikimedia.org/r/403560 (https://phabricator.wikimedia.org/T110903) (owner: 10Krinkle) [22:23:55] (03CR) 10Ottomata: [C: 031] "Also, in my experience, building python .debs hasn't been too hard. 
https://wikitech.wikimedia.org/wiki/Git-buildpackage#How_to_build_a_P" [puppet] - https://gerrit.wikimedia.org/r/403560 (https://phabricator.wikimedia.org/T110903) (owner: Krinkle) [22:24:21] Operations, ORES, Patch-For-Review, Scoring-platform-team (Current): Reimage ores* hosts with Debian Stretch - https://phabricator.wikimedia.org/T171851#3959737 (Halfak) OK I put all of the changes in "alex_stuff" ``` halfak@tin:/srv/deployment/ores/deploy$ git branch -l CELERY_4 STABLE S... [22:25:03] Operations, ops-eqiad, DBA, hardware-requests: Decommission db1030 - https://phabricator.wikimedia.org/T184397#3959738 (RobH) a:RobH>Cmjohnson [22:26:58] !log halfak@tin Started deploy [ores/deploy@c98ec8b]: (non-production) experimenting with stretch deploy T185901 [22:27:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:27:15] T185901: Preliminary deployment of ORES to new cluster - https://phabricator.wikimedia.org/T185901 [22:27:45] Operations, ops-eqiad, DBA, hardware-requests, Patch-For-Review: Decommission db1039 - https://phabricator.wikimedia.org/T184262#3877626 (RobH) a:Cmjohnson>RobH [22:29:05] Operations, ops-eqiad, DBA, hardware-requests: Decommission db1029 and db1031 - https://phabricator.wikimedia.org/T184054#3959750 (RobH) a:Cmjohnson>RobH [22:29:41] Operations, ops-eqiad, DBA, hardware-requests, Patch-For-Review: Decommission db1034 - https://phabricator.wikimedia.org/T182556#3959752 (RobH) a:Cmjohnson>RobH [22:30:18] PROBLEM - puppet last run on baham is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:30:18] PROBLEM - puppet last run on hafnium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:30:20] (CR) Imarlier: "> Also, in my experience, building python .debs hasn't been too hard." 
[puppet] - https://gerrit.wikimedia.org/r/403560 (https://phabricator.wikimedia.org/T110903) (owner: Krinkle) [22:30:23] Operations, ops-eqiad, Packaging, hardware-requests: Decommission host copper.eqiad.wmnet - https://phabricator.wikimedia.org/T176957#3959754 (RobH) a:Cmjohnson>RobH [22:30:39] PROBLEM - puppet last run on wtp1026 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:30:45] Operations, ops-eqiad, hardware-requests: Decommission ocg1001-3 - https://phabricator.wikimedia.org/T177958#3959756 (RobH) a:RobH [22:30:58] PROBLEM - puppet last run on db1072 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:30:59] Operations, ops-eqiad, hardware-requests: Decommission ocg1001-3 - https://phabricator.wikimedia.org/T177958#3676074 (RobH) stealing this, will add in the checklist and manually verify the steps. [22:31:18] PROBLEM - puppet last run on restbase1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:31:19] PROBLEM - puppet last run on conf1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:31:25] puppetdb? [22:31:39] PROBLEM - puppet last run on mw1238 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:31:57] Operations, Analytics, hardware-requests, Patch-For-Review: Decommission stat1002.eqiad.wmnet - https://phabricator.wikimedia.org/T173097#3959759 (RobH) a:RobH [22:32:09] PROBLEM - puppet last run on labvirt1012 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:32:13] Operations, ops-esams, hardware-requests: Decommission cp300[3456] - https://phabricator.wikimedia.org/T167376#3959760 (RobH) a:RobH [22:32:28] PROBLEM - puppet last run on db1073 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [22:32:36] herron ^^ [22:32:39] PROBLEM - puppet last run on analytics1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:32:41] or mutante ^^ [22:32:59] PROBLEM - puppet last run on labvirt1013 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:33:09] PROBLEM - puppet last run on conf1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:33:28] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:33:48] PROBLEM - puppet last run on mw1312 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:34:09] PROBLEM - puppet last run on db1094 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:34:28] PROBLEM - puppet last run on mw1282 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:36:11] paladox: whatever it was looks to be fixed already. Or it was transient. 
[22:36:19] thanks [22:36:28] andrewbogott it was most likly puppetdb [22:36:31] eyah its workinf for me [22:36:39] i just logged into one of the alerting mw systems and it ran puppet fine [22:36:57] !log halfak@tin Finished deploy [ores/deploy@c98ec8b]: (non-production) experimenting with stretch deploy T185901 (duration: 09m 59s) [22:37:09] RECOVERY - puppet last run on labvirt1012 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [22:37:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:37:13] T185901: Preliminary deployment of ORES to new cluster - https://phabricator.wikimedia.org/T185901 [22:37:29] i was worried since the last two origin/production merges were me ;] [22:38:11] i didnt do anything to fix though just ran puppet on a couple of hosts and had no issues. [22:39:28] RECOVERY - puppet last run on mw1282 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [22:41:39] RECOVERY - puppet last run on mw1238 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [22:41:59] !log halfak@tin Started deploy [ores/deploy@c98ec8b]: (non-production) experimenting with stretch deploy T185901 (trying again) [22:42:07] !log tgr@tin Synchronized php-1.31.0-wmf.20/includes/parser/ParserOutput.php: emergency fix for T186927 (duration: 00m 57s) [22:42:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:42:14] T185901: Preliminary deployment of ORES to new cluster - https://phabricator.wikimedia.org/T185901 [22:42:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:42:29] T186927: mw-parser-output divs leaking into mobileview output again - https://phabricator.wikimedia.org/T186927 [22:42:39] !log halfak@tin Finished deploy [ores/deploy@c98ec8b]: (non-production) experimenting with stretch deploy T185901 (trying again) (duration: 00m 40s) [22:42:54] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:43:56] !log tgr@tin Synchronized php-1.31.0-wmf.20/extensions/MobileFrontend/includes/api/ApiMobileView.php: emergency fix for T186927 (duration: 00m 55s) [22:44:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:45:15] !log tgr@tin Synchronized php-1.31.0-wmf.20/extensions/TextExtracts/includes/ApiQueryExtracts.php: emergency fix for T186927 (duration: 00m 55s) [22:45:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:47:12] !log halfak@tin Started deploy [ores/deploy@c98ec8b]: (non-production) experimenting with stretch deploy T185901 (trying again again) [22:47:16] !log halfak@tin Finished deploy [ores/deploy@c98ec8b]: (non-production) experimenting with stretch deploy T185901 (trying again again) (duration: 00m 03s) [22:47:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:47:29] T185901: Preliminary deployment of ORES to new cluster - https://phabricator.wikimedia.org/T185901 [22:47:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [22:49:59] PROBLEM - haproxy failover on dbproxy1005 is CRITICAL: CRITICAL check_failover servers up 2 down 1 [22:57:51] mariadb,db1009,0,0,0,0,,0,0,0,,0,,0,0,0,0,DOWN [22:58:08] RECOVERY - puppet last run on conf1002 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [22:58:48] RECOVERY - puppet last run on mw1312 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [22:59:09] RECOVERY - puppet last run on db1094 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:00:18] RECOVERY - puppet last run on hafnium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [23:00:18] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:00:48] RECOVERY - puppet last run on wtp1026 is OK: OK: 
Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:00:58] RECOVERY - puppet last run on db1072 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:01:18] RECOVERY - puppet last run on restbase1013 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [23:01:18] RECOVERY - puppet last run on conf1003 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:01:42] !log restart haproxy on dbproxy1005 [23:01:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:01:59] RECOVERY - haproxy failover on dbproxy1005 is OK: OK check_failover servers up 0 down 0 [23:02:28] RECOVERY - puppet last run on db1073 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [23:02:39] RECOVERY - puppet last run on analytics1051 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:02:58] RECOVERY - puppet last run on labvirt1013 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:03:28] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 4 minutes ago with 0 failures [23:08:40] (PS1) Dzahn: otrs: apache -> httpd modules [puppet] - https://gerrit.wikimedia.org/r/409462 [23:09:10] (CR) jerkins-bot: [V: -1] otrs: apache -> httpd modules [puppet] - https://gerrit.wikimedia.org/r/409462 (owner: Dzahn) [23:11:28] (PS2) Dzahn: otrs: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409462 [23:11:56] (CR) jerkins-bot: [V: -1] otrs: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409462 (owner: Dzahn) [23:13:56] (PS3) Dzahn: otrs: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409462 [23:14:23] (CR) jerkins-bot: [V: -1] otrs: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409462 (owner: Dzahn) [23:15:34] (PS4) Dzahn: otrs: apache 
-> httpd module [puppet] - https://gerrit.wikimedia.org/r/409462 [23:17:28] paladox: you know how it shows trailing whitespace in red .. but not when using inline editor [23:17:44] mutante there's an inline edit preference [23:17:50] there's two preferences [23:17:52] diff and edit [23:18:19] mutante https://gerrit.wikimedia.org/r/#/settings/edit-preferences [23:18:33] oh, and each have their own theme setting! sweet [23:18:34] thanks [23:18:45] yep, your welcome :) [23:19:00] also the number of themes... [23:19:09] i think upstream is going to remove that as a pref and just switch it on by default [23:19:13] in polygerrit [23:19:20] there are so many (now?) [23:19:25] yep [23:19:27] mutante even more [23:19:34] in the recent codemirror updates [23:19:45] hah, it encourages wasting time by trying them all:) [23:19:53] mutante heh :) [23:19:59] in polygerrit it looks way nicer [23:20:04] when i tryed them [23:21:02] blackboard is nice, only problem is i dont see my cursor [23:21:33] heh [23:22:01] i like material [23:23:52] paraiso_dark isnt actually dark, it's just white background :.. ok.. anyways :) [23:23:59] heh [23:26:53] !log tgr@tin Synchronized php-1.31.0-wmf.20/extensions/TextExtracts/includes/ApiQueryExtracts.php: emergency fix for T186927 (now incldes actual code change!) (duration: 00m 55s) [23:27:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:27:09] T186927: mw-parser-output divs leaking into mobileview output again - https://phabricator.wikimedia.org/T186927 [23:28:08] !log tgr@tin Synchronized php-1.31.0-wmf.20/extensions/MobileFrontend/includes/api/ApiMobileView.php: emergency fix for T186927 (now incldes actual code change!) 
(duration: 00m 55s) [23:28:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [23:37:49] (Draft1) Paladox: gerrit: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409468 [23:37:53] (PS2) Paladox: gerrit: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409468 [23:38:27] (CR) jerkins-bot: [V: -1] gerrit: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409468 (owner: Paladox) [23:40:11] (PS3) Paladox: gerrit: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409468 [23:48:50] (CR) Dzahn: [C: 1] "http://puppet-compiler.wmflabs.org/9922/cobalt.wikimedia.org/ thanks" [puppet] - https://gerrit.wikimedia.org/r/409468 (owner: Paladox) [23:49:29] paladox: click on the jenkins-bot result output, then search for the word "delta" in it [23:49:55] awww. bad example, hah [23:50:13] on all the other changes with the same topic [23:50:33] heh [23:50:34] 23:40:43 wmf-style: total violations delta 0 [23:50:54] (CR) Paladox: "recheck" [puppet] - https://gerrit.wikimedia.org/r/409468 (owner: Paladox) [23:50:58] it's 0 because that one already wasn't "include apache" but was class ";;apache... [23:51:13] but see the same on any other change with the same topic you just set [23:52:14] (PS5) Dzahn: otrs: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409462 [23:55:15] paladox: better example. that was for phab https://integration.wikimedia.org/ci/job/operations-puppet-tests-docker/14636/console [23:55:27] 02:45:50 wmf-style: total violations delta -6 [23:55:28] heh [23:55:32] see how it lists them below [23:55:38] yep [23:56:06] all the resolved ones. so yea. that's what we want. 
and hashar was tracking the grand total of that number [23:56:55] yep [23:57:08] (CR) Dzahn: [C: 2] gerrit: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/409468 (owner: Paladox) [23:57:42] thanks :) [23:59:16] applied on gerrit2001. no problems. noop. [23:59:31] :)