[00:00:01] enterprisey: a Jenkins job will soon port it to beta, and errors will be fixed
[00:00:05] twentyafterfour: Dear anthropoid, the time has come. Please deploy Phabricator update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160602T0000).
[00:00:36] (PS3) Dereckson: Set $wgSpamBlacklistEventLogging to true on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/290370 (owner: Kaldari)
[00:00:46] (CR) Dereckson: [C: 2] "SWAT" [mediawiki-config] - https://gerrit.wikimedia.org/r/290370 (owner: Kaldari)
[00:00:47] Dereckson: legoktm alright, I'll let that be then :D
[00:00:59] I'd stop the SWAT, Dereckson
[00:01:05] ori: okay
[00:01:25] (Merged) jenkins-bot: Set $wgSpamBlacklistEventLogging to true on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/290370 (owner: Kaldari)
[00:01:49] greg-g: May I grab a deployment window to fix a CentralNotice issue? Readers with old browsers are getting ECMAScript errors
[00:01:59] ori: green light to deploy 290370 or I revert it for further deployment in a next window?
[00:02:03] we served >100,000 error responses, time to step back and analyze what went wrong
[00:02:12] k
[00:02:49] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[00:02:52] users noticed (per #wikimedia-tech)
[00:02:57] SpamBlackList vs Spamblacklist
[00:03:02] greg-g: I see that we're already in twentyafterfour's Phabricator window, donno if that's a blocker.
[00:03:09] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on labstore1001 is OK: OK - nfs-exports is active
[00:03:16] !log started nfs-exports on labstore1001
[00:03:19] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[00:03:19] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[00:03:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:03:38] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[00:04:09] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[00:05:03] !log Deploy of cdff5e3 to RESTBase production complete
[00:05:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:06:48] ori: so, a change to use extension registration for SpamBlacklist was deployed during SWAT with a typo; it has been reverted
[00:07:43] awight: that's not a blocker, but Dereckson / ori: are y'all still deploying/investigating? (I haven't read scrollback)
[00:08:12] We found the issue, and it's reverted. Servers have recovered.
[00:08:20] yeah, I am not suggesting that there is a deep mystery as to what caused the spike of 5XXs, I'm just suggesting that it's pretty bad that it happened and that it makes sense not to treat it as business as usual and to instead talk through how it slipped through the cracks and how we could automate it not happening again.
[00:08:37] i'm just going to say that https://phabricator.wikimedia.org/T136387 is still not fixed, that it's ridiculous that the fix was not deployed when twentyafterfour said it was deployed a couple hours ago, and that it's ridiculous that nobody cares when i point this out
[00:08:57] i'm going to go sleep and hope someone unfucks it before tomorrow :(
[00:09:11] yes, that is ridiculous
[00:09:20] let's get that deployed asap
[00:09:25] are you sure it's not in production?
[00:09:56] ori: well, i still experience the issue. Dereckson checked earlier and we apparently have the bad version of CentralNotice deployed. so yeah
[00:09:58] ori: improve repo unit tests for extension-list probably
[00:10:12] MatmaRex: We're paying attention and trying to set up a lightning deploy right now.
[00:10:20] MatmaRex: thanks for all the attention you've given this!
[00:10:22] it was supposedly fixed, so i didn't add it to SWAT
[00:10:27] awight: yeah, i'm not blaming you
[00:10:28] (and the actual wfLoadExtension @ CommonSettings)
[00:10:28] awight: do it now
[00:10:38] ori: will do
[00:11:01] i'd blame someone but no one is apparently responsible
[00:11:18] anyway. i'm really off to sleep. night
[00:11:20] hehe. This is definitely my fault originally.
[00:11:25] good night MatmaRex
[00:11:52] I'm filing a bug with a unit test strategy.
[00:11:54] Good night MatmaRex
[00:12:12] Dereckson: thanks
[00:12:22] (PS1) Dzahn: add puppet code for endowment.wm.org site [puppet] - https://gerrit.wikimedia.org/r/292303 (https://phabricator.wikimedia.org/T136735)
[00:13:11] MatmaRex? I deployed the patch
[00:13:39] can someone recap if/what recent 5xx is?
[00:14:08] PROBLEM - YARN NodeManager Node-State on analytics1039 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[00:14:50] awight: ETA?
[00:15:02] ori: 10 minutes, if zuul cooperates
[00:15:08] bblack: a typo in an include → wfLoadExtension migration
[00:15:29] awight: no way, definitely 100% my fault originally originally
[00:15:49] instead of falling on swords can someone actually describe what happened?
[00:16:00] RECOVERY - YARN NodeManager Node-State on analytics1039 is OK: OK: YARN NodeManager analytics1039.eqiad.wmnet:8041 Node-State: RUNNING
[00:16:05] not the bug, bugs happen all the time -- how it is that we thought we had a fix deployed but didn't
[00:16:09] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There are 3 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[00:17:03] does anyone know? twentyafterfour, AndyRussG, awight?
[00:17:29] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There are 2 unmerged changes in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[00:17:45] ori: I don't understand that part of it. My guess is that a submodule pointer was reverted rather than reverting on the submodule's deployment branch.
[00:18:17] This made it possible for the bad code to go out with the train, then caused additional confusion when re-re-reverting later.
[00:18:56] ^
[00:19:55] tatus-codes?from=1464823141450&to=1464826741451&var-site=All&var-cache_type=text&var-status_type=5&theme=dark
[00:19:58] ugh
[00:20:07] https://grafana.wikimedia.org/dashboard/db/varnish-aggregate-client-status-codes?from=1464823203096&to=1464826803098&var-site=All&var-cache_type=text&var-status_type=5
[00:20:25] ^ that 5xx spike is pretty ugly, I assume that's the one with the wfLoadExtension typo issue
[00:20:36] yes
[00:22:13] ori: I don't know what happened with twentyafterfour's re-reversion following the train, somehow things were messed up following some SWATs
[00:22:26] both of these incidents need postmortems, IMO
[00:22:57] the idea being not to shame anyone but to figure out what went wrong so we can take measures to prevent it from happening again
[00:23:06] I can write the post mortem for mine.
[00:23:19] My new theory about the second failed revert was that "git submodule update" wasn't performed. Cos git status said that the CentralNotice submodule had new commits, and the HEAD in that directory was the wrong one.
[00:23:43] thanks, Dereckson, that's outstanding of you.
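
The theory at 00:23:19 (a submodule left at a different commit than the one the parent repository records) is the kind of state a small pre-sync check could flag. The sketch below is not what anyone in the channel ran: it swaps in `git submodule status` rather than the `git status` output mentioned above, and only parses text, relying on the documented prefixes of that command (space for in sync, `+` for a checked-out commit that differs from the recorded one, `-` for not initialized, `U` for merge conflicts). The function name, the sample hashes and the branch descriptions are made up for illustration, and paths containing spaces are not handled.

```python
from typing import List, Tuple

# Prefix characters printed by `git submodule status` for problem states.
REASONS = {
    "+": "checked-out commit differs from recorded commit",
    "-": "not initialized",
    "U": "merge conflict",
}


def out_of_sync_submodules(status_output: str) -> List[Tuple[str, str]]:
    """Return (path, reason) for every submodule line that is not in sync.

    `status_output` is the text printed by `git submodule status`.
    """
    problems = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        flag, rest = line[0], line[1:]
        if flag in REASONS:
            # Remaining fields are: <sha1> <path> [(<describe>)]
            path = rest.split()[1]
            problems.append((path, REASONS[flag]))
    return problems


# Hypothetical sample output; hashes and descriptions are invented.
sample = (
    " 1111111111111111111111111111111111111111 extensions/Echo (heads/wmf/1.28.0-wmf.4)\n"
    "+2222222222222222222222222222222222222222 extensions/CentralNotice (heads/master)\n"
)

for path, reason in out_of_sync_submodules(sample):
    print(f"{path}: {reason}")
```

A deployer could feed the real command output into the function and refuse to sync while the list is non-empty; whether that matches what actually went wrong here is still only the theory stated in the log.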
[00:25:25] the typo one is frustrating because the barest pre-deploy verification would have caught it, requesting any page on pretty much any wiki from a server that had the change would have revealed it
[00:26:04] this is a long-standing design bug in our deployment infrastructure
[00:26:09] but we don't have a process for doing that as part of the deploy
[00:26:12] discussed many times before
[00:26:39] !log awight@tin Synchronized php-1.28.0-wmf.4/extensions/CentralNotice: Fix for T136387 (duration: 00m 38s)
[00:26:40] T136387: CentralNotice failing in older browsers due use of ECMAScript 6 syntax - https://phabricator.wikimedia.org/T136387
[00:26:46] since i'm actually not asleep yet… an hour is clearly not enough to deploy 8 swat patches
[00:26:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:27:00] swat regularly runs over. an additional verification step would make it even slower, so no one does it
[00:27:01] (PS1) Eevans: limit mmap'd disk access to indexes only on Cassandra 2.2 [puppet] - https://gerrit.wikimedia.org/r/292305 (https://phabricator.wikimedia.org/T126629)
[00:27:12] it might be the case that our standards for how simple and reliable those 8 SWAT patches are have mutated, too
[00:27:15] yes, that's what I wrote on https://phabricator.wikimedia.org/T136782#2347773: this kind of change should be deployed on mw1017 first
[00:27:18] i don't know why it's not enough. maybe because CI sucks and takes really long to merge anything
[00:27:33] I think the original intent is that SWAT patches were relatively straightforward?
[00:27:33] or maybe because scap is slow and takes really long to deploy anything
[00:27:44] the patches i see *are* straightforward
[00:27:50] but they still take a long time
[00:28:05] there are also more of them than there used to be. maybe we're breaking more stuff because of the faster deployment train
[00:28:14] the traffic jam is artificial, we should probably deploy small changes like that throughout the day
[00:28:19] MatmaRex: if you read wikitech archives, full scap patches were excluded in the past
[00:28:39] mostly due to time needed, yes
[00:28:43] ori: MatmaRex: AndyRussG: The fix for T136387 is deployed now, I'm logging out of tin. Thanks, all!
[00:28:43] T136387: CentralNotice failing in older browsers due use of ECMAScript 6 syntax - https://phabricator.wikimedia.org/T136387
[00:28:54] there's an incentive to call every change trivial and small
[00:28:58] awight: did you verify it?
[00:29:08] (thanks, btw)
[00:29:13] ori: doing so now
[00:29:16] ori: i did, UploadWizard works now
[00:29:31] Dereckson: i don't even mean full-scap. but just syncing e.g. all of CentralNotice apparently takes minutes
[00:29:33] MatmaRex: thank you!
[00:29:35] if there were someone around that was willing to merge https://gerrit.wikimedia.org/r/#/c/292305/ for me, I could re-enable puppet on restbase1007.eqiad.wmnet (this is the only production node the change applies to, and it just enshrines what i locally hacked in already).
[00:29:45] (judging by timestamps earlier here on IRC)
[00:30:12] minutes is too fast, really
[00:30:53] yes, we all want rolling deploys with various stages of traffic validation
[00:31:40] bd808: but a sync to mw1017 and a check would have done the trick
[00:31:41] awight: ori: MatmaRex: confirmed on mediawiki.org that I'm no longer getting the CentralNotice bug
[00:31:42] so let's shift some resources to addressing the deployment problems and away from creating more deployment traffic?
[00:31:43] all we need to do is find a time machine...
[00:31:58] ori: who would deploy the changes throughout the day?
[00:32:14] any of the 30+ people with deploy rights?
[00:32:17] bd808: is there anything stopping us from making modest steps in that direction?
[00:32:21] are any of them around? hardly ever
[00:32:30] gwicke: resources? Priorities?
[00:32:41] * bd808 gives MatmaRex shell
[00:32:53] i don't have statistics, but it seems that Dereckson has been handling most of the SWAT deployments since he got the right to do them
[00:32:53] I feel like I'm beating a dead horse, but as deploy rate ramps up the effectiveness of our process and QA efforts has to ramp up as well. Over the past X months, that has clearly fallen out of step and the QA side is behind for our rate.
[00:33:10] bd808: hell no. i have enough responsibilities
[00:33:17] arguably, we should give not-breaking-the-site-that-often a bit higher priority
[00:33:45] We have no QA mostly. This is supposed to be the responsibility of each vertical and team per the 2015 reorg
[00:33:58] so I've been holding off updating phabricator due to other issues. Should I just scrap plans for this deployment?
[00:34:00] (or maybe, not until i can drop some of the stuff i'm working on day to day)
[00:34:08] twentyafterfour: yes, IMO
[00:34:12] It's quite some weeks, Tyler does the majority of the morning shift and I do half the evening ones.
[00:34:25] But no, others do the evening shift too.
[00:34:47] right, RoanKattouw also does them when he needs something deployed himself.
[00:34:47] It doesn't necessarily mean we need to shift people's actual job roles or organization around. Developers can devote time to working on the QA infrastructure and processes their own deploys move through too.
[00:34:51] it's not just 30 deployers, it's 60
[00:35:02] if none of them are around we should clean up the admin class
[00:35:04] if we have a lack of progress on that, and a feature train that's too fast, it seems logical to shift some dev-time from the latter to the former.
[00:35:28] no argument from me
[00:35:30] so let's freeze SWAT
[00:35:52] until the QA side catches up
[00:36:02] that's going to be painful and thus motivating
[00:36:16] aversion therapy!
[00:36:24] * ori zaps urandom
[00:36:26] I think that would need some actionable plan of what needs to be improved and not just kvetching here
[00:36:58] freezing swat is going to hold back fixes as much as new features, wouldn't it? ... freezing the train seems more appropriate
[00:37:01] can we go back to two-week deploy cadence?
[00:37:20] I imagine that when ori says "let's freeze swat" he's not suggesting he himself will stop updating prod code
[00:37:24] We could require deployment to the beta cluster before any non-emergency deployment can go to production.
[00:37:37] (PS2) Yuvipanda: Stop using package=>latest [puppet] - https://gerrit.wikimedia.org/r/292093 (owner: Muehlenhoff)
[00:37:46] awight: that's the train deploy
[00:38:07] (PS3) Yuvipanda: ircyall: Stop using package=>latest [puppet] - https://gerrit.wikimedia.org/r/292093 (owner: Muehlenhoff)
[00:39:43] how hard would it be to deploy to a canary host first & monitor its response codes for a couple of minutes?
[00:39:47] bd808: that's a pretty mean way to insinuate something, but if you like, I can commit
[00:40:09] gwicke: it's doable today
[00:40:32] I know it's doable, but how hard would it be to integrate it into the existing tooling?
[00:40:35] that's part of the plan for scap3
[00:40:58] is this within reach for a short, concerted effort?
[00:40:59] gwicke: it wouldn't be very hard
[00:41:31] how do you see it twentyafterfour? it syncs to mw1017, pops a prompt to ask to test and confirm, if Y, sync everywhere?
[00:42:22] Dereckson: well, that plus some graphite monitoring to help make the decision
[00:43:27] Dereckson: gwicke: https://phabricator.wikimedia.org/T110068
[00:43:28] twentyafterfour: do we have a task documenting what's missing for the first iteration?
[00:43:30] ori: RoanKattouw: we've some merged undeployed changes: https://phabricator.wikimedia.org/P3203
[00:43:32] ori: I really appreciated your help with the scap rewrite. You have certainly helped in the past.
[00:44:22] Dereckson: What about my ones exactly? Did the full scap required for those not happen yet? Are you asking me to do that scap?
[00:44:29] But when I see you, gwicke and bblack repeatedly say that QA and deploys are busted I don't see patches to go with that.
[00:44:41] also T131120
[00:44:42] T131120: Use scap3's canary deploys for MediaWiki - https://phabricator.wikimedia.org/T131120
[00:44:45] RoanKattouw: that + git submodule
[00:45:00] I don't think the releng team is sandbagging
[00:45:03] I don't think any patches are needed, we can integrate syncing to mw1017 and verifying the patch into our deployment process without changing any code
[00:45:21] it could (and should) be made smoother, more automated, and less menial
[00:45:36] I concur, efforts have been made to allow some debug in production / test with mw1017. We should use that.
[00:45:53] the whole mediawiki deployment process is a mess. It's been getting better, slowly, but it's still too manual
[00:45:58] having the ability to throw some production traffic would be nice, better automation / fewer steps would be nice, but neither is a blocker
[00:46:16] bd808: as you know, we have been using rolling deploys for services for quite a while now
[00:46:30] across 400 hosts?
[00:46:31] gwicke: not sure if trolling
[00:46:34] the rules for swat deploy already say " No new features/extensions" btw
[00:46:41] Maybe some browser tests for different platforms that run automatically when mw1017 is updated?
[00:46:50] mutante: neither change was a new feature or extension
[00:47:01] There's lots of backscroll that I haven't quite read but it looks like part of the SWAT got caught up in an extension registration-related 5xx issue?
[00:47:03] T104352
[00:47:04] T104352: Make scap able to depool/repool servers via the conftool API - https://phabricator.wikimedia.org/T104352
[00:47:14] RoanKattouw: yes
[00:47:45] RoanKattouw: a change with a typo was deployed, we served ~100 - 150k error pages, change was reverted ; in an unrelated incident, a critical bugfix was believed to be deployed but was not in fact deployed ; I suggested taking a step back and stopping the SWAT
[00:47:59] OK. Is there anything I should/can do to help get it unstuck, or should I just wait for stuff to take its course?
[00:48:35] Just having you weigh in would be useful
[00:48:35] Most of my SWAT changes are not very high prio, but one of them is (code using the wrong DB and getting "table does not exist" errors, will go to Wikipedias tomorrow unless fixed by then)
[00:48:53] yeah, that should obviously go out, then
[00:49:31] bd808: I know that you have been concerned about the overall length of a single deploy, but I'm not sure if there was actually ever data showing that doing this across 400 hosts would take too long
[00:50:19] "doing this"? If you mean moving MW to deploy via scap3 then I know that the releng team is working on it
[00:50:21] Let's finish SWAT but verify on mw1017?
[00:50:29] Okay.
[00:50:49] Dereckson: are you up for that, or would you like someone else to take over?
[00:50:51] So, first, there are still two config things to do: https://phabricator.wikimedia.org/P3203
[00:50:56] gwicke: if you mean doing Ansible deploys then I haven't seen the code that would move MW to that
[00:51:14] Yes, I'm up for that.
[00:51:20] we want to move mediawiki to scap3 soon but it's not a small change. And all these services are in line ahead of mediawiki: https://phabricator.wikimedia.org/project/view/1824/
[00:51:55] Dereckson: thanks a lot
[00:52:01] (CR) Dzahn: [C: 2] add puppet code for endowment.wm.org site [puppet] - https://gerrit.wikimedia.org/r/292303 (https://phabricator.wikimedia.org/T136735) (owner: Dzahn)
[00:53:12] bd808: a decision was made a while ago to write scap3 instead
[00:53:37] anyway, I feel this is just rehashing stuff & not constructive
[00:53:59] So I've done a scap pull on mw1017 of 293b5c34 - Revert "Test PageAssessments extension on Labs" (no-op in prod), and all is fine.
[00:53:59] RECOVERY - Mobileapps LVS codfw on mobileapps.svc.codfw.wmnet is OK: All endpoints are healthy
[00:54:13] I'm more interested in figuring out a way to get a basic canary process going
[00:54:19] RECOVERY - mobileapps endpoints health on scb2001 is OK: All endpoints are healthy
[00:54:44] Dereckson: (no need to wait for an ack if you feel good about it)
[00:55:00] mdholloway: gwicke ^^ i only reverted scb2001 so far, and now it's healthy again. So it was something in the deploy earlier
[00:55:05] !log dereckson@tin Synchronized wmf-config/CommonSettings-labs.php: Revert "Test PageAssessments extension on Labs" (no-op) (duration: 00m 23s)
[00:55:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:55:13] gwicke: something basic, based on existing deployment codepaths could be done pretty easily
[00:55:17] bearND: good, thank you!
[00:55:29] without switching mediawiki to scap3
[00:55:45] !log dereckson@tin Synchronized wmf-config/InitialiseSettings-labs.php: Revert "Test PageAssessments extension on Labs" (no-op) (duration: 00m 22s)
[00:55:48] RECOVERY - mobileapps endpoints health on scb2002 is OK: All endpoints are healthy
[00:55:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:56:06] but scap3 is going to have much better monitoring as well as automatic rollback, so that's a more robust setup
[00:56:09] kaldari: still there?
[00:56:17] Dereckson: yep
[00:56:31] Okay, let's do the Set $wgSpamBlacklistEventLogging to true on testwiki.
[00:56:38] thanks
[00:57:36] kaldari: please test on mw1017 (which is the test. host)
[00:57:38] twentyafterfour: yes, but I imagine bits of the traffic checks would be helpful to test anyway
[00:57:39] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge.
[00:58:49] twentyafterfour: let's use https://phabricator.wikimedia.org/T110068 to figure out what it would minimally take?
[00:58:55] scap 3 is the same codebase as scap 2... and they haven't diverged much really, it's just a different transport (git instead of rsync) and more atomic
[00:59:13] gwicke: ok
[00:59:28] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[00:59:57] had to restart the services manually again :(
[01:00:03] Dereckson: doesn't seem to have had an effect on test.wiki yet.
[01:00:10] scap didn't do it
[01:00:23] I've actually been wanting to work on that one for a while, just have other priorities (logstash or graphite monitoring for canaries sounds like a fun project and it's not going to be that difficult).
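
The canary idea discussed from 00:39:43 onward (sync to one host, watch its response codes, then decide whether to sync everywhere) can be sketched as a small decision function. This is only an illustration under stated assumptions, not scap's behaviour: the `Sample` type, the 1% error threshold, the minimum request count and the four callbacks are all hypothetical names and values chosen for the example, and how the counts would actually be collected (graphite, logstash or something else) is left open.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Sample:
    requests: int    # responses observed on the canary in one interval
    errors_5xx: int  # how many of those were 5xx


def canary_ok(samples: Sequence[Sample],
              max_error_ratio: float = 0.01,
              min_requests: int = 100) -> bool:
    """Decide whether the canary looks healthy enough to continue.

    Both thresholds are illustrative assumptions, not values from the log.
    """
    total = sum(s.requests for s in samples)
    errors = sum(s.errors_5xx for s in samples)
    if total < min_requests:
        # Too little traffic to judge; treat as "not verified".
        return False
    return errors / total <= max_error_ratio


def deploy(sync_canary: Callable[[], None],
           collect_samples: Callable[[], Sequence[Sample]],
           sync_all: Callable[[], None],
           rollback_canary: Callable[[], None]) -> str:
    """Sync the canary first, then either continue or roll it back."""
    sync_canary()
    if canary_ok(collect_samples()):
        sync_all()
        return "deployed"
    rollback_canary()
    return "rolled back"


# Usage with stand-in callbacks that only record what would happen.
log = []
outcome = deploy(
    sync_canary=lambda: log.append("sync canary"),
    collect_samples=lambda: [Sample(requests=500, errors_5xx=400)],
    sync_all=lambda: log.append("sync all"),
    rollback_canary=lambda: log.append("rollback canary"),
)
print(outcome, log)
```

The minimum-request guard matters for a host that receives little or no regular traffic: with nothing to measure, the gate reports "not verified" instead of passing by default.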
[01:00:40] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy
[01:00:43] !log mobileapps reverted to 8d6d648c943074b7d3999baf31d60ad99249cd51
[01:00:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:01:38] is mw1017 seeing normal traffic?
[01:02:22] no
[01:02:44] kk, thanks
[01:02:47] hence 17:45 having the ability to throw some production traffic would be nice
[01:02:50] Tests can force mw1017 with https://chrome.google.com/webstore/detail/wikimediadebug/binmakecefompkjggiklgjenddjoifbb, which injects a X-Wikimedia-Debug header on Chrome.
[01:03:04] Testers
[01:03:30] but in this particular case the code path with the bug would have been exercised by just about any web request
[01:03:46] kaldari: well your change is live on mw1017: grep wgSpamBlacklistEventLogging wmf-config/CommonSettings.php gives me $wgSpamBlacklistEventLogging = $wmgSpamBlacklistEventLogging;
[01:04:06] RECOVERY - Mobileapps LVS eqiad on mobileapps.svc.eqiad.wmnet is OK: All endpoints are healthy
[01:05:18] mwrepl
[01:05:18] echo $wgSpamBlacklistEventLogging
[01:05:18] 1
[01:06:02] Dereckson: mw.config.values.wgSpamBlacklistEventLogging from the console gives me undefined (on https://test.wikipedia.org/). Doesn't seem to be causing any errors though.
[01:06:37] Dereckson: Since it's a client-side change I wonder if it's affected by RL caching
[01:06:46] I'm going to wait 5 minutes and see
[01:06:52] okay
[01:09:31] thcipriani|afk: https://phabricator.wikimedia.org/T110068#2347859
[01:10:42] Operations, MediaWiki-Cache, MediaWiki-JobQueue, MediaWiki-JobRunner, and 3 others: Investigate massive increase in htmlCacheUpdate jobs in Dec/Jan - https://phabricator.wikimedia.org/T124418#2347865 (ori) We are definitely sending duplicate purges. I added some debug logging and saw URLs get pur...
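
The 01:02:50 message says a browser extension injects an X-Wikimedia-Debug header to route a request to mw1017, and 01:03:30 notes that almost any request would have hit the broken code path. A scripted version of that manual check might look like the sketch below. It only builds the request object and classifies a status code; nothing is sent. The header value "1" is an assumption (the log names the header but not its value), and `build_debug_request` and `looks_broken` are hypothetical helper names.

```python
import urllib.request


def build_debug_request(url: str, header_value: str = "1") -> urllib.request.Request:
    """Build (but do not send) a request carrying the debug header.

    The header name comes from the log; the value is an assumption.
    """
    return urllib.request.Request(url, headers={"X-Wikimedia-Debug": header_value})


def looks_broken(status_code: int) -> bool:
    """Treat any 5xx response as a failed smoke check."""
    return 500 <= status_code <= 599


req = build_debug_request("https://test.wikipedia.org/")
# urllib normalises header capitalisation, so compare case-insensitively.
sent_headers = {name.lower(): value for name, value in req.header_items()}
print(sent_headers)
print(looks_broken(503), looks_broken(200))
```

Actually sending the request would be a call to `urllib.request.urlopen(req)`, which raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a real check would catch that exception and inspect its `code`.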
[01:11:01] (CR) Bmansurov: [C: -1] Enable Hovercards for huwiki (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) (owner: Jhobs)
[01:11:42] I'm logging off. Dereckson, you are really awesome -- thank you.
[01:11:58] RoanKattouw: your changes are live on mw1017 too if you wish to test them
[01:12:07] Will do
[01:12:17] Bye ori. Thanks for feedback and support during this incident.
[01:13:19] Dereckson: Hmm, are all my changes there?
[01:13:35] Dereckson: Oh, you know this global isn't even output to the client-side, so I guess I don't actually have a way of testing it at the moment :P
[01:13:57] I see the DB error is fixed, but the other Special:Notifications bug fixes don't seem to be there
[01:14:07] RoanKattouw: I've done a md5sum on Echo includes/special/NotificationPager.php that matches
[01:14:17] Yup that one works
[01:14:28] I just realized the ones I'm not seeing are in JS
[01:14:52] caching...
[01:14:58] Yeah maybe
[01:15:20] I'd use incognito to test except then I can't use mw1017
[01:15:36] But I didn't care too much about that one. I'll test the Flow one now
[01:16:29] (PS1) Papaul: DNS: Add prod DNS for mw2215-mw2238 Bug:T135466 [dns] - https://gerrit.wikimedia.org/r/292307 (https://phabricator.wikimedia.org/T135466)
[01:16:50] Dereckson: it's not causing any problems though, should be fine to scap
[01:17:01] or sych
[01:17:07] er snyc
[01:17:24] RoanKattouw: https://gerrit.wikimedia.org/r/#/c/292270/ hasn't been taken into consideration by Zuul
[01:17:32] RECOVERY - MariaDB Slave Lag: s1 on db1053 is OK: OK slave_sql_lag Replication lag: 0.15 seconds
[01:18:03] Oh well that explain
[01:18:03] s
[01:18:10] RoanKattouw: so 292288 — Fix notification pager is tested fine?
[01:18:17] It's because https://gerrit.wikimedia.org/r/#/c/292265/1 was never +2ed
[01:18:19] Yes that one is fine
[01:18:21] The Flow one too
[01:18:36] !log restart mysql on labsdb1001
[01:18:36] And I saw buggy behavior but it turns out that was a known bug
[01:18:42] So I'm all good, except for the ones that didn't get merge
[01:18:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:18:48] Which I'm happy to put off till the next SWAT anyway
[01:18:53] *get merged
[01:19:13] k
[01:19:32] !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/Echo/includes/special/NotificationPager.php: Fix notification pager (T136759) (duration: 00m 25s)
[01:19:33] T136759: hewiki.echo_notification does not exist - https://phabricator.wikimedia.org/T136759
[01:19:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:20:47] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge.
[01:21:24] !log dereckson@tin Synchronized php-1.28.0-wmf.4/extensions/Flow/handlebars/: HACK: Hide reply form for locked topics (T135848) (duration: 00m 24s)
[01:21:25] T135848: [betalabs] Reply text area is present on Resolved topics but replies cannot be saved - https://phabricator.wikimedia.org/T135848
[01:21:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:21:55] RoanKattouw: okay so Flow 292264 and Echo 292288 are in prod, all works?
[01:22:12] RoanKattouw: Is there any way to test for global variables that are just set on the server-side these days? (Besides testing for their functionality), like a Special:GlobalVariables page?
[01:22:37] kaldari: mwscript eval.php enwiki ; var_dump($wgBlah);
[01:22:47] (PS2) Papaul: DNS: Add prod DNS for mw2215-mw2238 Bug:T135466 [dns] - https://gerrit.wikimedia.org/r/292307 (https://phabricator.wikimedia.org/T135466)
[01:22:55] Dereckson: Checking
[01:23:37] !log reboot labsdb1001
[01:23:39] Dereckson: Yup, looking good
[01:23:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:23:48] I'll move the ones that didn't get deployed to the next SWAT
[01:24:27] okay
[01:25:01] !log dereckson@tin Synchronized wmf-config/InitialiseSettings.php: Set $wgSpamBlacklistEventLogging to true on testwiki (duration: 00m 23s)
[01:25:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:25:39] !log dereckson@tin Synchronized wmf-config/CommonSettings.php: Set $wgSpamBlacklistEventLogging to true on testwiki (duration: 00m 22s)
[01:25:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:25:57] kaldari: your change is live on the cluster
[01:26:01] So we're done.
[01:26:32] Dereckson: thanks and I successfully tested it using Roan's suggestions and it looks good.
[01:26:46] PROBLEM - Host labsdb1001 is DOWN: PING CRITICAL - Packet loss = 100%
[01:26:55] Good, thanks for testing. And sorry for the delay around your patches.
[01:27:26] Dereckson: thanks again! no worries about the delays. Deployments are crazy unpredictable things :)
[01:28:55] Operations, ops-codfw: rack/setup/deploy new codfw mw app servers - https://phabricator.wikimedia.org/T135466#2347915 (Papaul)
[01:29:36] PROBLEM - Ensure mysql credential creation for tools users is running on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit create-dbusers is failed
[01:30:15] RECOVERY - Host labsdb1001 is UP: PING OK - Packet loss = 0%, RTA = 1.52 ms
[01:31:58] !log service mysql start on labsdb1001
[01:32:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:33:39] ori, gwicke: Is there a Phabricator task about having a pre-deployment/staging/canary host?
[01:34:14] Debra: perhaps https://phabricator.wikimedia.org/T110068#2347859
[01:36:11] Thanks.
[01:37:14] !log labsdb1001 /etc/init.d/mysql start
[01:37:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:40:20] bd808: is it possible to query logstash from within the cluster without http authentication?
[01:45:07] gwicke: Not at the moment, no. We blocked it with ferm rules on the backing elasticsearch cluster
[01:45:45] gwicke: but that could be opened up I think. we were just being cautious when we set it up
[01:46:01] makes sense
[01:46:30] the normal auth is cookie-based, or is there still a way to use http auth?
[01:47:01] the reverse-proxy thing we have in front of it is http basic auth with ldap creds
[01:47:34] ah, nice - didn't realize that it's still http auth
[01:47:48] we have a very very old kibana
[01:48:04] that would be fairly easy to use from a script as well, if we set up an account
[01:48:19] RECOVERY - Ensure mysql credential creation for tools users is running on labstore1001 is OK: OK - create-dbusers is active
[01:48:33] yeah. it would be easy to just open it up from some internal hosts too
[01:48:44] like from terbium and tin
[01:48:59] and fluorine
[01:49:29] when I want to mess about with direct queries I just ssh into logstash1001
[01:49:42] port 9200 is open on localhost there
[01:49:58] (and all of the logstash100x hosts)
[01:50:34] bd808: I started to summarize this info at https://phabricator.wikimedia.org/T110068#2347933
[01:50:40] ACKNOWLEDGEMENT - puppet last run on bromine is CRITICAL: CRITICAL: puppet fail daniel_zahn broke it with installserver changes and will fix it
[01:51:17] scb1001 and scb2001 say "ores" Connection refused since about 9h
[01:52:09] !log mw1136 service hhvm restart
[01:52:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:52:26] Hello
[01:52:32] !log scb1001/2001 ores - connection refused
[01:52:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:52:50] I think part of the problem is that deployments outside of SWAT are now frowned upon, so people stuff everything into SWAT
[01:52:55] and that's just a bad idea
[01:53:03] bd808: please correct anything I got wrong in the transcription; thanks!
[01:53:09] RECOVERY - HHVM rendering on mw1136 is OK: HTTP OK: HTTP/1.1 200 OK - 66169 bytes in 0.395 second response time
[01:53:17] I think people deploying throughout the day and then supervising it is probably better
[01:53:46] Instead of trying to stuff it all in an hour
[01:53:59] RECOVERY - Apache HTTP on mw1136 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 626 bytes in 0.040 second response time
[01:54:01] Operations, ops-codfw: rack/setup/deploy new codfw mw app servers - https://phabricator.wikimedia.org/T135466#2347951 (Papaul) @Joe I checked the DHCP file it looks like all the other mw app's servers are using Trusty. For the new mw app's server do you want me to install Trusty or switch to Jessie? Tha...
[01:54:05] with deployers (no fault of theirs) who don't fully understand the patches [01:54:26] 06Operations, 10Traffic, 06Community-Liaisons (Apr-Jun-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2347953 (10BBlack) @Whatamidoing - would you like recaps/updates of account lists here in this ticket? [01:55:29] Dereckson: can you ping me with a link to the typo extension registration incident report once you create it? [01:55:34] duploktm: do you think that they are actively frowned upon or just that it's easier to add something to the wiki page than to actually ssh in and do a deploy? [01:56:15] I haven't been paying a lot of attention honestly. If I need to deploy something I'll just do it but that doesn't happen very often these days [01:57:22] bd808: Maybe not formally frowned upon, but the general feeling I've been getting is that deploys should always be on the calendar, and ad-hoc random deploys are not encouraged [01:57:42] ah. well that has been true for quite a while yes [01:57:45] Also, ssh to tin is way easier than login to wikitech and edit that template/lua module mess of a page :P [01:57:52] adding to the wiki page is easier than deploying, but just slightly, we could add another layer of templates [01:58:00] LOL [01:58:36] now, if those templates actually did the deploy... [01:59:28] SpamBlackList vs Spamblacklist [01:59:31] more seriously, I seem to vaguely recall that a part of the motivation for introducing swat windows also had to do with the complexity of managing ad-hoc security patches, branches etc [01:59:42] You mean SpamBlackList v. SpamBlacklist, I think? 
[01:59:44] duploktm: so swat or regular deployment, both would have to be on calendars [02:00:42] gwicke: I think that complexity has gone down lately, deploying MW code is significantly more straightforward than it used to be IMO [02:01:10] mutante: right, but I'm thinking more of small bugfix things that would previously be done by Reedy right away or something now get delayed until a SWAT window, often UBNs waiting overnight [02:01:26] Debra: yeah, that's what I meant [02:01:32] (still in class -.-) [02:02:26] duploktm: yea, agree, WMF should re-hire Reedy, not kidding [02:02:53] +1 [02:04:16] (03CR) 10MZMcBride: "Related: ." (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/281239 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [02:04:34] mutante: there was an email that went around a while ago, but apparently none of the departments stepped up and said they had work for reedy [02:04:56] duploktm: I haven't done full MW deploys in a while, so can't vouch for it from personal experience -- but that's a good factor to consider while re-evaluating the swat window strategy [02:06:05] p858snake: yea, that was a mistake i think [02:06:18] gwicke: Could Beta Labs be used to test changes before going to production? [02:06:34] Some of them [02:06:44] Currently it's set up so beta usually receives config changes after production does [02:06:55] Seriously? [02:07:16] one of the requirements for alternative solutions is to still provide sufficient coordination & documentation [02:07:55] Alternative solutions to what? [02:07:59] Breaking the site? [02:08:03] to swat deploys [02:08:26] or, more generally, fixed-size deployment windows [02:08:28] I mean, putting them on Beta Labs first seems reasonable. I thought that was the point. [02:08:38] I don't think time of day or the SWAT process was really to blame here. 
[02:09:21] yeah, it doesn't seem to be the main issue [02:10:36] testing in labs is done manually, which means that it is not done for each patch; another factor is that it's sufficiently different to introduce the chance of something still breaking in prod [02:11:08] so, imho it can't replace a prod canary step that's baked into the deploy process [02:11:31] I thought Beta Labs was set up for this. Maybe I'm misunderstanding. [02:12:05] I've seen some ad-hoc processes (at least) around testing mediawiki-config on beta, and/or testing apache config changes on beta [02:12:21] there is a lot of testing in beta labs, but it's not fully automated & the environment is not exactly the same as in prod [02:12:34] but I think the bulk of beta utilization is more for much earlier testing while working on feature patches, not so much as a final pre-deploy verification [02:12:41] at least, that's my understanding from what I observe in practice [02:12:42] It would be close enough to catch L v. l, though? [02:13:21] Yes, testing in beta would have caught this issue [02:13:34] I suspect it would catch a majority of issues, not all though. [02:13:37] bblack: Seems silly to set up a bunch of wikis and then not use them for this. If there's another check on mw1017 later, even better. [02:13:57] I'm not saying it's good, just noting what I suspect is the case in practice today :) [02:14:38] (03PS1) 10Yuvipanda: tools: Do not use nginx proxy for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/292309 (https://phabricator.wikimedia.org/T136775) [02:14:53] (03PS2) 10Yuvipanda: tools: Do not use nginx proxy for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/292309 (https://phabricator.wikimedia.org/T136775) [02:15:15] getting beta to be a much closer functional mirror of production that's useful for full-stack pre-validation of just about anything... is a big thing that's been looked at before, and it's not easy. the beta we have today isn't it. 
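A minimal sketch of the kind of automated pre-deploy check discussed above, assuming the failure mode is an extension name whose letter case does not match the directory on disk (SpamBlackList vs. SpamBlacklist). The function names, the `extensions_dir` layout, and the idea of running this before a sync are illustrative assumptions, not part of scap or any existing Wikimedia tooling; only the `extension.json` file name comes from MediaWiki's extension registration convention.

```python
import os
import tempfile


def exact_case_exists(root, relative_path):
    """Return True only if every component of relative_path exists under
    root with exactly the given letter case.

    Each component is compared against the directory listing, so a case
    mismatch is reported even on a case-insensitive filesystem.
    """
    current = root
    for part in relative_path.split("/"):
        try:
            entries = os.listdir(current)
        except OSError:
            # Missing directory, or a file where a directory was expected.
            return False
        if part not in entries:
            return False
        current = os.path.join(current, part)
    return True


def missing_extensions(extensions_dir, names):
    """Return the names (hypothetical input: extensions the config wants to
    load) that have no <name>/extension.json with matching case."""
    return [
        name
        for name in names
        if not exact_case_exists(extensions_dir, name + "/extension.json")
    ]


if __name__ == "__main__":
    # Self-contained demo against a throwaway directory.
    demo_root = tempfile.mkdtemp()
    os.makedirs(os.path.join(demo_root, "SpamBlacklist"))
    open(os.path.join(demo_root, "SpamBlacklist", "extension.json"), "w").close()
    print(missing_extensions(demo_root, ["SpamBlacklist", "SpamBlackList"]))
```

A check like this only covers the one class of error seen in this incident; it would complement, not replace, a canary request against a staging host such as mw1017.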
[02:15:38] but things like syntax errors and typos that crash out the codebase, today's beta can definitely catch. [02:15:44] (03PS3) 10Yuvipanda: tools: Do not use nginx proxy for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/292309 (https://phabricator.wikimedia.org/T136775) [02:16:16] (03PS4) 10Yuvipanda: tools: Do not use nginx proxy for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/292309 (https://phabricator.wikimedia.org/T136775) [02:16:25] (if we have a process around using it that way rigorously. assuming that's what we want the process to be. there might be other/better ways to get to the same goal) [02:17:06] another thing we have done is to use staging hosts in production, which have the same production config, but aren't necessarily receiving client traffic [02:17:37] mw1017 is an example of that for mw core, another is a staging cluster for cassandra and restbase [02:17:41] Isn't that what the X-Wikimedia-Debug hosts are? [02:17:43] (03CR) 10Yuvipanda: [C: 032] tools: Do not use nginx proxy for toolschecker [puppet] - 10https://gerrit.wikimedia.org/r/292309 (https://phabricator.wikimedia.org/T136775) (owner: 10Yuvipanda) [02:18:00] Debra: yup, for mw core [02:18:37] Sounds like a short bash script. 
:-) [02:18:58] famous last words ;) [02:19:10] but seriously, it might not be that hard [02:20:37] worst case, we'd get a better understanding of what it takes after spending a couple of hours on it [02:21:29] but, tomorrow [02:22:24] (for me at least) [02:24:56] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.3) (duration: 10m 06s) [02:25:05] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [02:26:45] (03PS1) 10Yuvipanda: Revert "tools: Do not use nginx proxy for toolschecker" [puppet] - 10https://gerrit.wikimedia.org/r/292310 [02:26:58] (03CR) 10Yuvipanda: [C: 032 V: 032] Revert "tools: Do not use nginx proxy for toolschecker" [puppet] - 10https://gerrit.wikimedia.org/r/292310 (owner: 10Yuvipanda) [02:31:50] duploktm: sure, I'll ping you with the link [02:32:24] (03PS1) 10Yuvipanda: Revert "icinga: Make the tools checks not page temporarily" [puppet] - 10https://gerrit.wikimedia.org/r/292311 (https://phabricator.wikimedia.org/T136775) [02:36:57] 06Operations, 13Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#2348007 (10Krenair) >>! In T136735#2347689, @Dzahn wrote: >>>! In T136735#2347455, @Krenair wrote: >> Do we really want to allow more microsites? > > Personally i think it would be better to use wik... 
[02:58:04] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.4) (duration: 15m 37s) [02:58:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:03:06] (03PS1) 10Dereckson: Use extension registration for SpamBlacklist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292312 (https://phabricator.wikimedia.org/T119117) [03:04:45] !log l10nupdate@tin ResourceLoader cache refresh completed at Thu Jun 2 03:04:44 UTC 2016 (duration 6m 40s) [03:04:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [03:05:32] (03CR) 10Dereckson: "Follow-up: I6926f772ef3544cda533cd536eedc64f2a6d2758" (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/281239 (https://phabricator.wikimedia.org/T119117) (owner: 10Dereckson) [03:14:40] PROBLEM - restbase endpoints health on praseodymium is CRITICAL: /page/summary/{title} (Get summary from storage) is CRITICAL: Test Get summary from storage returned the unexpected status 400 (expecting: 200) [03:16:39] RECOVERY - restbase endpoints health on praseodymium is OK: All endpoints are healthy [03:36:41] 06Operations, 13Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#2348027 (10MZMcBride) I think T599#11308 is relevant here. We should at least keep a list of these micro-sites somewhere so that we can thoroughly kill them all at a later date. From browsing around... [05:03:52] PROBLEM - Disk space on ms-be2012 is CRITICAL: DISK CRITICAL - free space: / 2106 MB (3% inode=96%) [05:06:33] 06Operations, 13Patch-For-Review: create endowment.wm.org microsite - https://phabricator.wikimedia.org/T136735#2348077 (10Dzahn) @MZMcBride In March i made an effort in that direction that also fixes puppet-lint issues at the same time, moving the microsites into a the same role structure https://gerrit.wiki... 
[05:24:28] 06Operations: fix puppet run on bromine - https://phabricator.wikimedia.org/T136793#2348082 (10Dzahn) [05:26:44] 06Operations: fix puppet run on bromine - https://phabricator.wikimedia.org/T136793#2348096 (10Dzahn) [05:26:46] 06Operations, 13Patch-For-Review: Split carbon's install/mirror roles, provision install1001 - https://phabricator.wikimedia.org/T132757#2348095 (10Dzahn) [06:20:01] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [06:21:39] (03CR) 10Mobrovac: [C: 031] Scap3 config for Kartotherian [puppet] - 10https://gerrit.wikimedia.org/r/291930 (https://phabricator.wikimedia.org/T129150) (owner: 10Thcipriani) [06:21:42] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5850190 keys - replication_delay is 0 [06:22:34] (03CR) 10Mobrovac: [C: 031] Scap3 config for tilerator [puppet] - 10https://gerrit.wikimedia.org/r/291268 (https://phabricator.wikimedia.org/T129146) (owner: 10Thcipriani) [06:31:22] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:22] PROBLEM - puppet last run on kafka1002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:31] PROBLEM - puppet last run on mc2015 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:31] PROBLEM - puppet last run on neodymium is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:52] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 1 failures [06:31:53] PROBLEM - puppet last run on lvs1003 is CRITICAL: CRITICAL: Puppet has 2 failures [06:32:11] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 2 failures [06:32:12] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:13] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Puppet has 1 failures [06:32:32] PROBLEM - puppet last run on mw1158 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:02] 
PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 1 failures [06:33:41] PROBLEM - puppet last run on cp3036 is CRITICAL: CRITICAL: Puppet has 1 failures [06:34:42] PROBLEM - puppet last run on ms-be2021 is CRITICAL: CRITICAL: Puppet has 1 failures [06:42:23] 06Operations, 10DBA, 06Labs: disk failure on labsdb1002 - https://phabricator.wikimedia.org/T126946#2348194 (10jcrespo) Not necessarily, it is probably a less wide table and will not have so many metadata issues as revision. [06:56:41] RECOVERY - puppet last run on kafka1002 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [06:56:42] RECOVERY - puppet last run on neodymium is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures [06:56:53] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [06:56:54] (03PS1) 10Alexandros Kosiaris: Actually use the correct role for ores redis_password [labs/private] - 10https://gerrit.wikimedia.org/r/292317 [06:57:02] RECOVERY - puppet last run on lvs1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:22] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 31 seconds ago with 0 failures [06:57:31] RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:32] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:57:51] RECOVERY - puppet last run on mw1158 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [06:58:03] RECOVERY - puppet last run on ms-be2021 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures [06:58:23] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:41] RECOVERY - puppet last run on cp2013 is OK: 
OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:58:42] RECOVERY - puppet last run on mc2015 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:59:01] RECOVERY - puppet last run on cp3036 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [06:59:02] (03PS2) 10Alexandros Kosiaris: Actually use the correct role for ores redis_password [labs/private] - 10https://gerrit.wikimedia.org/r/292317 [06:59:19] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Actually use the correct role for ores redis_password [labs/private] - 10https://gerrit.wikimedia.org/r/292317 (owner: 10Alexandros Kosiaris) [07:02:26] !log performing schema change for db1057 [07:02:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:13:16] 06Operations, 10DBA, 13Patch-For-Review: Install, configure and provision recently arrived db core machines - https://phabricator.wikimedia.org/T133398#2348216 (10jcrespo) @Volans, while I appreciate the patch, I cannot use it as-is, as many machines have not yet been properly installed due to network/config... 
[07:14:33] (03PS1) 10Muehlenhoff: Add pollux/openldap to the backup [puppet] - 10https://gerrit.wikimedia.org/r/292318 [07:16:17] (03CR) 10Alexandros Kosiaris: [C: 04-1] Add pollux/openldap to the backup (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/292318 (owner: 10Muehlenhoff) [07:19:45] (03PS2) 10Muehlenhoff: Add pollux/openldap to the backup [puppet] - 10https://gerrit.wikimedia.org/r/292318 [07:23:37] !log rebooting etherpad1001 (hosting etherpad.wikimedia.org) for upgrade to Linux 4.4 [07:23:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [07:28:23] (03CR) 10Alexandros Kosiaris: [C: 031] Add pollux/openldap to the backup [puppet] - 10https://gerrit.wikimedia.org/r/292318 (owner: 10Muehlenhoff) [07:32:53] (03PS4) 10Alexandros Kosiaris: ores: Allow specifying redis_password [puppet] - 10https://gerrit.wikimedia.org/r/292268 [07:33:25] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] ores: Allow specifying redis_password [puppet] - 10https://gerrit.wikimedia.org/r/292268 (owner: 10Alexandros Kosiaris) [07:47:41] 06Operations, 10Traffic, 10Continuous-Integration-Infrastructure (phase-out-gallium): Move gallium to an internal host? - https://phabricator.wikimedia.org/T133150#2348241 (10hashar) I have drawn a summary of web services that ends up on gallium. One is on doc.wikimedia.org the three others are on integrati... [07:52:04] (03PS1) 10Mobrovac: VRS: Use RESTBase on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292319 (https://phabricator.wikimedia.org/T88016) [08:09:12] !log restbase disabling puppet in production for testing https://gerrit.wikimedia.org/r/#/c/292109/ in staging [08:09:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:17:01] (03CR) 10Alexandros Kosiaris: [C: 04-1] "If this is just temporary I am fine with it. 
Minor comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/291967 (https://phabricator.wikimedia.org/T136488) (owner: 10Yuvipanda) [08:21:32] (03PS1) 10Muehlenhoff: Update to 4.4.12 [debs/linux44] - 10https://gerrit.wikimedia.org/r/292321 [08:24:17] (03PS2) 10Yuvipanda: service: Allow not requiring scap3 for service::uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/291967 (https://phabricator.wikimedia.org/T136488) [08:24:29] akosiaris: ^ updated I think [08:25:13] (03PS5) 10Jcrespo: Add ores-admins group and provide permissions for scb [puppet] - 10https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) [08:25:23] (03CR) 10jenkins-bot: [V: 04-1] service: Allow not requiring scap3 for service::uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/291967 (https://phabricator.wikimedia.org/T136488) (owner: 10Yuvipanda) [08:26:23] (03CR) 10jenkins-bot: [V: 04-1] Add ores-admins group and provide permissions for scb [puppet] - 10https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) [08:26:25] (03PS3) 10Yuvipanda: service: Allow not requiring scap3 for service::uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/291967 (https://phabricator.wikimedia.org/T136488) [08:27:20] (03CR) 10Jcrespo: "The last patch assesses the name concerns, but still blocked on you, Alex, because you mentioned potential changes on the actual sudo comm" [puppet] - 10https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) [08:27:28] 06Operations, 06Performance-Team, 10Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2348313 (10Gilles) [08:27:43] (03CR) 10jenkins-bot: [V: 04-1] service: Allow not requiring scap3 for service::uwsgi [puppet] - 10https://gerrit.wikimedia.org/r/291967 (https://phabricator.wikimedia.org/T136488) (owner: 10Yuvipanda) [08:29:40] !log restbase deploy start of 19f25925 [08:29:45] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:30:04] 06Operations, 10ops-eqiad, 06Labs, 10Labs-Infrastructure: connect usb external disk to labmon1001 - https://phabricator.wikimedia.org/T136242#2348321 (10yuvipanda) This actually sped up a lot over the evening, and finished now! \o/ So we're good I think. [08:31:21] (03PS1) 10Jcrespo: Make puppet-lint happy for ores module [puppet] - 10https://gerrit.wikimedia.org/r/292322 [08:36:51] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "Merging with the hope that the comment about this parameter being removed happening soon." [puppet] - 10https://gerrit.wikimedia.org/r/291967 (https://phabricator.wikimedia.org/T136488) (owner: 10Yuvipanda) [08:37:57] 07Puppet, 10ORES, 06Revision-Scoring-As-A-Service, 13Patch-For-Review: ORES-staging is broken due to service::uwsgi mandatory scap::target invoke - https://phabricator.wikimedia.org/T136488#2336803 (10akosiaris) Changed merged, with the hope that the parameter will be removed soon per the comment. I 've a... 
[08:40:38] !log restbase deploy end of 19f25925 [08:40:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [08:41:05] (03PS1) 10Alexandros Kosiaris: ores: Remove debug notify [puppet] - 10https://gerrit.wikimedia.org/r/292324 [08:41:29] (03CR) 10Mobrovac: [C: 031] Enable RESTBase on office wiki [puppet] - 10https://gerrit.wikimedia.org/r/292109 (https://phabricator.wikimedia.org/T88016) (owner: 10Ppchelko) [08:43:09] (03CR) 10Alexandros Kosiaris: [C: 032] Enable RESTBase on office wiki [puppet] - 10https://gerrit.wikimedia.org/r/292109 (https://phabricator.wikimedia.org/T88016) (owner: 10Ppchelko) [08:43:15] (03PS3) 10Alexandros Kosiaris: Enable RESTBase on office wiki [puppet] - 10https://gerrit.wikimedia.org/r/292109 (https://phabricator.wikimedia.org/T88016) (owner: 10Ppchelko) [08:43:19] (03CR) 10Alexandros Kosiaris: [V: 032] Enable RESTBase on office wiki [puppet] - 10https://gerrit.wikimedia.org/r/292109 (https://phabricator.wikimedia.org/T88016) (owner: 10Ppchelko) [08:44:21] 06Operations, 10Ops-Access-Requests, 06WMF-NDA-Requests: NDA-Request Jan Dittrich - https://phabricator.wikimedia.org/T136560#2348376 (10JanZerebecki) [08:50:48] (03CR) 10Muehlenhoff: [C: 032 V: 032] Update to 4.4.12 [debs/linux44] - 10https://gerrit.wikimedia.org/r/292321 (owner: 10Muehlenhoff) [08:53:12] (03PS1) 10Muehlenhoff: Amend changelog with older fixes [debs/linux44] - 10https://gerrit.wikimedia.org/r/292326 [08:53:33] (03CR) 10Muehlenhoff: [C: 032 V: 032] Amend changelog with older fixes [debs/linux44] - 10https://gerrit.wikimedia.org/r/292326 (owner: 10Muehlenhoff) [09:00:06] 06Operations, 10Ops-Access-Requests, 06WMF-NDA-Requests: NDA request for @WMDE-leszek - https://phabricator.wikimedia.org/T133145#2348403 (10WMDE-leszek) [09:05:09] 06Operations, 10Ops-Access-Requests, 10LDAP-Access-Requests, 06WMF-NDA-Requests: NDA request for @thiemowmde - https://phabricator.wikimedia.org/T135994#2348430 (10JanZerebecki) [09:10:30] (03PS4) 10Elukey: 
Extend the %{format}t timestamp formatter with (begin|end): prefixes [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/292172 (https://phabricator.wikimedia.org/T136314) [09:12:24] (03PS5) 10Elukey: Extend the %{format}t timestamp formatter with (begin|end): prefixes [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/292172 (https://phabricator.wikimedia.org/T136314) [09:19:34] (03CR) 10Elukey: "Couple of things worth to mention:" [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/292172 (https://phabricator.wikimedia.org/T136314) (owner: 10Elukey) [09:26:55] (03PS2) 10Muehlenhoff: Ferm rules for puppetmaster frontend [puppet] - 10https://gerrit.wikimedia.org/r/283174 [09:28:11] (03Abandoned) 10Mobrovac: VRS: Use RESTBase on officewiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292319 (https://phabricator.wikimedia.org/T88016) (owner: 10Mobrovac) [09:28:26] (03CR) 10jenkins-bot: [V: 04-1] Ferm rules for puppetmaster frontend [puppet] - 10https://gerrit.wikimedia.org/r/283174 (owner: 10Muehlenhoff) [09:29:18] PROBLEM - https://phabricator.wikimedia.org on iridium is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - string Wikimedia and MediaWiki not found on https://phabricator.wikimedia.org:443https://phabricator.wikimedia.org/ - 13701 bytes in 0.177 second response time [09:30:06] works for me [09:34:20] oh, something is wrong with the check, I think [09:36:06] (03PS1) 10Mobrovac: Revert "Enable RESTBase on office wiki" [puppet] - 10https://gerrit.wikimedia.org/r/292328 [09:37:14] (03CR) 10jenkins-bot: [V: 04-1] Revert "Enable RESTBase on office wiki" [puppet] - 10https://gerrit.wikimedia.org/r/292328 (owner: 10Mobrovac) [09:37:21] what? 
[09:44:26] mobrovac: that's fallout of https://gerrit.wikimedia.org/r/#/c/292268/ , I already pinged Alex on IRC [09:44:53] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Revert "Enable RESTBase on office wiki" [puppet] - 10https://gerrit.wikimedia.org/r/292328 (owner: 10Mobrovac) [09:46:12] I have a patch for that [09:46:27] https://gerrit.wikimedia.org/r/292322 [09:46:49] but do not want to merge while someone else is working on the same thing [09:46:53] (03PS2) 10Alexandros Kosiaris: ores: Remove debug notify [puppet] - 10https://gerrit.wikimedia.org/r/292324 [09:46:59] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] ores: Remove debug notify [puppet] - 10https://gerrit.wikimedia.org/r/292324 (owner: 10Alexandros Kosiaris) [09:47:11] better [09:47:36] (03Abandoned) 10Jcrespo: Make puppet-lint happy for ores module [puppet] - 10https://gerrit.wikimedia.org/r/292322 (owner: 10Jcrespo) [09:48:22] (03PS6) 10Jcrespo: Add ores-admins group and provide permissions for scb [puppet] - 10https://gerrit.wikimedia.org/r/291716 (https://phabricator.wikimedia.org/T136406) [09:58:58] 06Operations, 10DBA, 13Patch-For-Review: Investigate/decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#1998016 (10MoritzMuehlenhoff) I noticed this ticket when checking for db servers without base::firewall enabled: Summarising: - db2008/db2009 were removed from mediawiki in https://gerrit.wiki... [09:59:30] (03PS3) 10Muehlenhoff: Add pollux/openldap to the backup [puppet] - 10https://gerrit.wikimedia.org/r/292318 [10:02:25] (03CR) 10Muehlenhoff: [C: 032 V: 032] Add pollux/openldap to the backup [puppet] - 10https://gerrit.wikimedia.org/r/292318 (owner: 10Muehlenhoff) [10:08:00] 06Operations, 10DBA, 13Patch-For-Review: Investigate/decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#2348590 (10jcrespo) I would remove them all when Daniel finishes his work. 
[10:08:53] 06Operations, 10DBA, 13Patch-For-Review: Investigate/decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#2348591 (10jcrespo) [10:08:55] !log restbase enabling puppet back in production [10:09:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:09:15] 06Operations, 10DBA: Investigate/decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#1998016 (10jcrespo) [10:09:20] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures [10:12:01] PROBLEM - puppet last run on pollux is CRITICAL: CRITICAL: Puppet has 2 failures [10:12:34] ^looking into it [10:14:50] !log archiving again syslog.1 from ms-be2012 on /srv/swift-storage/sdl1/tmp [10:14:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:18:50] RECOVERY - Disk space on ms-be2012 is OK: DISK OK [10:19:57] (03PS1) 10Muehlenhoff: Also add backup::host for oit/corp LDAP mirror [puppet] - 10https://gerrit.wikimedia.org/r/292333 [10:24:10] (03PS2) 10Muehlenhoff: Also add backup::host for oit/corp LDAP mirror [puppet] - 10https://gerrit.wikimedia.org/r/292333 [10:28:38] (03CR) 10Muehlenhoff: [C: 032 V: 032] Also add backup::host for oit/corp LDAP mirror [puppet] - 10https://gerrit.wikimedia.org/r/292333 (owner: 10Muehlenhoff) [10:37:30] RECOVERY - puppet last run on pollux is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [10:43:50] PROBLEM - puppet last run on dubnium is CRITICAL: CRITICAL: Puppet has 2 failures [10:47:27] I am going to crash zuul for half an hour or so to take traces. 
[10:47:37] I can't reproduce on my local machine and really need prod traces [10:47:50] so CI would stop triggering for a few, will recheck whatever got missed [10:48:27] (03PS3) 10Muehlenhoff: Ferm rules for puppetmaster frontend [puppet] - 10https://gerrit.wikimedia.org/r/283174 [10:49:40] !log gracefully stopping Zuul, will upgrade / take traces etc over the next half hour or so [10:49:40] RECOVERY - puppet last run on dubnium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [10:49:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:50:43] !log gallium: stopped puppet agent [10:50:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:53:37] 06Operations, 10MediaWiki-extensions-Score: Image could not be trimmed - https://phabricator.wikimedia.org/T136657#2348701 (10Tgr) `/usr/local/bin/mediawiki-firejail-convert` is from `$wgImageMagickConvertCommand` and surely there would be more breakage if that was wrong... so maybe the problem is that this sh... [10:54:50] 06Operations, 10Ops-Access-Requests, 06WMF-NDA-Requests: NDA request for @WMDE-leszek - https://phabricator.wikimedia.org/T133145#2222784 (10jcrespo) I do not think that giving NDA access to everyone just because they need to create Grafana dashboards (as it used to be public) is wise. > I am planning to us... [10:55:51] !log Restarted Zuul and reenabled puppet on gallium [10:55:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:56:04] ImportError: No module named schedulers.background [10:56:07] ... [10:56:13] quite easy to fix luckily [10:56:43] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/283174 (owner: 10Muehlenhoff) [11:05:49] I'm not sure if I should have filed https://phabricator.wikimedia.org/T136816 but Phabricator is not letting you view the site while logged out. 
[11:09:52] It's https://phabricator.wikimedia.org/W749 which needs changing to public [11:09:55] please [11:13:32] paladox, it is a known issue, I have already filed a ticket [11:13:45] however, it only affects the home dashboard [11:14:01] tickets can be seen as anonymous [11:14:18] jynus Oh it's https://phabricator.wikimedia.org/W749 causing it [11:14:23] Needs to be changed to public [11:14:35] since it is currently only viewable by all users. [11:15:15] not even I have permissions to edit that [11:15:54] jynus oh, i wonder what Custom Policy is [11:16:10] you should be able to access https://phabricator.wikimedia.org/dashboard/view/1/ [11:16:46] jynus yep thanks [11:17:10] RECOVERY - https://phabricator.wikimedia.org on iridium is OK: HTTP OK: HTTP/1.1 200 OK - 25507 bytes in 0.152 second response time [11:17:13] 06Operations, 10MediaWiki-extensions-Score: Image could not be trimmed - https://phabricator.wikimedia.org/T136657#2343353 (10MoritzMuehlenhoff) @Tgr The wrapper is currently only installed on the image scalers, but there actually seem to be extensions which shell out to imagemagick on the app servers (On fluo... [11:17:27] probably someone fixed it [11:18:06] jynus yep Aklapper changed it to public [11:18:16] great [11:20:20] sorry for that, not many of us use phab without logging in [11:21:08] 06Operations, 10MediaWiki-extensions-Score: Image could not be trimmed - https://phabricator.wikimedia.org/T136657#2348756 (10MoritzMuehlenhoff) I'll revert the wmf-config change for now, so that this can be sorted out properly. 
[11:27:02] (03PS1) 10Muehlenhoff: Disable the firejail wrapper for imagemagick temporarily [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292337 [11:36:21] (03CR) 10Muehlenhoff: [C: 032 V: 032] Disable the firejail wrapper for imagemagick temporarily [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292337 (owner: 10Muehlenhoff) [11:38:40] PROBLEM - puppet last run on labsdb1003 is CRITICAL: CRITICAL: puppet fail [11:39:13] !log jmm@tin Synchronized wmf-config/CommonSettings.php: disable firejail security hardening for image scalers, needs more work for the Score extension (duration: 00m 36s) [11:39:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:51:59] (03CR) 10Bartosz Dziewoński: "(For future reference, this was because of T136657.)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292337 (owner: 10Muehlenhoff) [11:52:31] 06Operations, 10MediaWiki-extensions-Score: Image could not be trimmed - https://phabricator.wikimedia.org/T136657#2343353 (10matmarex) https://gerrit.wikimedia.org/r/#/c/292337/ [13:39] !log jmm@tin Synchronized wmf-config/CommonSettings.php: disable firejail security hardening for image scalers,... [11:55:07] 06Operations, 10MediaWiki-extensions-Score: Image could not be trimmed - https://phabricator.wikimedia.org/T136657#2348833 (10MoritzMuehlenhoff) 05Open>03Resolved a:03MoritzMuehlenhoff This has been reverted, sorry for the noise. I'll doublecheck with the Score extension before re-enabling. 
[11:58:29] (03CR) 10Elukey: [C: 04-1] "Found a way to improve perfomance reducing strstr calls, this patch is not ready for review :)" [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/292172 (https://phabricator.wikimedia.org/T136314) (owner: 10Elukey) [12:00:24] 06Operations, 10MediaWiki-extensions-Score: Image could not be trimmed - https://phabricator.wikimedia.org/T136657#2348841 (10Bonvol) Thank you, it works (confirmed on example from T136818) [12:05:08] 06Operations, 06Labs, 10netops: Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR - https://phabricator.wikimedia.org/T136671#2348847 (10faidon) Thanks. Testing again production (e.g. upload.wikimedia.org, but any host including bast1001 would do) would also be a useful data poin... [12:05:42] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/291116 (https://phabricator.wikimedia.org/T136188) (owner: 10Hashar) [12:06:19] 06Operations, 06Labs, 10netops: Intermittent bandwidth issue to labs proxy (eqiad) from Comcast in Portland OR - https://phabricator.wikimedia.org/T136671#2343755 (10faidon) p:05Triage>03Normal [12:06:19] RECOVERY - puppet last run on labsdb1003 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures [12:08:26] 06Operations, 06Performance-Team, 10Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2348850 (10Gilles) [12:10:41] !log Upgraded Zuul upstream code being 66c8e52..30a433b package is 2.1.0-151-g30a433b-wmf1precise1 [12:10:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:14:54] (03CR) 10Muehlenhoff: [C: 032 V: 032] Ferm rules for puppetmaster frontend [puppet] - 10https://gerrit.wikimedia.org/r/283174 (owner: 10Muehlenhoff) [12:20:42] 06Operations, 10Traffic, 10Wiki-Loves-Monuments, 07HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#2348854 (10SindyM3) 
Lemonbit says: We have successfully installed the certificate for www.wikilovesmonuments.org (alias wikilovesmonuments.org), SSL connectio... [12:21:01] 06Operations, 06Performance-Team, 10Thumbor: Package and backport Thumbor dependencies in Debian - https://phabricator.wikimedia.org/T134485#2348856 (10Gilles) [12:21:29] 06Operations, 06Performance-Team, 10Thumbor: Package Thumbor for Debian - https://phabricator.wikimedia.org/T134485#2267027 (10Gilles) [12:22:10] 06Operations, 06Performance-Team, 10Thumbor: Package Thumbor for Debian - https://phabricator.wikimedia.org/T134485#2267027 (10Gilles) [12:22:27] 06Operations, 06Performance-Team, 10Thumbor: Package Thumbor for Debian - https://phabricator.wikimedia.org/T134485#2267027 (10Gilles) [12:28:32] 06Operations, 10Traffic, 10Wiki-Loves-Monuments, 07HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#2348880 (10Akoopal) 05Open>03Resolved a:03Akoopal Per comment from Sindy, certificate is installed. 
[12:36:43] (03PS1) 10Muehlenhoff: Enable base::firewall for palladium [puppet] - 10https://gerrit.wikimedia.org/r/292345 [12:46:43] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/292345 (owner: 10Muehlenhoff) [12:53:24] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0] [12:53:44] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [1000.0] [12:57:23] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [12:57:34] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [12:57:50] 06Operations, 10Gerrit: Gerrit replication to furud.codfw.wmnet fails with: reject HostKey: furud.codfw.wmnet - https://phabricator.wikimedia.org/T136822#2348972 (10hashar) [13:11:18] 06Operations, 10Gerrit: Gerrit replication to furud.codfw.wmnet fails with: reject HostKey: furud.codfw.wmnet - https://phabricator.wikimedia.org/T136822#2348998 (10hashar) First occurence is: ``` [2016-05-31 11:07:45,050] INFO com.google.gerrit.pgm.Daemon : Gerrit Code Review 2.8.1-4-ga1048ce ready [2016-05-... [13:14:00] 06Operations, 10Gerrit: Gerrit replication to furud.codfw.wmnet fails with: reject HostKey: furud.codfw.wmnet - https://phabricator.wikimedia.org/T136822#2349000 (10hashar) Replication has been enabled on April 19th via https://gerrit.wikimedia.org/r/#/c/284097/ [13:32:33] (03PS4) 10Ottomata: Support additional reportupdater directories [puppet] - 10https://gerrit.wikimedia.org/r/289007 (https://phabricator.wikimedia.org/T126549) (owner: 10Milimetric) [13:34:58] !log Downgrading Zuul back to zuul_2.1.0-95-g66c8e52-wmf1precise1_amd64.deb . Paramiko cant acquire ssh connection with Gerrit for some reason... 
https://phabricator.wikimedia.org/P3204 [13:35:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:38:36] (03CR) 10Ottomata: [C: 032] Support additional reportupdater directories [puppet] - 10https://gerrit.wikimedia.org/r/289007 (https://phabricator.wikimedia.org/T126549) (owner: 10Milimetric) [13:38:50] (03CR) 10Ottomata: [V: 032] Support additional reportupdater directories [puppet] - 10https://gerrit.wikimedia.org/r/289007 (https://phabricator.wikimedia.org/T126549) (owner: 10Milimetric) [13:52:03] !log installing imagemagick security updates on Ubuntu systems (but affected decoders already neutralised by policy changes) (also Debian systems already addressed) [13:52:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:11:52] (03PS1) 10Eevans: Ignore the ColUpdateTimeDeltaHistogram metric (broken) [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/292356 (https://phabricator.wikimedia.org/T126629) [14:13:18] (03PS1) 10Eevans: Ignore the ColUpdateTimeDeltaHistogram metric (broken) [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/292357 (https://phabricator.wikimedia.org/T126629) [14:13:34] 06Operations, 10Traffic, 10Wiki-Loves-Monuments, 07HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#2349181 (10Dzahn) This is great but i cant confirm yet it has been installed. wikilovesmonuments.org uses an invalid security certificate. The certificate is... 
[14:14:03] 06Operations, 10Traffic, 10Wiki-Loves-Monuments, 07HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#2349182 (10Dzahn) 05Resolved>03Open [14:15:59] (03PS1) 10Jdrewniak: T133432 deploying new localized top-links on wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292358 (https://phabricator.wikimedia.org/T133432) [14:18:55] (03Abandoned) 10Eevans: Ignore the ColUpdateTimeDeltaHistogram metric (broken) [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/292356 (https://phabricator.wikimedia.org/T126629) (owner: 10Eevans) [14:19:52] PROBLEM - DPKG on stat1003 is CRITICAL: DPKG CRITICAL dpkg reports broken packages [14:21:52] RECOVERY - DPKG on stat1003 is OK: All packages OK [14:22:55] (03PS2) 10Addshore: Load the RevisionSlider extension on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287936 (https://phabricator.wikimedia.org/T134770) [14:23:19] (03PS3) 10Addshore: Load the RevisionSlider extension on beta cluster [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287936 (https://phabricator.wikimedia.org/T134770) [14:30:29] 06Operations, 06Labs, 10Tool-Labs, 10netops: Someone seems to be running a port scanner in labs - https://phabricator.wikimedia.org/T136829#2349244 (10Peachey88) [14:33:32] !log acked ores icinga checks on some scb hosts and pointing to T124201 (it seems the checks arrived before the actual setup) [14:33:33] T124201: Setup ores on scb cluster - https://phabricator.wikimedia.org/T124201 [14:33:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:37:37] (03PS1) 10Eevans: Enable updated cassandra-metrics-collector version [puppet] - 10https://gerrit.wikimedia.org/r/292361 (https://phabricator.wikimedia.org/T126629) [14:38:28] (03CR) 10Mobrovac: [C: 031] Enable updated cassandra-metrics-collector version [puppet] - 10https://gerrit.wikimedia.org/r/292361 
(https://phabricator.wikimedia.org/T126629) (owner: 10Eevans) [14:38:49] (03CR) 10Eevans: "This should not be merged until after https://gerrit.wikimedia.org/r/#/c/292357/ has, and Trebuchet deploy has been performed." [puppet] - 10https://gerrit.wikimedia.org/r/292361 (https://phabricator.wikimedia.org/T126629) (owner: 10Eevans) [14:41:39] 06Operations, 10Math, 10RESTBase, 06Services, and 2 others: parameter mathpurge=true should purge cache in restbase - https://phabricator.wikimedia.org/T136205#2349293 (10mobrovac) p:05Triage>03High a:05Physikerwelt>03mobrovac [14:44:37] 06Operations, 06Labs, 10Tool-Labs, 10netops: Someone seems to be running a port scanner in labs - https://phabricator.wikimedia.org/T136829#2349321 (10Andrew) @Reedy, you're right, I was briefly confusing the src and dst ports. I'll rename [14:45:01] 06Operations, 06Labs, 10Tool-Labs, 10netops: bitninja upset about us running a crawler - https://phabricator.wikimedia.org/T136829#2349322 (10Andrew) [14:52:03] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 6 failures [14:52:55] (03PS1) 10Mobrovac: Change Prop: Don't decode the responses' bodies [puppet] - 10https://gerrit.wikimedia.org/r/292366 [14:58:03] (03CR) 10Mobrovac: "PCC is happy - https://puppet-compiler.wmflabs.org/3034/" [puppet] - 10https://gerrit.wikimedia.org/r/292366 (owner: 10Mobrovac) [14:59:47] 06Operations, 10ops-eqiad, 13Patch-For-Review: Rack and set up 16 db's db1079-1094 - https://phabricator.wikimedia.org/T135253#2349377 (10Cmjohnson) Mgmt issues on db1089 and db1094 are fixed and they're installing now. [15:00:05] anomie ostriches thcipriani marktraceur aude: Respected human, time to deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160602T1500). Please do the needful. [15:00:05] mobrovac jan_drewniak: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process. 
[15:00:14] * mobrovac is here [15:00:22] I can SWAT. [15:00:22] o/ [15:01:02] thcipriani: you can merge both of my patches at the same time [15:02:22] PROBLEM - puppet last run on mw1151 is CRITICAL: CRITICAL: Puppet has 1 failures [15:02:22] PROBLEM - puppet last run on mw1249 is CRITICAL: CRITICAL: Puppet has 1 failures [15:04:42] (03CR) 10Thcipriani: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292358 (https://phabricator.wikimedia.org/T133432) (owner: 10Jdrewniak) [15:05:51] (03Merged) 10jenkins-bot: T133432 deploying new localized top-links on wikipedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292358 (https://phabricator.wikimedia.org/T133432) (owner: 10Jdrewniak) [15:08:50] ok, going to get portals out while waiting for zuul [15:09:02] thcipriani: haven't synced my patches yet? [15:09:27] mobrovac: nope, still waiting on jenkins right now [15:09:32] https://integration.wikimedia.org/zuul/ [15:09:46] ah right, sorry [15:11:34] !log thcipriani@tin Synchronized portals/prod/wikipedia.org/assets: [[gerrit:292358|deploying new localized top-links on wikipedia.org]] (duration: 00m 32s) [15:11:39] I wish extension patches merged as fast as config changes :) [15:12:05] !log thcipriani@tin Synchronized portals: [[gerrit:292358|deploying new localized top-links on wikipedia.org]] (duration: 00m 31s) [15:12:13] ^ jan_drewniak check please [15:13:08] thcipriani: looks good, thanks! [15:13:14] jan_drewniak: thanks for checking [15:15:28] finally merged! 
[15:15:46] \o/ [15:15:50] ok, pulling on tin [15:18:02] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:47] !log thcipriani@tin Synchronized php-1.28.0-wmf.4/extensions/Math: SWAT: [[gerrit:292344|Use img instead of meta tags for SVGs]] and [[gerrit:292340|Fix iterator in batchGetMathML]] (duration: 00m 28s) [15:19:52] ^ mobrovac check please [15:19:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:20:23] checking [15:22:31] thcipriani: all good! [15:22:34] thnx! [15:22:42] mobrovac: awesome. Thanks for checking :) [15:25:35] (03CR) 10Mobrovac: [C: 04-1] "Can't go out before https://github.com/gwicke/template-expression-compiler/pull/2 is merged, published and deployed." [puppet] - 10https://gerrit.wikimedia.org/r/292366 (owner: 10Mobrovac) [15:25:59] 06Operations, 10Math, 10RESTBase, 06Services, and 2 others: parameter mathpurge=true should purge cache in restbase - https://phabricator.wikimedia.org/T136205#2349435 (10BBlack) >> * If Mathoid URLs are project/language-independent, why are they hosted under project/language-specific URLs? That's pointl... [15:26:37] (03PS1) 10Ottomata: Install python-pymysql on statistics compute nodes [puppet] - 10https://gerrit.wikimedia.org/r/292373 [15:27:00] (03CR) 10Ottomata: [C: 032 V: 032] Install python-pymysql on statistics compute nodes [puppet] - 10https://gerrit.wikimedia.org/r/292373 (owner: 10Ottomata) [15:28:22] RECOVERY - puppet last run on mw1249 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures [15:30:31] RECOVERY - puppet last run on mw1151 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:32:42] PROBLEM - puppet last run on mw2126 is CRITICAL: CRITICAL: Puppet has 1 failures [15:38:04] dataset1001 out of space? 
[15:39:12] must have been a glitch in the matrix [15:42:22] RECOVERY - puppet last run on mw2126 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [15:49:25] (03PS1) 10Kaldari: Test PageAssessments on Beta Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292376 (https://phabricator.wikimedia.org/T125551) [15:55:38] 06Operations, 10DBA, 13Patch-For-Review: Install, configure and provision recently arrived db core machines - https://phabricator.wikimedia.org/T133398#2349511 (10jcrespo) This is the last updated list: db1079 db1080 db1081 db1083 db1084 db1087 db1088 db1089 db1090 db1091 db1092 db1093 db1094 [16:00:04] godog moritzm: Respected human, time to deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160602T1600). Please do the needful. [16:00:05] urandom: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [16:00:39] legoktm: ping, https://wikitech.wikimedia.org/wiki/Incident_documentation/20160601-MediaWiki [16:01:15] * urandom is present [16:01:17] o/ [16:04:35] (03PS1) 10Jcrespo: Puppetize new shard s1 db servers [puppet] - 10https://gerrit.wikimedia.org/r/292380 (https://phabricator.wikimedia.org/T133398) [16:05:59] thanks for that writeup, Dereckson [16:12:09] You're welcome greg-g. [16:12:51] moritzm: is there a puppet swat today? [16:13:29] (03CR) 10Jhernandez: "Here's the config on staging we're using" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) (owner: 10Jhobs) [16:17:38] urandom, I can do that [16:17:46] jynus: awesome, thanks! [16:19:32] I do not like the second one [16:19:43] oh? [16:19:44] 06Operations, 10ops-eqiad, 13Patch-For-Review: Rack and set up 16 db's db1079-1094 - https://phabricator.wikimedia.org/T135253#2349595 (10Cmjohnson) db1082 cable was disconnected. Re-attached and no issues with installations. 
db1085/86 are having issues with vlans. There may be a switch bug. Having issues... [16:19:55] jynus: what's up? [16:19:57] it is a blob update, I cannot +1 [16:20:10] definitelly not on a puppet swat [16:20:53] ok [16:21:02] it is not a "trivial/easy to check/configuration" change [16:21:25] if it helps, it's a blob that will overwrite a blob that i submitted :) [16:21:51] we can follow up later [16:23:46] how we do the first one? [16:23:58] needs service update, etc.? [16:24:04] nope [16:24:15] well, it only applies to one production node, 1007 [16:24:16] so, just deploy, let puppet handle it? [16:24:20] true [16:24:31] which has puppet disabled, and this exact change live hacked in [16:24:31] I will run puppet afterwards [16:24:36] i can run it [16:24:39] ok [16:24:44] doesn't matter, really [16:24:52] (03PS2) 10Jcrespo: limit mmap'd disk access to indexes only on Cassandra 2.2 [puppet] - 10https://gerrit.wikimedia.org/r/292305 (https://phabricator.wikimedia.org/T126629) (owner: 10Eevans) [16:24:56] the diff should be effectively zero [16:25:10] whitespace notwithstanding [16:25:13] did it work, or are you still trying? [16:25:33] jynus: not sure i follow [16:25:35] (nothing to do with the commit, just curious) [16:25:42] oh, it works [16:25:43] the issue you had with 2.2 [16:26:19] it's still leaves an open-ended question, but this does resolve the IO issues we were having yesterday [16:26:27] s/it's/it/ [16:26:28] good to know [16:26:38] (03CR) 10Jcrespo: [C: 032] limit mmap'd disk access to indexes only on Cassandra 2.2 [puppet] - 10https://gerrit.wikimedia.org/r/292305 (https://phabricator.wikimedia.org/T126629) (owner: 10Eevans) [16:27:50] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures [16:27:59] mmm [16:28:49] jynus: ?? 
[16:29:03] it is disabled there still [16:29:11] I have not touched it yet [16:29:25] i enabled it, but it's going to fail without the other gerrit [16:29:51] because i masked the cassandra-metrics-collector systemd unit to preserve a live hack [16:29:56] how is that possible? [16:30:05] ah! [16:30:38] strange though [16:30:49] it undid the disk_access_mode: mmap_index_only setting [16:31:06] did you run it before I merged it? [16:31:17] https://phabricator.wikimedia.org/P3206 [16:31:25] jynus: it's possible, but i just reran it now [16:31:45] you have a bunch of unpuppetized things there, and that is a problem [16:32:20] well, it applied changes to the yaml config, just not the change in question [16:33:06] there we go [16:33:25] jynus: now it's good [16:33:32] puppet failure notwithstanding [16:35:02] it is not logging correctly [16:35:12] what is not logging? [16:35:22] puppet? [16:35:29] yes [16:35:40] (03CR) 10Ottomata: "Hm, yeah, this looks totally fine to me." [software/varnish/varnishkafka] - 10https://gerrit.wikimedia.org/r/292172 (https://phabricator.wikimedia.org/T136314) (owner: 10Elukey) [16:37:21] jynus: i see it logging the failure to start cassandra-metrics-collector.service [16:38:09] (03CR) 10Jhobs: [C: 04-1] "Thanks, Joaquin! I've confirmed with Adam that the sampling rate we want for logging is 10%, which is the default for the extension (hence" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) (owner: 10Jhobs) [16:38:28] jynus: i can see where it logged the removal of the disk_access_mode line, but i not when the subsequent run re-added it, is that what you mean? [16:39:18] s/but i/but/ [16:42:18] it is because the change failed to be propagated to strontium [16:43:11] oh, is that why it failed to do the right thing on the first run? 
[16:44:28] jynus: re: the other change, https://gerrit.wikimedia.org/r/#/c/292357/ is the build result of https://github.com/wikimedia/cassandra-metrics-collector/pull/15/files, and is already running on 1007 (in a screen session) [16:44:30] 06Operations, 10ops-eqiad, 13Patch-For-Review: Rack and set up 16 db's db1079-1094 - https://phabricator.wikimedia.org/T135253#2349631 (10Cmjohnson) with a little help from faidon the vlans were fixed but failed during install. Chris [16:44:55] you could build it yourself, but it will require some steps [16:45:16] and it's a java project, which is why most people would rather not have anything to do with it :) [16:45:57] I think it is ok now [16:47:04] 06Operations, 10Analytics: Jmxtrans failures on Kafka hosts caused metric holes in grafana - https://phabricator.wikimedia.org/T136405#2349644 (10Milimetric) [16:47:07] 06Operations, 10Traffic, 06Community-Liaisons (Apr-Jun-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2349645 (10Whatamidoing-WMF) If that's considered sensitive information, then definitely not. [16:47:52] urandom, I am technically a certified java developer [16:48:15] jynus: you are more than welcome to build and deploy to archiva! 
[16:48:57] and re-create https://gerrit.wikimedia.org/r/#/c/292357/ [16:49:07] jynus: if it's ok with you, we could do the RT db thing, where you said tell me at least half an hour before though [16:49:14] or earlier tomorrow [16:49:35] and then i will also give the db2007 back of course [16:49:49] (03PS1) 10Alexandros Kosiaris: ores: Allow specifying specific sudo rules [puppet] - 10https://gerrit.wikimedia.org/r/292385 [16:49:51] yes, mutante, let's do a backup [16:50:22] !log begin reinstall of labmon1001 [16:50:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:50:31] so i will stop the service on magnesium [16:50:43] jynus: godog usually merges these, and i know he knows *what* to do, he just prefers not to have to do it :) [16:50:54] and since you are on duty yourself, no others will probably miss RT right now [16:50:55] godog is on vacation [16:50:59] exactly [16:51:04] (03PS4) 10Jhobs: Enable Hovercards for huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) [16:51:21] and technically I should not be doing puppet swat, as I am on clinic [16:51:30] yeah, no worries [16:51:36] i appreciate the one gerrit [16:51:38] but I do now want to deploy a blob without checking [16:51:46] fair enough [16:51:47] maybe someone else wants [16:51:53] paravoid: ?? [16:52:15] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] "https://puppet-compiler.wmflabs.org/3035/ PCC agrees, merging" [puppet] - 10https://gerrit.wikimedia.org/r/292385 (owner: 10Alexandros Kosiaris) [16:52:17] I am not -1 it, but I am not +1 it either [16:52:17] (03CR) 10Merlijn van Deen: [C: 031] "+1 wrt licence change, I'm not sure about the setup.py details. 
Pywikibot has both license='MIT License', and classifiers=[ 'License :: OS" [software/tools-webservice] - 10https://gerrit.wikimedia.org/r/292056 (owner: 10Yuvipanda) [16:52:52] 06Operations, 07Tracking: Make services manageable by systemd (tracking) - https://phabricator.wikimedia.org/T97402#2349679 (10Milimetric) [16:52:53] and making puppet fail without deploying it is a really bad idea [16:52:54] !log magnesium (RT), tmp. stopped RT and puppet [16:52:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:53:08] PROBLEM - Host labmon1001 is DOWN: PING CRITICAL - Packet loss = 100% [16:53:27] jynus: it was either that, or have no metrics reporting (during a time when all hell was breaking loose) [16:53:43] alex, please tell me if you find issues, latest commit failed to propagate to strontium [16:53:59] let me ack the labmon [16:54:12] (temporarily) [16:56:40] RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [16:57:04] (03PS1) 10Dzahn: Revert "requesttracker: use test db if on jessie" [puppet] - 10https://gerrit.wikimedia.org/r/292387 [16:57:16] uhhh... how did it recover? [16:58:42] (03PS2) 10Dzahn: Revert "requesttracker: use test db if on jessie" [puppet] - 10https://gerrit.wikimedia.org/r/292387 [16:58:55] it didn't [16:59:29] (03PS3) 10Dzahn: Revert "requesttracker: use test db if on jessie" [puppet] - 10https://gerrit.wikimedia.org/r/292387 [16:59:54] jynus: it didn't? what was that last entry from icinga-wm? [17:00:04] yurik gwicke cscott arlolra subbu: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160602T1700). 
[17:00:40] it failed again [17:00:50] oh [17:00:57] there is something wrong with puppet on that node [17:01:29] (03PS4) 10Dzahn: Revert "requesttracker: use test db if on jessie" [puppet] - 10https://gerrit.wikimedia.org/r/292387 (https://phabricator.wikimedia.org/T119112) [17:02:06] I have to attend Dzhans request, please fix the puppet issues and I will deploy them [17:02:56] (03PS6) 10Dzahn: exim: route mail for RT to ununpentium [puppet] - 10https://gerrit.wikimedia.org/r/288721 (https://phabricator.wikimedia.org/T119112) [17:02:58] PROBLEM - puppet last run on restbase1007 is CRITICAL: CRITICAL: Puppet has 1 failures [17:03:34] (03PS7) 10Dzahn: exim: route mail for RT to ununpentium [puppet] - 10https://gerrit.wikimedia.org/r/288721 (https://phabricator.wikimedia.org/T119112) [17:05:10] !log stopping replication from db1001 to db1016 (pasive m1 node) before schema change [17:05:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:05:47] !log stopped exim on magnesium [17:05:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:06:24] mutante, I have stopped replication on m1 failover [17:07:00] you can proceed, if something goes wrong we will failover the proxy or reload the db from there [17:07:12] please ping me again when done [17:07:21] jynus: ok, i will merge the change that makes the new server use m1-master instead of test db https://gerrit.wikimedia.org/r/#/c/292387/4/modules/role/manifests/requesttracker/server.pp [17:07:22] to restart it [17:07:28] will do, yes [17:07:45] I am on standby while I solve some issues with urandom [17:07:50] great, thanks [17:08:12] (03CR) 10Dzahn: [C: 032] Revert "requesttracker: use test db if on jessie" [puppet] - 10https://gerrit.wikimedia.org/r/292387 (https://phabricator.wikimedia.org/T119112) (owner: 10Dzahn) [17:08:37] akosiaris: ores sudo rule change pending? 
[17:08:37] jynus: i think i found someone to merge the jar file [17:08:54] jynus: and that seems to be what has puppet upset [17:09:11] jynus: so i think we're good, i appreciate your help. thanks! [17:09:16] I was going to check it now, but if someone else has more background, good [17:09:17] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [17:09:21] akosiaris: merged [17:10:04] that icinga-wm message should be fixed [17:10:12] mutante, did it merge to strontium correctly? [17:10:34] I think yes [17:10:49] jynus: yes, it did [17:11:06] in this case it was just a change that was merged but not puppet-merge'd [17:11:07] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [17:11:20] but sometimes its the strontium issue indeed [17:11:34] yes, it was the change before that one that I had to merge manually again there [17:11:46] and I was worried I could have broken it [17:12:40] ah, yea, i have done it before on strontium when it happened.. with cd /var/lib/git/operations/puppet/ and git pull origin [17:12:54] not this time [17:14:00] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 for Pcoombe - https://phabricator.wikimedia.org/T136343#2349807 (10ellery) @Pcoombe Yes, you will need access to hadoop. [17:15:20] 06Operations, 06Labs, 10Tool-Labs, 10netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2349820 (10Andrew) [17:16:26] !log running RT database upgrade from 4.0.4 to 4.2.8 [17:16:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:21:35] jynus: i'm done with DB changes [17:21:51] !log ran ALTER TABLE character set utf8 .. 
(https://phabricator.wikimedia.org/T119112#2311402) on RT db [17:21:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:22:32] mutante, sounded like fast/easy [17:22:48] please test it, there is no rush to enable replication yet [17:23:02] jynus: well..i had the command ready to copy/paste on ticket just finding it took a while :) [17:23:22] the first part looked like rt-setup-database-4 --dba rt --action upgrade --upgrade-from 4.0.4 --upgrade-to 4.2.8 [17:23:35] fortunately it supports the versions we needed [17:26:53] 06Operations, 06Labs, 10Tool-Labs, 10netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2349206 (10MoritzMuehlenhoff) The report only contains alerts of the kind: "A visitor reached our honeynet and sent... [17:26:54] i am going to start deploying new parsoid code. any reason not to go ahead? [17:27:31] (03CR) 10Muehlenhoff: [C: 032 V: 032] Ignore the ColUpdateTimeDeltaHistogram metric (broken) [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/292357 (https://phabricator.wikimedia.org/T126629) (owner: 10Eevans) [17:28:52] 06Operations, 10ops-codfw: ms-be2012.codfw.wmnet: slot=12 dev=sdm failed - https://phabricator.wikimedia.org/T136395#2333141 (10RobH) So we have 16 300GB Intel 320 series on the shelf. Since the older 320 series is NOT used in any new systems, and only used for spare replacements, I'd suggest we simply swap... 
[17:29:02] (03PS8) 10Dzahn: exim: route mail for RT to ununpentium [puppet] - 10https://gerrit.wikimedia.org/r/288721 (https://phabricator.wikimedia.org/T119112) [17:29:31] !log starting deploy of new parsoid code [17:29:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:29:42] (03CR) 10Dzahn: [C: 032] exim: route mail for RT to ununpentium [puppet] - 10https://gerrit.wikimedia.org/r/288721 (https://phabricator.wikimedia.org/T119112) (owner: 10Dzahn) [17:31:56] 06Operations, 13Patch-For-Review: move RT off of magnesium - https://phabricator.wikimedia.org/T119112#2349939 (10Dzahn) i ran **rt-setup-database-4 --dba rt --action upgrade --upgrade-from 4.0.4 --upgrade-to 4.2.8** before the above [17:34:30] !log synced new code; restarted parsoid on wtp1001 as a canary [17:34:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:35:51] 06Operations, 06Labs, 10Tool-Labs, 10netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2349206 (10scfc) Maybe I'm missing something, but the link (https://bitninja.io/incidentReport.php?details=91e8f633... [17:38:21] 06Operations, 06Labs, 10Tool-Labs, 10netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2349206 (10chasemp) It's hard for me to take the BitNinja reports seriously. Previously, it is my understanding, t... 
[17:40:50] !log finished deploying parsoid version 7188080b [17:40:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:41:35] (03PS1) 10Dzahn: switch RT from magnesium to misc-web varnish [dns] - 10https://gerrit.wikimedia.org/r/292391 (https://phabricator.wikimedia.org/T119112) [17:43:04] (03PS2) 10Muehlenhoff: Enable updated cassandra-metrics-collector version [puppet] - 10https://gerrit.wikimedia.org/r/292361 (https://phabricator.wikimedia.org/T126629) (owner: 10Eevans) [17:43:22] (03PS2) 10Dzahn: switch RT from magnesium to misc-web varnish [dns] - 10https://gerrit.wikimedia.org/r/292391 (https://phabricator.wikimedia.org/T119112) [17:43:59] (03CR) 10Dzahn: [C: 032] "RT is also behind varnish now and lives on ununpentium as backend" [dns] - 10https://gerrit.wikimedia.org/r/292391 (https://phabricator.wikimedia.org/T119112) (owner: 10Dzahn) [17:45:47] 06Operations, 13Patch-For-Review: move RT off of magnesium - https://phabricator.wikimedia.org/T119112#2350047 (10Dzahn) test ticket created by mail after the switch https://rt.wikimedia.org/Ticket/Display.html?id=10299 [17:46:05] (03CR) 10Muehlenhoff: [C: 032 V: 032] Enable updated cassandra-metrics-collector version [puppet] - 10https://gerrit.wikimedia.org/r/292361 (https://phabricator.wikimedia.org/T126629) (owner: 10Eevans) [17:48:15] mutante: thanks [17:49:08] 06Operations, 13Patch-For-Review: decom magnesium (was: Reinstall magnesium with jessie) - https://phabricator.wikimedia.org/T123713#2350078 (10Dzahn) [17:49:10] 06Operations, 13Patch-For-Review: move RT off of magnesium - https://phabricator.wikimedia.org/T119112#2350076 (10Dzahn) 05Open>03Resolved DB is switched and upgraded mail is switched and tested DNS is switched So we now have: - on a virtual machine instead of hardware - on jessie instead of precise - b... 
[17:49:34] 06Operations, 10ops-codfw: ms-be2012.codfw.wmnet: slot=12 dev=sdm failed - https://phabricator.wikimedia.org/T136395#2350079 (10RobH) I just synced up with @mark about this via IRC. He is aware that I've advised we use the 300GB spare rather than order a new 160GB for replacement. Also chatted with @papaul v... [17:49:49] akosiaris: yw [17:50:20] RECOVERY - puppet last run on restbase1007 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:50:33] 06Operations, 05WMF-NDA: Migrate RT to Phabricator - https://phabricator.wikimedia.org/T38#2350090 (10Dzahn) p:05High>03Low changing prio from high to low since RT got upgraded and moved away from precise [17:50:50] jynus: ^^^ [17:51:08] 06Operations, 06Labs: labnet100[12].eqiad.wmnet need to be reimaged with RAID - https://phabricator.wikimedia.org/T136718#2350094 (10chasemp) [17:51:35] 06Operations, 13Patch-For-Review: decom magnesium (was: Reinstall magnesium with jessie) - https://phabricator.wikimedia.org/T123713#2350095 (10Dzahn) this is now unblocked. just waiting 24 hours [17:55:25] 06Operations, 06Labs, 10Tool-Labs, 10netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2349206 (10doctaxon) @Giftpflanze: Die reden wohl von unserem Bot?! [17:58:55] (03PS2) 10Jcrespo: Puppetize new shard s1 db servers [puppet] - 10https://gerrit.wikimedia.org/r/292380 (https://phabricator.wikimedia.org/T133398) [17:59:12] (03PS1) 10MaxSem: Revert "Increase the number of workers for osm2pgsql." 
[puppet] - 10https://gerrit.wikimedia.org/r/292394 [17:59:55] (03PS1) 10BryanDavis: logging: disable Wikibase\Client\Changes\WikiPageUpdater channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292395 (https://phabricator.wikimedia.org/T136852) [18:00:42] (03PS1) 10Alexandros Kosiaris: ores: Specify healthcheck_url [puppet] - 10https://gerrit.wikimedia.org/r/292396 [18:01:09] I hope it's still ok to deploy mobileapps [18:01:26] sorry for the delay [18:01:28] 06Operations, 10ops-codfw: ms-be2012.codfw.wmnet: slot=12 dev=sdm failed - https://phabricator.wikimedia.org/T136395#2350155 (10Papaul) a:05Papaul>03fgiunchedi Disk replacement complete [18:02:34] (03CR) 10Jcrespo: [C: 032] Puppetize new shard s1 db servers [puppet] - 10https://gerrit.wikimedia.org/r/292380 (https://phabricator.wikimedia.org/T133398) (owner: 10Jcrespo) [18:02:57] 06Operations, 06Research-and-Data-Backlog, 10Research-management, 06Revision-Scoring-As-A-Service, and 3 others: [Epic] Deploy Revscoring/ORES service in Prod - https://phabricator.wikimedia.org/T106867#2350161 (10akosiaris) [18:03:16] jouncebot: next [18:03:16] In 0 hour(s) and 56 minute(s): MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160602T1900) [18:03:26] bearND: looks like you've got an hour [18:03:57] thanks [18:04:33] !log starting mobileapps deploy [18:04:34] Hmm [18:04:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:04:41] Got an exception on a page [18:04:42] 06Operations, 06Labs, 10Tool-Labs, 10netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2350165 (10Giftpflanze) @doctaxon apparently they do, but it seems the claim is unsubstantiated [18:05:57] ""Exception encountered, of type "Wikimedia\Assert\ParameterAssertionException" [18:06:05] ... [18:06:12] Bsadowski1: what page? 
[18:06:27] Special:CheckUser [18:06:51] Bsadowski1: some stricter checks were introduced recently and that broke some sloppy code [18:06:52] on login.wikimedia.org [18:06:58] Oh okay [18:06:59] Bsadowski1: there's a bug for this somewhere. known issue [18:07:13] (03PS1) 10Dzahn: remove db2007 from site.pp, done with testing [puppet] - 10https://gerrit.wikimedia.org/r/292397 [18:07:56] (03PS2) 10Dzahn: remove db2007 from site.pp, done with testing [puppet] - 10https://gerrit.wikimedia.org/r/292397 [18:08:03] Well, it's continuing to fail.. [18:08:22] (03CR) 10Dzahn: [C: 032] remove db2007 from site.pp, done with testing [puppet] - 10https://gerrit.wikimedia.org/r/292397 (owner: 10Dzahn) [18:09:49] (03PS3) 10Dzahn: remove db2007 from site.pp, done with testing [puppet] - 10https://gerrit.wikimedia.org/r/292397 (https://phabricator.wikimedia.org/T125827) [18:10:35] (03PS2) 10Alexandros Kosiaris: conftool: Add ores in conftool [puppet] - 10https://gerrit.wikimedia.org/r/291943 (https://phabricator.wikimedia.org/T124202) [18:11:12] (03CR) 10Yurik: [C: 031] Revert "Increase the number of workers for osm2pgsql." [puppet] - 10https://gerrit.wikimedia.org/r/292394 (owner: 10MaxSem) [18:11:40] 06Operations, 06Labs, 10Tool-Labs, 10netops: 'German Wikipedia Broken Weblinks Bot' is ill-behaved and in danger of getting all of Labs blacklisted - https://phabricator.wikimedia.org/T136829#2350197 (10MoritzMuehlenhoff) I've replied to them requesting to whitelist the captcha warnings. Will update the Ph... 
[18:11:56] (03PS3) 10Alexandros Kosiaris: conftool: Add ores in conftool [puppet] - 10https://gerrit.wikimedia.org/r/291943 (https://phabricator.wikimedia.org/T124202) [18:12:02] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] conftool: Add ores in conftool [puppet] - 10https://gerrit.wikimedia.org/r/291943 (https://phabricator.wikimedia.org/T124202) (owner: 10Alexandros Kosiaris) [18:12:18] (03PS4) 10Dzahn: remove db2007 from site.pp, done with testing [puppet] - 10https://gerrit.wikimedia.org/r/292397 (https://phabricator.wikimedia.org/T125827) [18:12:28] (03CR) 10Dzahn: [V: 032] remove db2007 from site.pp, done with testing [puppet] - 10https://gerrit.wikimedia.org/r/292397 (https://phabricator.wikimedia.org/T125827) (owner: 10Dzahn) [18:15:26] 06Operations, 10ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860#2350233 (10RobH) [18:15:42] 06Operations, 10ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860#2350237 (10RobH) [18:15:52] 06Operations, 10ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860#2350219 (10RobH) p:05Triage>03Normal [18:16:05] 06Operations, 10ops-eqiad: rack/setup/install/deploy labsdb1009-labsdb1011 - https://phabricator.wikimedia.org/T136860#2350219 (10RobH) a:05RobH>03Cmjohnson [18:18:19] !log db2007 shutdown, schedule eternal downtime [18:18:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:19:08] 06Operations, 10ops-codfw, 10DBA: db2034 degraded RAID - https://phabricator.wikimedia.org/T136583#2350259 (10Papaul) a:05Papaul>03jcrespo I received a 2.5" SAS 600GB 15K from HP because they do not have any 3.5" disks . So db2034 will have 11x3.5" SAS 600GB 15K and 1x2.5" 600GB 15k. I don't think this w... 
[18:19:21] !log mobileapps deployed b2fee30 [18:19:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:19:38] !log db2007, revoke puppet cert, delete salt key, nuke from stored configs / icinga [18:19:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:20:41] 06Operations, 10DBA, 13Patch-For-Review: Investigate/decom db2001-db2009 - https://phabricator.wikimedia.org/T125827#2350265 (10Dzahn) 11:23 < mutante> !log db2007 shutdown, schedule eternal downtime 11:24 < mutante> !log db2007, revoke puppet cert, delete salt key, nuke from stored configs / icinga [18:21:34] PROBLEM - Host lvs2006 is DOWN: PING CRITICAL - Packet loss = 100% [18:24:41] !log powercycle labmon1001 again [18:24:44] 06Operations, 10ops-codfw: lvs2006 degraded RAID - https://phabricator.wikimedia.org/T136584#2350307 (10Papaul) a:05Papaul>03Volans Disk replacement complete. [18:24:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:25:36] 06Operations, 10ops-codfw, 10DBA, 10hardware-requests, 13Patch-For-Review: Decommission es2005-es2010 - https://phabricator.wikimedia.org/T134755#2350326 (10Papaul) [18:25:39] * gwicke thinks mutante is feeling poetic today [18:26:45] ah, eternal downtime ;) [18:27:04] (03PS2) 10Alexandros Kosiaris: lvs: add ores [puppet] - 10https://gerrit.wikimedia.org/r/291945 (https://phabricator.wikimedia.org/T124202) [18:27:45] RECOVERY - Host lvs2006 is UP: PING WARNING - Packet loss = 44%, RTA = 39.55 ms [18:28:14] :) the script to schedule _down_times has a user name of "marvin" [18:28:16] https://en.wikipedia.org/wiki/Marvin_%28character%29 [18:29:36] !log going to try to intentionally trip the NFS check on tools-checker. 
This will not page [18:29:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:30:50] mutante, if you are now 100% sure RT is working, I will restart replication [18:31:55] (you still have 3 months of backups to regret that, but it will be slower to recover) [18:32:39] jynus: yes, sure enough :) go ahead please [18:34:04] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 187 bytes in 0.416 second response time [18:34:14] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 278 bytes in 0.058 second response time [18:34:15] PROBLEM - NFS read/writeable on labs instances on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/nfs/home - 312 bytes in 0.045 second response time [18:34:17] ^ all expected [18:34:22] and took 5 minutes [18:34:26] which is also expected [18:34:54] * YuviPanda is gonna fix it nao [18:35:03] PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - 187 bytes in 0.389 second response time [18:35:15] PROBLEM - Start and verify pages via webservices on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/service/start - 274 bytes in 0.203 second response time [18:36:03] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.307 second response time [18:36:04] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.024 second response time [18:36:14] RECOVERY - NFS 
read/writeable on labs instances on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.060 second response time [18:36:54] RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.301 second response time [18:37:10] (03PS2) 10Rush: Revert "icinga: Make the tools checks not page temporarily" [puppet] - 10https://gerrit.wikimedia.org/r/292311 (https://phabricator.wikimedia.org/T136775) (owner: 10Yuvipanda) [18:37:23] RECOVERY - Start and verify pages via webservices on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 7.040 second response time [18:37:33] (03CR) 10Rush: [C: 031] "saw yuvi test paging, and this seems to have stabilized with the changes. thanks yuvi" [puppet] - 10https://gerrit.wikimedia.org/r/292311 (https://phabricator.wikimedia.org/T136775) (owner: 10Yuvipanda) [18:43:27] !log powercycle labmon1001 again, get into bios [18:43:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:43:43] (03PS3) 10Yuvipanda: Revert "icinga: Make the tools checks not page temporarily" [puppet] - 10https://gerrit.wikimedia.org/r/292311 (https://phabricator.wikimedia.org/T136775) [18:43:53] (03CR) 10Yuvipanda: [C: 032 V: 032] Revert "icinga: Make the tools checks not page temporarily" [puppet] - 10https://gerrit.wikimedia.org/r/292311 (https://phabricator.wikimedia.org/T136775) (owner: 10Yuvipanda) [18:45:48] 06Operations, 10MediaWiki-Email, 10Traffic, 07Easy, 07HTTPS: Links in MediaWiki emails should respect the user's https preference - https://phabricator.wikimedia.org/T41676#2350417 (10BBlack) [18:47:21] !log restarting replication on db1016 [18:47:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:49:37] 06Operations, 10Traffic, 06Community-Liaisons (Apr-Jun-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2350457 
(10BBlack) Well that's been a question. The raw logs with IP addresses are sensitive. Username lists have been sent to mailing li... [18:52:50] (03PS2) 10BBlack: symlink /.well-known/apple-app-site-association to /apple-app-site-association [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287190 (https://phabricator.wikimedia.org/T130647) (owner: 10Filippo Giunchedi) [18:53:01] (03CR) 10BBlack: [C: 031] symlink /.well-known/apple-app-site-association to /apple-app-site-association [mediawiki-config] - 10https://gerrit.wikimedia.org/r/287190 (https://phabricator.wikimedia.org/T130647) (owner: 10Filippo Giunchedi) [18:53:42] (03CR) 10Addshore: [C: 031] logging: disable Wikibase\Client\Changes\WikiPageUpdater channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292395 (https://phabricator.wikimedia.org/T136852) (owner: 10BryanDavis) [19:00:05] thcipriani: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160602T1900). Please do the needful. [19:00:44] holding for blockers https://phabricator.wikimedia.org/T136040 [19:02:45] (03PS2) 10Alexandros Kosiaris: ores: Specify healthcheck_url [puppet] - 10https://gerrit.wikimedia.org/r/292396 [19:05:14] (03CR) 10Alexandros Kosiaris: [C: 032] ores: Specify healthcheck_url [puppet] - 10https://gerrit.wikimedia.org/r/292396 (owner: 10Alexandros Kosiaris) [19:09:40] thcipriani: I don't have access to the blocker task, do you have an estimate on when it'll be resolved? [19:11:02] (03PS1) 10Jcrespo: Allow the group of users grafana to connect to the admin interface [puppet] - 10https://gerrit.wikimedia.org/r/292405 [19:11:05] dcausse: unclear to me at this point. I've made a comment on the task. There is a patch, but there are no other +1s (comments in this instance since it's somewhat a security thing) on the patch which I'd like to see before I push that out. 
[19:11:26] ok, thanks [19:11:42] I can review it in about 15 minutes [19:12:36] bawolff: appreciated :) [19:18:06] RECOVERY - ores on scb2002 is OK: HTTP OK: HTTP/1.0 200 OK - 2801 bytes in 0.090 second response time [19:18:36] RECOVERY - ores on scb1002 is OK: HTTP OK: HTTP/1.0 200 OK - 2801 bytes in 0.009 second response time [19:18:36] thcipriani: Ok, I'm looking at the patch now [19:18:36] RECOVERY - ores on scb2001 is OK: HTTP OK: HTTP/1.0 200 OK - 2801 bytes in 0.098 second response time [19:21:53] thcipriani: I'm going to put that patch in gerrit, and then cherry-pick to the wmf branch [19:22:05] PROBLEM - puppet last run on labvirt1010 is CRITICAL: CRITICAL: Puppet has 2 failures [19:22:09] Since it's not really a security issue, I don't think the patch needs to be applied secretly [19:22:09] (03CR) 10Alexandros Kosiaris: [C: 032] lvs: add ores [puppet] - 10https://gerrit.wikimedia.org/r/291945 (https://phabricator.wikimedia.org/T124202) (owner: 10Alexandros Kosiaris) [19:22:15] (03PS3) 10Alexandros Kosiaris: lvs: add ores [puppet] - 10https://gerrit.wikimedia.org/r/291945 (https://phabricator.wikimedia.org/T124202) [19:22:24] (03CR) 10Alexandros Kosiaris: [V: 032] lvs: add ores [puppet] - 10https://gerrit.wikimedia.org/r/291945 (https://phabricator.wikimedia.org/T124202) (owner: 10Alexandros Kosiaris) [19:23:15] bawolff: thank you, I'm happy to defer to you on that. 
[19:27:52] (03CR) 10Bmansurov: [C: 04-1] Enable Hovercards for huwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) (owner: 10Jhobs) [19:28:15] (03PS1) 10Alexandros Kosiaris: ores: Add the codfw LVS IP as well [puppet] - 10https://gerrit.wikimedia.org/r/292408 [19:28:45] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] ores: Add the codfw LVS IP as well [puppet] - 10https://gerrit.wikimedia.org/r/292408 (owner: 10Alexandros Kosiaris) [19:30:33] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 for Pcoombe - https://phabricator.wikimedia.org/T136343#2350600 (10MeganHernandez_WMF) Approved! Thank you! [19:31:56] thcipriani: So I cherry-picked that to https://gerrit.wikimedia.org/r/#/c/292409/ [19:32:53] thcipriani: So once that gets merged into the wmf branch and deployed to wikis that are already on wmf4, it should be fine to continue rolling out wmf4 [19:32:55] bawolff: kk, I can merge and deploy to group0/1 then I'll roll forward to . 
[19:33:03] cool :) [19:33:06] bawolff: thank you for help :) [19:33:11] no problem [19:34:59] !log akosiaris@palladium conftool action : set/pooled=yes; selector: scb1001.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=scb', 'service=ores']) [19:35:03] !log akosiaris@palladium conftool action : set/pooled=yes; selector: scb1002.eqiad.wmnet (tags: ['dc=eqiad', 'cluster=scb', 'service=ores']) [19:35:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:35:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:35:32] !log akosiaris@palladium conftool action : set/pooled=yes; selector: scb2001.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=ores']) [19:35:36] !log akosiaris@palladium conftool action : set/pooled=yes; selector: scb2002.codfw.wmnet (tags: ['dc=codfw', 'cluster=scb', 'service=ores']) [19:35:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:35:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:38:49] 06Operations, 10Ops-Access-Requests: Requesting access to stat1002 for Pcoombe - https://phabricator.wikimedia.org/T136343#2350647 (10Pcoombe) a:05Pcoombe>03None Thanks. I'll need adding to `analytics-privatedata-users` then. 
[19:43:06] RECOVERY - HP RAID on db2034 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:9, 1I:1:10, 1I:1:11, 1I:1:12, Controller, Battery/Capacitor [19:43:56] (03PS5) 10Jhobs: Enable Hovercards for huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) [19:47:06] RECOVERY - puppet last run on labvirt1010 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [19:48:04] (03CR) 10Jhobs: Enable Hovercards for huwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) (owner: 10Jhobs) [19:48:47] RECOVERY - HP RAID on lvs2006 is OK: OK: Slot 0: OK: 1I:1:1, 1I:1:2, Controller, Battery/Capacitor [19:52:10] !log thcipriani@tin Synchronized php-1.28.0-wmf.4/extensions/CheckUser/specials/SpecialCheckUser.php: [[gerrit:292409|Fix Special:Checkuser for log entries when cuc_title = ""]] (duration: 00m 31s) [19:52:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:52:23] (03CR) 10Bmansurov: Enable Hovercards for huwiki (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) (owner: 10Jhobs) [19:52:35] ^ I am going to let that sit for a few minutes, then roll forward in 15 mins. [19:52:38] thcipriani: fyi, deploying wmf4 to group2 will switch search traffic to a new elastic version in codfw, so I would not be surprised to see a surge of pool counter errors (hopefully very short) [19:53:22] !log stopping kafka broker and restarting kafka1014 [19:53:23] dcausse: thanks for the heads up. Barring any other blockers, I will be rolling forward at 1:05ish pacific. 
[19:53:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:57:06] PROBLEM - Host kafka1014 is DOWN: PING CRITICAL - Packet loss = 100% [19:58:16] RECOVERY - Host kafka1014 is UP: PING WARNING - Packet loss = 44%, RTA = 9.99 ms [20:09:10] (03PS1) 10Thcipriani: all wikis to 1.28.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292420 [20:10:44] (03CR) 10Thcipriani: [C: 032] all wikis to 1.28.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292420 (owner: 10Thcipriani) [20:11:42] (03Merged) 10jenkins-bot: all wikis to 1.28.0-wmf.4 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292420 (owner: 10Thcipriani) [20:11:58] (03PS1) 10Dzahn: remove magnesium from site.pp, DHCP/netboot [puppet] - 10https://gerrit.wikimedia.org/r/292421 (https://phabricator.wikimedia.org/T123713) [20:12:03] !log thcipriani@tin rebuilt wikiversions.php and synchronized wikiversions files: all wikis to 1.28.0-wmf.4 [20:12:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:12:21] ^ dcausse in case you were waiting for it. [20:12:23] (03PS2) 10Dzahn: remove magnesium from site.pp, DHCP/netboot [puppet] - 10https://gerrit.wikimedia.org/r/292421 (https://phabricator.wikimedia.org/T123713) [20:12:50] thcipriani: thanks! looks good so far [20:12:59] thcipriani: I've got a logging config change if you are feeling up to merging it -- https://gerrit.wikimedia.org/r/#/c/292395/ -- if not I'll do it sometime later today [20:13:12] (03CR) 10Dzahn: [C: 032] remove magnesium from site.pp, DHCP/netboot [puppet] - 10https://gerrit.wikimedia.org/r/292421 (https://phabricator.wikimedia.org/T123713) (owner: 10Dzahn) [20:13:42] bd808: np, I can get it out. 
[20:14:06] (03PS2) 10Thcipriani: logging: disable Wikibase\Client\Changes\WikiPageUpdater channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292395 (https://phabricator.wikimedia.org/T136852) (owner: 10BryanDavis) [20:14:55] (03CR) 10Thcipriani: [C: 032] logging: disable Wikibase\Client\Changes\WikiPageUpdater channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292395 (https://phabricator.wikimedia.org/T136852) (owner: 10BryanDavis) [20:15:43] (03Merged) 10jenkins-bot: logging: disable Wikibase\Client\Changes\WikiPageUpdater channel [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292395 (https://phabricator.wikimedia.org/T136852) (owner: 10BryanDavis) [20:18:30] !log thcipriani@tin Synchronized wmf-config/InitialiseSettings.php: [[gerrit:292395|logging: disable Wikibase\Client\Changes\WikiPageUpdater channel]] (duration: 00m 26s) [20:18:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:18:36] ^ bd808 sync'd [20:19:48] thcipriani: awesome [20:20:10] !log magnesium (formerly RT) remove from puppet and icinga, revoked cert and salt key, just waiting another day or before shutdown [20:20:12] reducing log messages my any means necessary :) [20:20:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:20:35] (03PS6) 10Bmansurov: Enable Hovercards for huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) (owner: 10Jhobs) [20:20:56] thcipriani: 450k events per hour down to zero :) [20:20:57] (03CR) 10Bmansurov: [C: 031] Enable Hovercards for huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) (owner: 10Jhobs) [20:21:34] * thcipriani hands bd808 a "most improved" trophy [20:23:39] 06Operations, 13Patch-For-Review: decom magnesium (was: Reinstall magnesium with jessie) - https://phabricator.wikimedia.org/T123713#2350795 (10Dzahn) 13:25 < mutante> !log 
magnesium (formerly RT) remove from puppet and icinga, revoked cert and salt key, just waiting another day or before shutdown [20:24:13] uhhh, I just got: Exception encountered, of type "Exception" when trying to login to mediawiki.org... [20:24:51] logs all seem fine... [20:25:16] thcipriani: certainly T119736 [20:25:16] T119736: Could not find local user data for {Username}@{wiki} - https://phabricator.wikimedia.org/T119736 [20:25:38] you need to login to the wiki mentionned in the exception :/ [20:26:47] (03PS1) 10Dzahn: remove magnesium's public IP [dns] - 10https://gerrit.wikimedia.org/r/292474 (https://phabricator.wikimedia.org/T123713) [20:30:00] dcausse: yes indeedy, thanks for the pointer. [20:36:21] (03PS1) 10Ottomata: Release 2.4.0-1 [debs/python-pykafka] (debian) - 10https://gerrit.wikimedia.org/r/292478 [20:48:53] PROBLEM - puppet last run on db2039 is CRITICAL: CRITICAL: Puppet has 1 failures [20:57:23] 06Operations, 10ops-eqiad, 06Labs, 10Labs-Infrastructure: Reinstall labmon1001 with new disk configuration (and jessie) - https://phabricator.wikimedia.org/T136227#2350898 (10RobH) a:03Cmjohnson Assignging this to @Cmjohnson and adding #ops-eqiad for him to remove power entirely from labsmon1001 and add... 
[21:00:06] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: Rack and setup paylvs1005-8 - https://phabricator.wikimedia.org/T136881#2350904 (10Cmjohnson) [21:00:22] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: Rack and setup paylvs1005-8 - https://phabricator.wikimedia.org/T136881#2350918 (10Cmjohnson) [21:02:21] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: Rack and setup new fundraising queue servers - https://phabricator.wikimedia.org/T136882#2350932 (10Cmjohnson) [21:02:50] 06Operations, 10ops-eqiad, 10fundraising-tech-ops: Rack and setup new fundraising queue servers - https://phabricator.wikimedia.org/T136882#2350957 (10Cmjohnson) [21:14:11] RECOVERY - puppet last run on db2039 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [21:20:57] (03PS1) 10BBlack: status -> wikitech-static hosting T34796 [dns] - 10https://gerrit.wikimedia.org/r/292482 [21:21:43] (03CR) 10BBlack: [C: 032] status -> wikitech-static hosting T34796 [dns] - 10https://gerrit.wikimedia.org/r/292482 (owner: 10BBlack) [21:34:30] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 602 600 - REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5854005 keys - replication_delay is 602 [21:35:57] 06Operations, 10Traffic, 07HTTPS, 05MW-1.27-release-notes, 13Patch-For-Review: Insecure POST traffic - https://phabricator.wikimedia.org/T105794#2351132 (10chasemp) @qgil sort of a [[ https://en.wikipedia.org/wiki/Hail_Mary_pass | hail mary ]] ping as I'm not sure where to raise the alarm on this. It... 
[21:40:11] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS on 10.192.48.44:6479 has 1 databases (db0) with 5764871 keys - replication_delay is 0 [21:44:07] (03PS1) 10Cmjohnson: Adding mgmt and production dns entries for new fundraising servers frqueue1001/2, frdb1001, payments1005-8 [dns] - 10https://gerrit.wikimedia.org/r/292485 [21:45:58] 06Operations, 10Mobile-Content-Service, 06Parsing-Team, 06Services, 13Patch-For-Review: Create functional cluster checks for all services (and have them page!) - https://phabricator.wikimedia.org/T134551#2351164 (10GWicke) We have recently seen some service-wide alerts after deployment issues, and they w... [21:46:28] (03PS2) 10Cmjohnson: Adding mgmt and production dns entries for new fundraising servers frqueue1001/2, frdb1001, payments1005-8 [dns] - 10https://gerrit.wikimedia.org/r/292485 [21:47:11] (03CR) 10Cmjohnson: [C: 032] Adding mgmt and production dns entries for new fundraising servers frqueue1001/2, frdb1001, payments1005-8 [dns] - 10https://gerrit.wikimedia.org/r/292485 (owner: 10Cmjohnson) [21:48:12] 06Operations, 10Mobile-Content-Service, 06Parsing-Team, 06Services: ChangeProp / RESTBase / Parsoid outage 2016-05-05 - https://phabricator.wikimedia.org/T134537#2351169 (10GWicke) 05Open>03Resolved Resolving, as the remaining issues are all tracked / in progress elsewhere, with the exception of the Pa... 
[21:51:31] (03PS3) 10Dzahn: varnish: mv wikimedia_vcl, netmapper_upd to separate files [puppet] - 10https://gerrit.wikimedia.org/r/290875 [21:51:43] (03CR) 10Dzahn: "compiler http://puppet-compiler.wmflabs.org/3041/" [puppet] - 10https://gerrit.wikimedia.org/r/290875 (owner: 10Dzahn) [22:02:54] (03PS1) 10Cmjohnson: Adding production and mgmt dns for maps1001-4 [dns] - 10https://gerrit.wikimedia.org/r/292488 [22:03:02] 06Operations, 06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint: Elasticsearch logs are not send to logstash after 2.3.3 upgrade - https://phabricator.wikimedia.org/T136696#2351216 (10debt) [22:04:13] (03CR) 10Cmjohnson: [C: 032] Adding production and mgmt dns for maps1001-4 [dns] - 10https://gerrit.wikimedia.org/r/292488 (owner: 10Cmjohnson) [22:05:08] 06Operations, 10ops-eqiad: Rack/Setup 4 map servers in eqiad - https://phabricator.wikimedia.org/T135018#2351232 (10Cmjohnson) [22:14:57] (03PS1) 10Dzahn: releases: don't define incomingdir, duplicate def [puppet] - 10https://gerrit.wikimedia.org/r/292489 (https://phabricator.wikimedia.org/T136793) [22:17:58] (03PS2) 10Dzahn: releases: don't define incomingdir, duplicate def [puppet] - 10https://gerrit.wikimedia.org/r/292489 (https://phabricator.wikimedia.org/T136793) [22:18:03] 06Operations, 06Discovery, 06Labs, 10hardware-requests, 03Discovery-Search-Sprint: eqiad: (2) Relevance forge servers - https://phabricator.wikimedia.org/T131184#2351298 (10debt) [22:20:55] !log removed my gerrit admin flag [22:21:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:21:58] (03CR) 10Dzahn: [C: 032] releases: don't define incomingdir, duplicate def [puppet] - 10https://gerrit.wikimedia.org/r/292489 (https://phabricator.wikimedia.org/T136793) (owner: 10Dzahn) [22:23:54] 06Operations, 06Discovery, 06Labs, 10hardware-requests: rack/upgrade/setup/install/deploy relforge100[12].eqiad.wmnet - https://phabricator.wikimedia.org/T136708#2351316 (10debt) [22:24:27] 06Operations, 
06Discovery, 10Elasticsearch, 03Discovery-Search-Sprint, 13Patch-For-Review: Increase time before alter for elasticsearch disk space issues - https://phabricator.wikimedia.org/T136702#2351318 (10debt) [22:24:57] (03Restored) 10Gergő Tisza: [HOLD] Enable AuthManager on beta wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/291782 (https://phabricator.wikimedia.org/T135504) (owner: 10Gergő Tisza) [22:28:36] (03PS2) 10Gergő Tisza: Enable AuthManager on beta wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/291782 (https://phabricator.wikimedia.org/T135504) [22:31:28] (03PS4) 10Mattflaschen: Change login cookies (for 'Remember me') to a one year expiry. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699) [22:31:49] (03CR) 10Mattflaschen: "I don't think there are any technical obstacles left. Product side is being discussed on the bug." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699) (owner: 10Mattflaschen) [22:33:07] (03CR) 10jenkins-bot: [V: 04-1] Change login cookies (for 'Remember me') to a one year expiry. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699) (owner: 10Mattflaschen) [22:34:04] (03PS1) 10Dzahn: aptrepo: re-add distributions.erb template [puppet] - 10https://gerrit.wikimedia.org/r/292491 (https://phabricator.wikimedia.org/T136793) [22:34:45] (03PS1) 10Ori.livneh: Enable "purge" log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292492 [22:35:37] (03CR) 10Ori.livneh: [C: 032] Enable "purge" log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292492 (owner: 10Ori.livneh) [22:36:16] (03Merged) 10jenkins-bot: Enable "purge" log group [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292492 (owner: 10Ori.livneh) [22:37:18] (03PS2) 10Dzahn: aptrepo: re-add distributions.erb template [puppet] - 10https://gerrit.wikimedia.org/r/292491 (https://phabricator.wikimedia.org/T136793) [22:37:44] !log ori@tin Synchronized wmf-config/InitialiseSettings.php: I9dc532b3: Enable "purge" log group (duration: 00m 42s) [22:37:46] (03PS3) 10Dzahn: aptrepo: re-add distributions.erb template [puppet] - 10https://gerrit.wikimedia.org/r/292491 (https://phabricator.wikimedia.org/T136793) [22:37:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:40:49] (03CR) 10Dzahn: [C: 032] aptrepo: re-add distributions.erb template [puppet] - 10https://gerrit.wikimedia.org/r/292491 (https://phabricator.wikimedia.org/T136793) (owner: 10Dzahn) [22:47:02] PROBLEM - puppet last run on mc2012 is CRITICAL: CRITICAL: puppet fail [22:48:37] 06Operations, 10Traffic, 07HTTPS: Preload STS for wikimedia.org - https://phabricator.wikimedia.org/T132685#2351388 (10BBlack) [22:48:39] 06Operations, 10Traffic, 07HTTPS, 13Patch-For-Review: Enforce HTTPS+HSTS on remaining one-off sites in wikimedia.org that don't use standard cache cluster termination - https://phabricator.wikimedia.org/T132521#2351389 (10BBlack) [22:48:42] 06Operations, 10Traffic, 07HTTPS: Invalid web certificate on 
status.wikimedia.org - https://phabricator.wikimedia.org/T123135#2351390 (10BBlack) [22:48:45] 06Operations, 10Traffic, 07HTTPS, 13Patch-For-Review: status.wikimedia.org has no (valid) HTTPS - https://phabricator.wikimedia.org/T34796#2351385 (10BBlack) 05Open>03Resolved a:03BBlack I've moved the stats.wm.o DNS to wikitech-static, and set up an apache reverse proxy there with a LetsEncrypt cert... [22:58:50] 06Operations, 10Traffic, 07HTTPS: Preload STS for wikimedia.org - https://phabricator.wikimedia.org/T132685#2351398 (10BBlack) We're now basically in shape to do this. I'd like to wait a few days and see how https://status.wikimedia.org/ works out first. Then we can start running through and setting preloa... [23:00:04] RoanKattouw ostriches Krenair MaxSem awight Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160602T2300). [23:00:04] bmansurov kaldari bmansurov tgr Dereckson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process. [23:00:12] here [23:00:39] Hi. [23:00:39] o/ [23:01:00] bmansurov: when you wish to deploy a code change, you need to cherry-pick it to a currently deployed branch [23:01:01] I'd be happy to do today's deployment, unless anyone else is eager to? 
[23:01:10] Wait where did my patch go :/ [23:01:26] Dereckson: ok [23:01:39] Oh I never saved my edit, that's what happened [23:02:05] bmansurov: currenty this is wmf/1.28.0-wmf.4, you've https://www.mediawiki.org/wiki/MediaWiki_1.28/Roadmap as a reference to know which versions are currently deployed [23:02:10] (03PS1) 10Dzahn: aptrepo: mv wikimedia-specific distributions file to role [puppet] - 10https://gerrit.wikimedia.org/r/292500 (https://phabricator.wikimedia.org/T132757) [23:02:42] awight: go ahead :) [23:03:02] k thanks [23:03:18] Dereckson: I clicked on "Cherry pick to" and when I add that branch, I see an error message saying that the branch doesn't exist. [23:03:31] (03CR) 10jenkins-bot: [V: 04-1] aptrepo: mv wikimedia-specific distributions file to role [puppet] - 10https://gerrit.wikimedia.org/r/292500 (https://phabricator.wikimedia.org/T132757) (owner: 10Dzahn) [23:03:37] bmansurov: I've prepared it for you sooner this evening: https://gerrit.wikimedia.org/r/#/c/292497/ [23:03:54] (03PS2) 10Dzahn: aptrepo: mv wikimedia-specific distributions file to role [puppet] - 10https://gerrit.wikimedia.org/r/292500 (https://phabricator.wikimedia.org/T132757) [23:04:20] Dereckson: thanks, but I'm trying this one https://gerrit.wikimedia.org/r/#/c/292206/6 first [23:04:24] awight: FYI, I just added my patch (I forgot to hit save on my edit earlier), and Andrew also added one (I noticed because I edit conflicted with him) [23:04:45] bmansurov: oh, that's only needed for extensions and core code [23:04:56] Dereckson: ok, i see [23:04:57] for operations/mediawiki-config, master = what's deployed [23:04:58] thanks [23:04:59] (03CR) 10jenkins-bot: [V: 04-1] aptrepo: mv wikimedia-specific distributions file to role [puppet] - 10https://gerrit.wikimedia.org/r/292500 (https://phabricator.wikimedia.org/T132757) (owner: 10Dzahn) [23:05:13] (03PS3) 10Dzahn: aptrepo: mv wikimedia-specific distributions file to role [puppet] - 10https://gerrit.wikimedia.org/r/292500 
(https://phabricator.wikimedia.org/T132757) [23:05:20] (03PS7) 10Awight: Enable Hovercards for huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) (owner: 10Jhobs) [23:05:57] (03CR) 10Awight: [C: 032] Enable Hovercards for huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) (owner: 10Jhobs) [23:06:16] (03CR) 10jenkins-bot: [V: 04-1] aptrepo: mv wikimedia-specific distributions file to role [puppet] - 10https://gerrit.wikimedia.org/r/292500 (https://phabricator.wikimedia.org/T132757) (owner: 10Dzahn) [23:06:32] (03Merged) 10jenkins-bot: Enable Hovercards for huwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292206 (https://phabricator.wikimedia.org/T134778) (owner: 10Jhobs) [23:06:32] awight: so yesterday we decided it would be a good idea to systematically ssh mw1017 and `scap pull` first before a scap across all the cluster. Later, scap will automate the process. [23:06:46] Dereckson: ok sure, thanks for the note [23:07:01] awight: can you sync the hovercard submodule change before the config change? [23:07:21] Dereckson: So, I sync /srv/mediawiki-staging, then ssh and scap pull? Or is the first step unnecessary? [23:07:24] tgr: will do [23:07:56] awight: pull on /srv/mediawiki-staging in Tin, then ssh and scap pull [23:08:04] it will take changes from tin /srv/mediawiki-staging [23:08:26] awight: K so anytime is good for the CN... Pls go ahead and do other stuff first, I'll be around :) thx!!!! [23:08:29] (03PS4) 10Dzahn: aptrepo: mv wikimedia-specific distributions file to role [puppet] - 10https://gerrit.wikimedia.org/r/292500 (https://phabricator.wikimedia.org/T132757) [23:09:08] Dereckson: bmansurov: Is there a mw-core patch prepared for the Popups submodule bump, or shall I make one? [23:09:29] awight: none that i know of [23:09:33] awight: should be automatic these days [23:10:01] oh dear! 
/me reads [23:10:03] yeah, you can git fetch in /srv/mediawiki-staging/php....-4, and you should see a commit for that [23:10:39] great [23:10:52] * awight is thrilled that the deployment documentation is up-to-date! [23:11:50] Uncomitted changes in Kartographer and Math, FWIW [23:12:26] (03CR) 10Dzahn: [C: 032] "no change on carbon (except resource name)" [puppet] - 10https://gerrit.wikimedia.org/r/292500 (https://phabricator.wikimedia.org/T132757) (owner: 10Dzahn) [23:12:31] RECOVERY - puppet last run on mc2012 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [23:12:35] ssh mw1017 [23:12:35] Password: [23:12:36] wat. [23:12:50] * awight tried ssh -A which is not cool [23:12:53] awight: You can't ssh from one machine to the other without agent forwarding [23:12:55] no agent forwarding [23:12:59] (03PS5) 10Jforrester: Change login cookies (for 'Remember me') to a one year expiry. [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699) (owner: 10Mattflaschen) [23:13:11] Instead, use ssh mw1017.eqiad.wmnet from the outside (i.e. directly from a local shell, not from tin) [23:13:11] you could edit your .ssh/config file and add mw1017 after tin [23:13:23] (if you use such aliases) [23:13:47] (03CR) 10Jforrester: "Does this still depend on Ic5a79a3d11864 given that's been abandoned?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699) (owner: 10Mattflaschen) [23:13:52] Dereckson: thanks, direct login worked for me [23:14:06] (03CR) 10jenkins-bot: [V: 04-1] Change login cookies (for 'Remember me') to a one year expiry. 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699) (owner: 10Mattflaschen) [23:14:09] (03CR) 10Jforrester: "(PS5 is a rebase.)" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699) (owner: 10Mattflaschen) [23:14:20] tgr: bmansurov: Popups changes should be on mw1017 now. [23:14:26] awight: you can also do this from tin -- SSH_AUTH_SOCK=/run/keyholder/proxy.sock ssh mwdeploy@mw1017 [23:14:36] awight: ok checking [23:14:37] hehehe [23:15:26] "Uncommitted changes in Kartographer and Math, FWIW" → I suspect they simply pulled changes directly into the submodules extension repos [23:15:54] Anybody mind if I go ahead mw1017-deploying more changes? [23:16:14] (03PS2) 10Awight: Test PageAssessments on Beta Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292376 (https://phabricator.wikimedia.org/T125551) (owner: 10Kaldari) [23:16:30] awight: i still see the old file at https://hu.wikipedia.org/w/load.php?debug=false&lang=hu&modules=ext.popups.targets.desktopTarget i guess i need to wait a little bit? [23:16:36] hrm. [23:16:51] bmansurov: so to test changes on mw1017, you need to use https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug [23:16:51] That's probably cached, fwiw [23:17:01] ah thanks. 
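The X-Wikimedia-Debug routing that Dereckson points to above can be sketched roughly as follows. This is a minimal Python sketch that only builds the request and does not send it. The header value "1" is an assumption (the wikitech page linked above documents the values the debug backend accepts), and the helper name build_debug_request is hypothetical. The URL is the load.php address bmansurov quoted.

```python
from urllib.request import Request

# URL taken from the conversation above.
URL = ("https://hu.wikipedia.org/w/load.php"
       "?debug=false&lang=hu&modules=ext.popups.targets.desktopTarget")


def build_debug_request(url, header_value="1"):
    """Return a Request that carries the X-Wikimedia-Debug header.

    header_value is an assumption; check the X-Wikimedia-Debug page on
    wikitech for the values the debug backend actually accepts.
    """
    return Request(url, headers={"X-Wikimedia-Debug": header_value})


req = build_debug_request(URL)
# urllib stores header names capitalized, so compare case-insensitively.
sent = {name.lower(): value for name, value in req.header_items()}
print(sent["x-wikimedia-debug"])
```

The browser extensions mentioned in the channel set the same header for you; a hand-built request like this is mainly useful for scripted checks.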
[23:17:23] if you use Chrome, we've an extension to make that ultra simple: https://chrome.google.com/webstore/detail/wikimediadebug/binmakecefompkjggiklgjenddjoifbb [23:17:26] (03CR) 10Awight: [C: 032] Test PageAssessments on Beta Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292376 (https://phabricator.wikimedia.org/T125551) (owner: 10Kaldari) [23:17:41] same for Firefox: https://addons.mozilla.org/en-US/firefox/addon/wikimedia-debug-header/ [23:17:52] kaldari: I'm about to deploy your PageAssessments labs config [23:18:11] (03Merged) 10jenkins-bot: Test PageAssessments on Beta Labs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292376 (https://phabricator.wikimedia.org/T125551) (owner: 10Kaldari) [23:18:55] 06Operations, 10Traffic, 07HTTPS, 13Patch-For-Review: status.wikimedia.org has no (valid) HTTPS - https://phabricator.wikimedia.org/T34796#2351571 (10BBlack) (note, above has been edited a few times to correct missing stuff, will keep doing that so this task serves as a good reference) [23:19:07] (I guess there's no way to scap sync-file multiple files?) [23:19:37] Dereckson: I see the new file using the extension. When will I see the change in production so that I can test? [23:19:44] awight: nope. [23:19:49] sync-dir [23:20:00] bmansurov: how do you enable hovercards now that they are out of beta? [23:20:07] !log awight@tin Synchronized wmf-config/CommonSettings-labs.php: Test PageAssessments on Beta Labs (duration: 00m 24s) [23:20:12] bd808: I don't want to yet, cos Popups has config changes that rely on an extensions change [23:20:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:20:13] tgr: look at the footer [23:20:25] tgr: ohh, turn off the gadget [23:20:27] awight: *nod* [23:20:44] bmansurov: you can't test there? 
it's a full mw server with all the capabilities [23:20:51] !log awight@tin Synchronized wmf-config/InitialiseSettings-labs.php: Test PageAssessments on Beta Labs (duration: 00m 26s) [23:20:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:21:08] Dereckson: np, the change is live. Thank you and awight! [23:21:12] kaldari: deployed. Please lemme know if you need a rollback or anything. [23:21:22] !log awight@tin Synchronized wmf-config/extension-list-labs: Test PageAssessments on Beta Labs (duration: 00m 25s) [23:21:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:21:29] bmansurov: Great! Can you confirm that it works and should be deployed? [23:21:46] awight: yes, I confirm it's working. tgr you? [23:22:10] seems to work, can't reproduce the duplicate bug either [23:22:20] (03PS3) 10Awight: Enable AuthManager on beta wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/291782 (https://phabricator.wikimedia.org/T135504) (owner: 10Gergő Tisza) [23:22:21] cool [23:22:31] ok, deploying for real then. [23:23:13] awight: I don't see this one working yet though https://gerrit.wikimedia.org/r/#/c/292206/ [23:23:30] !log awight@tin Synchronized php-1.28.0-wmf.4/extensions/Popups: Do not show Hovercards when NavPopups gadget is enabled on huwiki (duration: 00m 24s) [23:23:34] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:23:47] bmansurov: Shall I push it live? [23:23:50] bmansurov: how is the 1% thing determined? 
[23:24:03] awight: yes please [23:24:13] either I am very lucky or something is wrong with that [23:24:26] tgr: that 1% is not in effect yet [23:24:40] !log awight@tin Synchronized wmf-config/InitialiseSettings.php: Enable Hovercards experiment for 1% of users on huwiki (duration: 00m 24s) [23:24:45] bmansurov: tgr: ^ okay, there's the real deployment [23:24:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:24:57] tgr: i think it uses your user session id [23:25:37] awight: giving it a little time since i don't see the change yet [23:26:24] (03CR) 10Awight: [C: 032] Enable AuthManager on beta wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/291782 (https://phabricator.wikimedia.org/T135504) (owner: 10Gergő Tisza) [23:26:25] bmansurov: I tried with a fresh incognito browser a few times and it's always enabled [23:26:37] tgr: yes the config change is not there yet [23:26:53] tgr: can you run this in the browser's console? mw.config.get('wgPopupsExperimentConfig') [23:27:00] tgr: do you get null? [23:27:04] (03Merged) 10jenkins-bot: Enable AuthManager on beta wikitech [mediawiki-config] - 10https://gerrit.wikimedia.org/r/291782 (https://phabricator.wikimedia.org/T135504) (owner: 10Gergő Tisza) [23:27:36] bmansurov: yes [23:27:59] so first it's enabled for everyone then disabled for 99%? 
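How a 1% experiment picks its users is the open question in the exchange above (bmansurov guesses the session ID). A minimal Python sketch of deterministic hash bucketing follows. It is not the Popups code: the function name in_experiment, the SHA-256 choice, and the 10,000-bucket granularity are all assumptions made for illustration.

```python
import hashlib


def in_experiment(token: str, rate: float, buckets: int = 10_000) -> bool:
    """Deterministically place `token` in or out of an experiment.

    `token` stands in for whatever identifies the client (for example a
    session ID); which identifier Popups uses is not settled above.
    """
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % buckets
    return bucket < rate * buckets


# The same token always gets the same answer, so retrying only changes
# the result if the token itself changes.
tokens = [f"session-{i}" for i in range(100_000)]
share = sum(in_experiment(t, 0.01) for t in tokens) / len(tokens)
print(f"{share:.3%}")
```

With a scheme like this, a tester who keeps landing in the experiment is not necessarily seeing a bug; the identifier being hashed may simply not have changed between attempts.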
[23:28:04] (03PS4) 10Awight: Add namespace translation 'Portal' for diq [mediawiki-config] - 10https://gerrit.wikimedia.org/r/284866 (https://phabricator.wikimedia.org/T133702) (owner: 10Raimond Spekking) [23:28:04] !log awight@tin Synchronized wmf-config/InitialiseSettings.php: Enable AuthManager on beta wikitech (duration: 00m 25s) [23:28:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:28:13] (03CR) 10Awight: [C: 032] Add namespace translation 'Portal' for diq [mediawiki-config] - 10https://gerrit.wikimedia.org/r/284866 (https://phabricator.wikimedia.org/T133702) (owner: 10Raimond Spekking) [23:28:17] (03PS1) 10Dzahn: endowment: remove duplicate Apache site definition [puppet] - 10https://gerrit.wikimedia.org/r/292502 (https://phabricator.wikimedia.org/T136793) [23:28:19] tgr: not really, you must have it enabled in the beta settings [23:28:30] tgr: Also, the AuthManager change is deployed. [23:28:31] tgr: or were you logged out? [23:28:48] yes, incognito mode and logged out [23:28:49] (03Merged) 10jenkins-bot: Add namespace translation 'Portal' for diq [mediawiki-config] - 10https://gerrit.wikimedia.org/r/284866 (https://phabricator.wikimedia.org/T133702) (owner: 10Raimond Spekking) [23:29:07] tgr: that's weird [23:29:14] Ooh, I see what went wrong with our CentralNotice bugfix this week. [23:30:02] 06Operations, 07Tracking: Upgrade Wikimedia servers to Ubuntu Trusty (14.04) (tracking) - https://phabricator.wikimedia.org/T65899#2351607 (10Dzahn) [23:30:04] 06Operations, 06Labs, 10wikitech.wikimedia.org: distribution upgrade for wikitech-static instance - https://phabricator.wikimedia.org/T94585#2351602 (10Dzahn) 05Open>03Resolved a:03Dzahn Meanwhile this was done in T126385 and is on jessie. 
[23:30:15] AndyRussG: ^ it seems that the deployment mistakes were due to a new script creating the mw-core patch from extensions' wmf/1.28-wmf.4 branch [23:31:10] awight: heh interesting [23:31:11] (03CR) 10Dzahn: [C: 032] endowment: remove duplicate Apache site definition [puppet] - 10https://gerrit.wikimedia.org/r/292502 (https://phabricator.wikimedia.org/T136793) (owner: 10Dzahn) [23:31:19] yeah I also thought about that... [23:31:24] Dereckson: I'll push https://gerrit.wikimedia.org/r/#/c/284866/ to mw1017 now... [23:31:26] something automagically pulling from wmf_deploy [23:31:31] awight: okay [23:31:59] Dereckson: ready to test. [23:32:10] works fine [23:32:35] bmansurov: I think it hashes by IP and I just got lucky [23:32:51] when I go through tor I don't get the cards [23:32:58] !log awight@tin Synchronized wmf-config/InitialiseSettings.php: Add namespace translation 'Portal' for diq (duration: 00m 24s) [23:33:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [23:33:04] Dereckson: ^ [23:33:21] fine in prod too [23:33:27] tgr: ok, i'm double checking the code [23:34:22] tgr: i think the code is wrong, if no experiment is running popups is enabled by default [23:34:56] tgr: but, there is an experiment for huwiki [23:35:07] "Create Wikipedia Jamaican" is _still_ open? whats missing [23:35:29] there were so many things, i want to make a master list [23:35:44] things needed to create a new language wiki [23:36:06] https://phabricator.wikimedia.org/T134017 going on [23:36:32] 07Blocked-on-Operations, 06Operations, 10Wikidata, 10Wikimedia-Language-setup, and 2 others: Create Wikipedia Jamaican - https://phabricator.wikimedia.org/T134017#2351622 (10Dzahn) What is still missing for this task to be resolved now? [23:36:52] "I've tried playing with this language in the drop-downs of multilingual projects like Meta and Wikidata. 
On Wikidata, where the Babel box's header and footer change languages when you alter the interface language, jam works for that purpose. But at both Wikidata and Meta I have coding in my Babel box that should add jam-0 when I change the interface language. But that doesn't work. (It does work in general. (Test, say, Hindi.) Any thought as to why that might be?" [23:37:05] We should address this comment and all is done I think for jam [23:37:28] awight: i still don't see the config variable :( [23:37:52] (03PS1) 10Dzahn: endowment: comment out git:clone until repo exists [puppet] - 10https://gerrit.wikimedia.org/r/292504 (https://phabricator.wikimedia.org/T136793) [23:38:06] bmansurov: I'll take a look... [23:38:07] Dereckson: ok, i thought the merge after that was already in response [23:39:09] (03CR) 10Dzahn: [C: 032] endowment: comment out git:clone until repo exists [puppet] - 10https://gerrit.wikimedia.org/r/292504 (https://phabricator.wikimedia.org/T136793) (owner: 10Dzahn) [23:40:11] tgr: i tried it multiple times while logged out and didn't get popups in any of those tries [23:40:30] tgr: i used incognito btw [23:41:21] bmansurov: I see the changes in 292206 on tin, and randomly peeked at mw1025, the config is there as well. [23:41:40] awight: ok [23:41:47] bmansurov: How are you getting that the config isn't deployed, just by looking at functionality? [23:42:09] awight: running this in the browser console: mw.config.get('wgPopupsExperimentConfig') [23:42:12] mutante, YuviPanda can you help me with this deb-upload perms-denied issue? https://gist.githubusercontent.com/subbuss/b9978254f60fcbd56f0c31a7250b6f42/raw/11150bfc629cfb21ec4aa4e69910e3128bc1f634/gistfile1.txt [23:43:00] mwrepl : [23:43:01] echo $wgPopupsExperimentConfig [23:43:01] [Thu Jun 2 23:42:52 2016] [hphp] [24281:7ffb8d0fb100:0:000001] [] [23:43:05] Notice: Undefined variable: wgPopupsExperimentConfig [23:43:15] subbu: yes, oh. 
that might be caused by my changes [23:43:47] subbu: this is about uploading to releases., right [23:43:54] echo $wgUsePopups [23:43:55] yes. [23:44:08] (03CR) 10jenkins-bot: [V: 04-1] endowment: comment out git:clone until repo exists [puppet] - 10https://gerrit.wikimedia.org/r/292504 (https://phabricator.wikimedia.org/T136793) (owner: 10Dzahn) [23:44:13] same [23:44:23] mwrepl : echo $usePopups [23:44:33] mwrepl : echo $wgUsePopups [23:44:38] is that we i get it? [23:44:42] subbu: i think it's my fault, sorry, refactored all of the puppet code [23:44:48] subbu: looking [23:44:49] Notice: Undefined variable: wgUsePopups too [23:45:01] mutante, np, thanks for looking. [23:45:09] awight: apparently labstestwiki doesn't get scap updates [23:45:28] I'll try to pull from the other end [23:45:30] hey, wmgUsePopups is at false :/ [23:46:12] but mwscript eval.php huwiki → echo $wmgUsePopups is 1 [23:47:59] AndyRussG: RoanKattouw: your changes are ready to test on mw1017 [23:48:39] awight: K checking [23:48:53] tgr: bmansurov: $wmgPopupsExperimentConfig is well defined [23:49:33] you need to add $wgPopupsExperimentConfig = $wmgPopupsExperimentConfig in CommonSettings.php [23:50:27] Dereckson: ok let me do so [23:50:39] subbu: hmm, not like i expected a permission issue, rather looks like a session was started and then Received disconnect from 10.64.0.196: 11: disconnected by user [23:50:44] bmansurov: other way: or you can rename wmgPopupsExperimentConfig into wgPopupsExperimentConfig in InitialiseSettings.php, and use wfLoadExtension in CommonSettings.php [23:50:50] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [23:51:01] wfLoadExtension can directly use wg variables, without this wmg = wg hack [23:51:02] awight: Looks good but you'll need a scap to actually deploy it, since it contains i18n changes [23:51:08] Dereckson: ok [23:51:09] subbu: could you run it again while i watch the log ? 
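The first of Dereckson's two options above (copying the wmg value into a wg global in CommonSettings.php) can be modelled as follows. This is a Python sketch of the idea, not MediaWiki code: the dictionaries stand in for the per-wiki values from InitialiseSettings.php and for PHP globals, the function name apply_common_settings is hypothetical, and the sample experiment value is made up.

```python
# Per-wiki settings as resolved from InitialiseSettings.php (modelled as
# a dict). The experiment value is a made-up placeholder, not the
# deployed config.
wmg_settings = {
    "wmgUsePopups": True,
    "wmgPopupsExperimentConfig": {"name": "example", "rate": 0.01},
}

# Globals the extension actually reads (PHP's $wg* variables).
wg_globals = {}


def apply_common_settings(wmg, wg):
    """Copy the wmg* value into the wg* global, as CommonSettings.php would."""
    if wmg.get("wmgUsePopups"):
        wg["wgPopupsExperimentConfig"] = wmg["wmgPopupsExperimentConfig"]


# Before the copy the extension sees nothing, which matches the
# "Undefined variable" notice reported in the channel.
print(wg_globals.get("wgPopupsExperimentConfig"))
apply_common_settings(wmg_settings, wg_globals)
print(wg_globals["wgPopupsExperimentConfig"])
```

The second option mentioned in the channel avoids this copy step entirely by naming the setting wgPopupsExperimentConfig in InitialiseSettings.php and loading the extension with wfLoadExtension.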
[23:51:13] ok. doing now. [23:51:15] RoanKattouw: oof, okay n.p. [23:51:37] Sorry for not flagging it [23:51:39] mutante, done. failed again. [23:51:41] PROBLEM - Misc HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [23:51:45] subbu: ok, it accepts the key..starts to run the command.. then gets disconnected. .hmm [23:52:00] (03PS1) 10GWicke: WIP: logstash_checker script for canary deploys [puppet] - 10https://gerrit.wikimedia.org/r/292505 (https://phabricator.wikimedia.org/T110068) [23:52:49] i am not logged into tin with -A .. i assume that isn't necessary. [23:53:06] (03CR) 10jenkins-bot: [V: 04-1] WIP: logstash_checker script for canary deploys [puppet] - 10https://gerrit.wikimedia.org/r/292505 (https://phabricator.wikimedia.org/T110068) (owner: 10GWicke) [23:53:07] Looks like this is the last deployment for the week, so it's probably okay that the full scap will drag on a bit [23:53:12] subbu: no that should not be it [23:53:23] arghh [23:53:47] why do they call the live one labswiki but the test one labtestwiki [23:53:54] (03PS1) 10Bmansurov: Enable the Popups experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292506 (https://phabricator.wikimedia.org/T134778) [23:53:58] awight: do you have time for a follow-up patch? [23:54:00] (03PS2) 10GWicke: WIP: logstash_checker script for canary deploys [puppet] - 10https://gerrit.wikimedia.org/r/292505 (https://phabricator.wikimedia.org/T110068) [23:54:07] tgr: yep! [23:54:12] awight: K all good for deploy 2 cluster :) [23:54:19] Dereckson: awight can we deploy a follow up ? https://gerrit.wikimedia.org/r/292506 [23:54:22] great! [23:54:25] sure [23:54:29] thanks [23:54:35] hehe. /me tries to alloy the positivity. [23:54:43] CN is working on both mobile and desktop, no console errors [23:54:54] subbu: debugging.. 
give me just a few more [23:54:56] good--I'll let the full scap sync it then [23:55:09] (03CR) 10jenkins-bot: [V: 04-1] WIP: logstash_checker script for canary deploys [puppet] - 10https://gerrit.wikimedia.org/r/292505 (https://phabricator.wikimedia.org/T110068) (owner: 10GWicke) [23:55:21] 06Operations, 10Traffic, 10Wiki-Loves-Monuments, 07HTTPS: configure https for www.wikilovesmonuments.org - https://phabricator.wikimedia.org/T118388#2351644 (10Akoopal) intresting. The site fixed is www.wikilovesmonuments.org. without www it seems to just point to the default site [23:55:21] (03CR) 10Awight: [C: 032] Enable the Popups experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292506 (https://phabricator.wikimedia.org/T134778) (owner: 10Bmansurov) [23:55:28] mutante, sure. [23:55:32] RECOVERY - Misc HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [23:55:42] (03PS1) 10Gergő Tisza: Fix labtestwiki name for AuthManager config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292507 [23:55:57] (03Merged) 10jenkins-bot: Enable the Popups experiment [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292506 (https://phabricator.wikimedia.org/T134778) (owner: 10Bmansurov) [23:55:58] awight: ^^ [23:56:01] i alreayd know i broke it with https://gerrit.wikimedia.org/r/#/c/292489/ now [23:56:10] tgr: k [23:56:32] (03PS2) 10Awight: Fix labtestwiki name for AuthManager config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292507 (owner: 10Gergő Tisza) [23:57:09] subbu: try again [23:57:11] (03CR) 10Awight: [C: 032] "Au*gh. definitely worth a fixme..." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292507 (owner: 10Gergő Tisza) [23:57:46] (03Merged) 10jenkins-bot: Fix labtestwiki name for AuthManager config [mediawiki-config] - 10https://gerrit.wikimedia.org/r/292507 (owner: 10Gergő Tisza) [23:58:26] (03CR) 10Gergő Tisza: "Original was I48f9df58c73de13c4ade2457d8535e6ff4cffd9c." 
[mediawiki-config] - 10https://gerrit.wikimedia.org/r/292507 (owner: 10Gergő Tisza) [23:58:41] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [23:58:46] (03CR) 10Gergő Tisza: "Typo fixed in Ie77657d785d345f63f4a6f11006ce7470a795820." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/291782 (https://phabricator.wikimedia.org/T135504) (owner: 10Gergő Tisza) [23:59:34] mutante, progress .. but errors still .. https://gist.githubusercontent.com/subbuss/4a13b5194e047f2e7c7640669bd34938/raw/a9a4582b6c96ef19a492776ef68d7dd36292c904/gistfile1.txt [23:59:36] !log awight@tin Started scap: Deploying labtestwiki AuthManager config; Enabling Popups experiment; CentralNotice fixes for T136408, T136387; Special:Notifications fixes [23:59:37] T136408: Update CentralNotice JSHint config to restrict syntax to ES3 (disallow ES5 or ES6) - https://phabricator.wikimedia.org/T136408 [23:59:38] T136387: CentralNotice failing in older browsers due use of ECMAScript 6 syntax - https://phabricator.wikimedia.org/T136387 [23:59:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master