[00:00:07] Yes, exactly
[00:00:19] Just waiting for that IP cap one to merge and then I'll put the whole lot on mw1099
[00:00:26] (Merged) jenkins-bot: IP cap lift for UN Women Editathon in NYC, Aug 12 [mediawiki-config] - https://gerrit.wikimedia.org/r/303571 (https://phabricator.wikimedia.org/T142396) (owner: MarcoAurelio)
[00:01:31] mafk: OK, they're all on mw1099, please test
[00:01:56] RoanKattouw: Prêt à vos ordres! [Ready at your command!]
[00:02:02] (CR) BryanDavis: tools: Don't send per-tool http version stats (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303716 (owner: Yuvipanda)
[00:02:45] (PS5) Catrope: Rename 'autoreview' to 'autopatrolled' on mw.org [mediawiki-config] - https://gerrit.wikimedia.org/r/302987 (owner: MarcoAurelio)
[00:02:49] RoanKattouw: hmm, I dont see them
[00:03:31] D'oh, I'm an idiot
[00:03:42] Try now
[00:05:06] I see some, let me have a look
[00:06:39] RoanKattouw: Meta checked, but I forgot one group to add, if I create a followup patch, would you merge it?
[00:06:42] Sure
[00:06:54] ok, so Meta checked, moving to fr.wiktionary
[00:07:47] (CR) 20after4: [C: 1] Phab: Set origin's URL to phab not gerrit [puppet] - https://gerrit.wikimedia.org/r/301863 (owner: Chad)
[00:08:22] frwiktionary lgtm
[00:08:28] checking meta again for centralauth
[00:09:55] meta for centralauth lgtm
[00:10:32] and throttles are un-testable so I guess this is done
[00:10:49] I'll do a followup patch for the autopatrol thing at meta
[00:11:06] OK, I'll push this to the entire cluster
[00:11:18] Can I do the patch that needs a maintenance script now?
[00:11:36] (CR) Dzahn: [C: 2] Phab: Set origin's URL to phab not gerrit [puppet] - https://gerrit.wikimedia.org/r/301863 (owner: Chad)
[00:12:09] (CR) Dzahn: [C: 2] Phab: Add ourselves to the list of sites to skip proxying [puppet] - https://gerrit.wikimedia.org/r/301862 (owner: Chad)
[00:12:20] (PS2) Dzahn: Phab: Add ourselves to the list of sites to skip proxying [puppet] - https://gerrit.wikimedia.org/r/301862 (owner: Chad)
[00:12:47] !log catrope@tin Synchronized wmf-config: Various config changes for SWAT (duration: 00m 53s)
[00:12:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:13:17] mafk: What's the script that needs to be run for https://gerrit.wikimedia.org/r/#/c/302987/ ?
[00:13:40] RoanKattouw: migrateUserGroups IIRC, let me check
[00:14:45] RoanKattouw: https://www.mediawiki.org/wiki/Manual:MigrateUserGroup.php
[00:14:46] (PS2) Dzahn: Phab: Set origin's URL to phab not gerrit [puppet] - https://gerrit.wikimedia.org/r/301863 (owner: Chad)
[00:15:34] mafk: OK, I'll do that now
[00:15:45] (CR) Catrope: [C: 2] Rename 'autoreview' to 'autopatrolled' on mw.org [mediawiki-config] - https://gerrit.wikimedia.org/r/302987 (owner: MarcoAurelio)
[00:15:52] (CR) Yuvipanda: tools: Don't send per-tool http version stats (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303716 (owner: Yuvipanda)
[00:16:19] (Merged) jenkins-bot: Rename 'autoreview' to 'autopatrolled' on mw.org [mediawiki-config] - https://gerrit.wikimedia.org/r/302987 (owner: MarcoAurelio)
[00:17:17] (Draft2) MarcoAurelio: Follow-up for I4101ac9c [mediawiki-config] - https://gerrit.wikimedia.org/r/303721
[00:17:37] (PS3) MarcoAurelio: Follow-up for I4101ac9c [mediawiki-config] - https://gerrit.wikimedia.org/r/303721
[00:17:51] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Rename autoreview group to autopatrolled on mw.org (duration: 00m 48s)
[00:18:03] RoanKattouw: https://gerrit.wikimedia.org/r/#/c/303721/ <-- followup to the patch
[00:20:25] (CR) Catrope: [C: 2] Follow-up for I4101ac9c [mediawiki-config] - https://gerrit.wikimedia.org/r/303721 (owner: MarcoAurelio)
[00:20:36] Script running
[00:20:50] * mafk crosses fingers
[00:20:52] (Merged) jenkins-bot: Follow-up for I4101ac9c [mediawiki-config] - https://gerrit.wikimedia.org/r/303721 (owner: MarcoAurelio)
[00:21:23] oh, I can't believe it's working :)
[00:22:30] OK deploying your follow-up now
[00:23:09] !log catrope@tin Synchronized wmf-config/InitialiseSettings.php: Give autopatrol to translationadmin too (duration: 00m 47s)
[00:23:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:23:20] (CR) Dzahn: [C: 2] phab2001: add IPv6 AAAA and reverse [dns] - https://gerrit.wikimedia.org/r/302865 (owner: Dzahn)
[00:23:24] (PS2) Dzahn: phab2001: add IPv6 AAAA and reverse [dns] - https://gerrit.wikimedia.org/r/302865
[00:24:45] script looks like it stopped at https://www.mediawiki.org/w/index.php?title=Special:ListUsers&group=autoreview
[00:25:03] is still running?
[00:25:09] Yes, still running
[00:26:29] Operations, Ops-Access-Requests, Fundraising-Backlog: Access request: AWight access to iridium - https://phabricator.wikimedia.org/T142446#2535387 (MaxSem) Would access to stat100[23] with all the HTTP requests in Hive be better?
[00:27:46] RoanKattouw: OK. I hope it is just working on mediawiki.org heh :)
[00:28:01] Those few leftover users might just have high user IDs
[00:28:02] (CR) BryanDavis: [C: 1] "A reasonable follow-up might be to bucket the HTTP status codes into families too (1xx, 2xx, 3xx, ...)" [puppet] - https://gerrit.wikimedia.org/r/303716 (owner: Yuvipanda)
[00:28:06] It goes through the users in ID order
[00:28:15] for one wiki, right?
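BryanDavis's review suggestion, bucketing HTTP status codes into families (1xx, 2xx, 3xx, ...), can be sketched in a few lines. This is a minimal illustration and is not taken from the tools puppet change; the names `status_family` and `bucket_counts` are hypothetical, and the `Counter` aggregation stands in for whatever metrics client the real code uses.

```python
from collections import Counter
from typing import Iterable


def status_family(code: int) -> str:
    """Map an HTTP status code to its family label, e.g. 404 -> "4xx"."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return f"{code // 100}xx"


def bucket_counts(codes: Iterable[int]) -> Counter:
    """Count requests per status family instead of per individual code."""
    return Counter(status_family(code) for code in codes)


if __name__ == "__main__":
    print(bucket_counts([200, 200, 301, 404, 404, 502]))
```

The likely benefit is fewer distinct metric series: at most five families instead of one series per status code seen.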
[00:28:15] Oh, it finishes
[00:28:17] *finished
[00:28:19] Yes, just that one wiki
[00:28:32] k, it's the first time I request that script to be executed
[00:28:33] And https://www.mediawiki.org/w/index.php?title=Special:ListUsers&group=autoreview is empty
[00:28:37] So, success!
[00:28:40] yay
[00:29:03] and https://www.mediawiki.org/w/index.php?title=Special%3AListUsers&username=&group=autopatrolled&limit=50 full of them
[00:29:08] double success
[00:31:24] log messages are ok too
[00:31:26] 00:30, 9 August 2016 MarcoAurelio (talk | contribs | block) changed group membership for MarcoAurelio@mediawikiwiki from autopatroller to (none) (testing a second)
[00:31:39] it does not say anymore "autochecked users"
[00:31:45] (PS2) Yuvipanda: tools: Don't send per-tool http version stats [puppet] - https://gerrit.wikimedia.org/r/303716
[00:31:46] so I guess everything is ok
[00:31:52] (CR) Yuvipanda: [C: 2 V: 2] tools: Don't send per-tool http version stats [puppet] - https://gerrit.wikimedia.org/r/303716 (owner: Yuvipanda)
[00:34:32] https://meta.wikimedia.org/wiki/MediaWiki:Group-autoreview / https://meta.wikimedia.org/wiki/MediaWiki:Group-autopatrolled
[00:34:35] The message still exists
[00:34:43] I guess that's in WikimediaMessages
[00:34:56] But maybe https://meta.wikimedia.org/wiki/Meta:Autochecked_users is still used somewhere?
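The migration above renames one user group to another on a single wiki and walks users in ID order, which is why a few high-ID users were the last to leave Special:ListUsers&group=autoreview. A minimal sketch of that batching pattern, assuming a simplified in-memory table of (user_id, group) rows; this is not the actual migrateUserGroup.php code, and `migrate_group` and `batch_size` are hypothetical names.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (user_id, group name): a simplified user_groups row


def migrate_group(rows: List[Row], old: str, new: str, batch_size: int = 2) -> List[Row]:
    """Rename group `old` to `new`, processing affected users in ascending ID order.

    Hypothetical sketch: each batch starts after the last user ID handled, so
    users with high IDs are migrated last. A user already in `new` just loses
    the duplicate `old` membership (the set removes the duplicate row).
    """
    table = set(rows)
    last_id = 0
    while True:
        # Next batch of users still in the old group, by ascending user ID.
        batch = sorted(uid for uid, grp in table if grp == old and uid > last_id)[:batch_size]
        if not batch:
            break
        for uid in batch:
            table.discard((uid, old))
            table.add((uid, new))
        last_id = batch[-1]
    return sorted(table)
```

Keying each batch on the last ID processed, rather than on an offset, keeps the walk stable while rows are being rewritten underneath it.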
[00:38:14] Krinkle: on flaggedrevs wikis
[00:38:23] where the group is called autoreview
[00:38:53] and I guess that on WM Messages it's grouppage-autoreview: {{ns:PROJECT}}:Autochecked users
[00:39:28] pre-rename: (change visibility) 15:59, 14 February 2016 MarcoAurelio (talk | contribs | block) changed group membership for MarcoAurelio@mediawikiwiki from autochecked user and oversight to autochecked user
[00:42:24] thanks for SWAT RoanKattouw
[00:49:43] meh https://phabricator.wikimedia.org/diffusion/OMWC/browse/master/wmf-config/InitialiseSettings.php;e95a62d4d850bed725ae9bd6631198d9969c610a$8298
[00:49:55] that ], should not have so many tabs
[00:50:04] I hope it does not break anything
[00:52:43] PROBLEM - puppet last run on mw1211 is CRITICAL: CRITICAL: Puppet has 1 failures
[00:54:24] (Draft2) MarcoAurelio: Syntax fix. [mediawiki-config] - https://gerrit.wikimedia.org/r/303725
[00:55:07] (PS1) Yuvipanda: tools: Use phabricator as source for kubernetes building [puppet] - https://gerrit.wikimedia.org/r/303727 (https://phabricator.wikimedia.org/T142448)
[00:55:24] Krinkle / RoanKattouw sorry to bother again but if you could merge https://gerrit.wikimedia.org/r/#/c/303725 to correct a syntax error?
[00:56:00] (PS3) MarcoAurelio: Code style fix in InitialiseSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/303725
[00:56:23] If the syntax failed, all wikis would be down now. :D
[00:56:50] well, that's a good indicator that something went bad
[00:57:01] I had my heart in my hand
[00:57:20] or rather, scap would've failed on the early checks
[00:57:24] (yay)
[00:57:41] It's just a coding style. The whitespace is ignored by the server.
[00:58:11] (CR) Krinkle: [C: 2] Code style fix in InitialiseSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/303725 (owner: MarcoAurelio)
[00:58:28] I created that patch directly on gerrit. On my laptop Notepad++ is more intelligent ;)
[00:58:37] (Merged) jenkins-bot: Code style fix in InitialiseSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/303725 (owner: MarcoAurelio)
[01:00:01] (PS1) Yuvipanda: labstore: Pin dependencies of python3-ldap3 as well [puppet] - https://gerrit.wikimedia.org/r/303728
[01:00:53] mafk: Hm.. how does that work?
[01:00:57] (creating a new patch in Gerrit)
[01:01:02] I know one can edit an existing one
[01:01:06] !log krinkle@tin Synchronized wmf-config/InitialiseSettings.php: coding style (duration: 00m 56s)
[01:01:06] since the last update
[01:01:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:01:41] Krinkle: easy - you go, ie, to https://gerrit.wikimedia.org/r/#/admin/projects/operations/mediawiki-config
[01:01:46] then click on create change
[01:01:59] enter branch, topic and commit message, and ready to go
[01:02:05] (CR) Yuvipanda: [C: 2] labstore: Pin dependencies of python3-ldap3 as well [puppet] - https://gerrit.wikimedia.org/r/303728 (owner: Yuvipanda)
[01:02:07] they're saved as drafts
[01:02:16] until you decide to publish them
[01:02:46] Aha, and the editor allows you to "add" files
[01:02:52] not bad
[01:03:14] yep, it saves space on my hard drive
[01:03:33] add, remove, etc.
[01:03:38] I like it
[01:03:49] for petty changes is quite useful
[01:03:57] It also runs Jenkins on the unpublished draft
[01:04:03] (if authorised)
[01:04:05] https://gerrit.wikimedia.org/r/#/c/303729/
[01:05:31] yep
[01:06:04] but iirc, only security can see unpublished drafts, and the owner
[01:06:21] Hmmm, we should tell people about this.
[01:06:38] Gerrit requires explanation. :-(
[01:06:53] I worry about malicious users exploiting this, but well...
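Whether a draft is really private depends on where you look. Gerrit keeps every patch set, draft or published, under a ref of the form `refs/changes/XX/CHANGE/PATCHSET`, where `XX` is the last two digits of the change number, zero-padded. The small sketch below builds that ref. The helper name `change_ref` is hypothetical. The example uses change 303725, which started life as a draft in this log. Whether a given ref can actually be fetched depends on the server's access settings.

```python
def change_ref(change: int, patchset: int) -> str:
    """Return the Gerrit ref that holds one patch set of a change.

    Change refs are sharded by the last two digits of the change number,
    zero-padded: change 303725, patch set 3 -> refs/changes/25/303725/3.
    """
    if change < 1 or patchset < 1:
        raise ValueError("change and patch set numbers start at 1")
    return f"refs/changes/{change % 100:02d}/{change}/{patchset}"


if __name__ == "__main__":
    ref = change_ref(303725, 3)
    # Plain git can fetch the ref directly, access permitting.
    print(f"git fetch https://gerrit.wikimedia.org/r/operations/mediawiki-config {ref}")
```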
[01:07:43] PROBLEM - puppet last run on oxygen is CRITICAL: CRITICAL: Puppet has 1 failures
[01:18:22] RECOVERY - puppet last run on mw1211 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[01:22:16] everyone can see unpublished drafts, if you know where to look.
[01:22:41] although… perhaps no longer. i knew where to look on gitblit, but we killed it. i'm not sure if phabricator has that data.
[01:23:34] I think we may have stopped replicating those branches to diffusion, but you can still find them with git
[01:24:00] so yeah drafts are not in any way secret
[01:24:09] just not well publicized
[01:33:13] RECOVERY - puppet last run on oxygen is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
[01:56:39] (PS1) Yuvipanda: prometheus: Add devstat collector to default set [puppet] - https://gerrit.wikimedia.org/r/303738
[01:58:19] (PS1) Dzahn: phabricator: add systemd unit file for phd service [puppet] - https://gerrit.wikimedia.org/r/303740
[01:58:48] mutante can you do me a favor and search your email inbox for any pages from icinga for parsoid.svc over the last year or so? anything since 2015
[01:59:22] yuvipanda: uh, yea. at least since the alerts@ alias exists.. hold on
[01:59:23] (CR) jenkins-bot: [V: -1] phabricator: add systemd unit file for phd service [puppet] - https://gerrit.wikimedia.org/r/303740 (owner: Dzahn)
[02:00:50] yuvipanda: March 8 2016, May 4 2016, Aug 2 2016
[02:00:58] and that seems to be what i have
[02:01:05] ah, ok thanks :D
[02:01:34] wait, you said "parsoid.svc"
[02:01:50] strictly that is only the one March 8 event
[02:02:07] the other 2 are wtp* machines
[02:02:24] PROBLEM - Host eeden is DOWN: PING CRITICAL - Packet loss = 100%
[02:02:38] and that one on March 8 is parsoid.svc.codfw.wmnet
[02:02:48] not eqiad
[02:02:55] mutante ah, I see.
[02:03:05] what are you finding out?
[02:03:16] mutante I'm trying to find out why / how James_F hasn't been receiving alerts to parsoid.svc
[02:03:44] even though he's in the nagios contactgroup for 'parsoid'
[02:04:06] you mean by email?
[02:04:13] yeah
[02:04:15] or SMS
[02:04:48] hmm, yea.. my search for alerts@ might not cover it then
[02:04:50] (there's an address1 set for him with appropriate US carrier email)
[02:05:02] PROBLEM - Host ns2-v4 is DOWN: PING CRITICAL - Packet loss = 100%
[02:05:03] that only matches things that would page ops
[02:05:03] mutante the last email he got was in 2015
[02:05:07] and nothing since
[02:05:07] ah I see.
[02:06:57] (CR) Yuvipanda: [C: 2] prometheus: Add devstat collector to default set [puppet] - https://gerrit.wikimedia.org/r/303738 (owner: Yuvipanda)
[02:07:38] checked eeden and it's up
[02:07:58] but neon cant talk to it
[02:12:17] yuvipanda: soo.. this kind of alert "parsoid.svc.eqiad.wmnet/LVS HTTP IPv4 is OK" i see those from 2012,2013,2014,2015. the last one on 11/2/2015 and then never again
[02:12:39] mutante aaah, I see.
[02:12:40] the one on March 8 was a "Host DOWN alert for parsoid.svc.codfw.wmnet!"
[02:12:44] so that is different
[02:12:55] mutante interesting. and that doesn't page people in the 'parsoid' contactgroup?
[02:13:32] yes, pretty sure. LVS checks are paging but "HOST down" is not
[02:14:11] ah, I see.
[02:14:18] and the LVS check seems actually fine since then
[02:14:20] so this means that nobody in the parsoid group got the last few alerts
[02:14:43] RECOVERY - Host ns2-v4 is UP: PING OK - Packet loss = 0%, RTA = 83.69 ms
[02:14:52] well, this is up:
[02:14:53] RECOVERY - Host eeden is UP: PING OK - Packet loss = 0%, RTA = 83.48 ms
[02:14:57] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=parsoid.svc.eqiad.wmnet&service=LVS+HTTP+IPv4
[02:15:11] that's the one that used to alert in the past
[02:16:14] we dont see alerts in the history there but also icinga forgets that fairly soon
[02:16:41] yeah
[02:20:34] just the following things notify the "parsoid" group of people: a) check http on port 8000 on parsoid.svc.codfw.wmnet b) same for parsoid.svc.eqiad.wmnet. c) "check_http_lvs_on_port" 19000 for citoid and graphoid
[02:21:02] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.13) (duration: 09m 24s)
[02:21:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:21:09] if there is a "host down" or something else it would not
[02:24:09] mutante is there a way we can add it?
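The notification routing mutante describes can be modelled as a toy lookup. In this model only the listed service checks on the parsoid `.svc` endpoints reach the `parsoid` contact group. A host-level DOWN on the same address never does. The sketch below is a simplified illustration and does not show how Icinga or the puppet monitoring module is actually configured. The check names are hypothetical, and the citoid/graphoid port-19000 checks are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    host: str
    kind: str        # "host" for host checks (PING up/down), "service" otherwise
    check: str = ""  # service check name; empty for host checks


# Hypothetical names for the service checks that, per the discussion,
# notify the "parsoid" contact group.
PARSOID_SERVICE_CHECKS = {
    ("parsoid.svc.codfw.wmnet", "check_http_port_8000"),
    ("parsoid.svc.eqiad.wmnet", "check_http_port_8000"),
}


def notifies_parsoid(alert: Alert) -> bool:
    """Return True if this alert would reach the "parsoid" contact group."""
    if alert.kind == "host":
        # In this model host objects carry only the "admins" contact group,
        # so a host DOWN on a .svc address never reaches "parsoid".
        return False
    return (alert.host, alert.check) in PARSOID_SERVICE_CHECKS


if __name__ == "__main__":
    outage = Alert("parsoid.svc.codfw.wmnet", "host")
    print(notifies_parsoid(outage))  # False
```

This is the gap in question. A short LVS failure can show up first as a host DOWN, and in this model that alert never reaches the service owners.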
[02:24:17] host down on a .svc end point seems like an outage that should notify them
[02:24:23] (not host down on wtp* hosts etc)
[02:25:05] (PS2) Dzahn: phabricator: add systemd unit file for phd service [puppet] - https://gerrit.wikimedia.org/r/303740 (https://phabricator.wikimedia.org/T137928)
[02:26:06] (CR) jenkins-bot: [V: -1] phabricator: add systemd unit file for phd service [puppet] - https://gerrit.wikimedia.org/r/303740 (https://phabricator.wikimedia.org/T137928) (owner: Dzahn)
[02:26:53] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Aug 9 02:26:53 UTC 2016 (duration 5m 51s)
[02:27:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:28:33] yuvipanda: normally one would expect that "cant connect to port 8000" is already the better check, like it should not be possible that the host is down but that still works .. hmm
[02:29:08] what was the thing that happened recently?
[02:29:26] mutante LVS itself died I think
[02:29:53] it wasn't an outage of the backends but something network related I think
[02:29:55] for a short time? just a minute?
[02:30:10] ah yea
[02:30:52] i guess then it can happen that it notices the "host down" before it got to the port check again
[02:31:04] adding contact groups should be possible. yep
[02:32:20] even though i don't see yet where we add the special host to icinga
[02:32:27] as i expected we would
[02:35:34] oh, that is all somehow taken from hieradata/common/lvs/configuration.yaml seven layers of abstraction
[02:42:01] yuvipanda: not easily. lvs/manifests/monitor.pp uses lvs::monitor_service_http_https which uses @monitoring::host which sets the contact_group to "admins" (ops paging). and then creates that for all services. the contact groups for the services can be changed in the hiera file above, but the hosts not like this
[02:43:18] and just saw that it's like this for LVS checks. otherwise we could just set the groups where we use @monitoring::host
[02:43:39] I see.
[02:48:59] (PS1) Alex Monk: puppetmaster: Merge labs and production auth.conf templates [puppet] - https://gerrit.wikimedia.org/r/303742
[02:51:00] (PS1) Yuvipanda: puppetmaster: Cleanup some labs intrusions [puppet] - https://gerrit.wikimedia.org/r/303743
[02:51:03] krenair hah! see ^
[02:51:33] * yuvipanda runs puppet compiler
[02:51:38] thank god we're working on different parts of this problem
[02:52:03] 'fraid we're still left with require_package('ruby-httpclient') though
[02:52:14] is what I said in -labs about that true? it's for mwyaml?
[02:52:18] krenair yup
[02:52:28] ok, surely there's somewhere better to put it
[02:52:43] you're running the compiler on both our changes?
[02:53:29] nope, just mine now (it's running) I'll run on yours after mine is done
[02:53:35] oh I can run both lol
[02:53:38] parallely
[02:53:51] yep
[02:54:00] running both now
[02:54:13] palladium.eqiad.wmnet,rhodium.eqiad.wmnet,labcontrol1001.wikimedia.org,labtestcontrol2001.wikimedia.org
[02:54:17] are the hosts I'm checking against
[02:55:05] labcontrol1002, strontium?
[02:56:03] (PS2) Yuvipanda: puppetmaster: Cleanup some labs intrusions [puppet] - https://gerrit.wikimedia.org/r/303743
[02:56:12] krenair this is for your change https://puppet-compiler.wmflabs.org/3629/
[02:56:33] any idea what references modules/puppetmaster/files/labs.hiera.yaml ?
[02:57:02] strontium died after it saw there is rhodium (jealousy) https://phabricator.wikimedia.org/T142187
[02:57:04] krenair yeah, hiera_config on puppetmaster/init
[02:57:06] it's just set to 'realm'
[02:57:20] krenair so there's no clear space to put that yet
[02:57:31] krenair so your diffs just seem to be whitespace.
[02:57:41] mostly
[02:57:49] and adding the "# Temporary allow rhodium to compile all the catalogs while testing" comment on labs
[02:57:53] not sure it really matters
[02:58:30] right
[02:58:35] I think it'll be nice to get it to a noop
[02:58:39] because then there won't be a restart
[02:58:45] Is puppet-compiler not able to cope with labtestcontrol2001?
[02:59:57] apparently not :D
[03:00:06] it has running puppet
[03:00:12] I suspect it's the realm that's tripping it
[03:00:16] (PS2) Alex Monk: puppetmaster: Merge labs and production auth.conf templates [puppet] - https://gerrit.wikimedia.org/r/303742
[03:00:21] Error: Must pass mysql_password to Class[Labspuppetbackend] at /mnt/jenkins-workspace/puppet-compiler/3629/production/src/manifests/site.pp:1258 on node labtestcontrol2001.wikimedia.org
[03:00:21] Error: Must pass mysql_password to Class[Labspuppetbackend] at /mnt/jenkins-workspace/puppet-compiler/3629/production/src/manifests/site.pp:1258 on node labtestcontrol2001.wikimedia.org
[03:00:21] Error: Failed to compile catalog for node labtestcontrol2001.wikimedia.org: Must pass mysql_password to Class[Labspuppetbackend] at /mnt/jenkins-workspace/puppet-compiler/3629/production/src/manifests/site.pp:1258 on node labtestcontrol2001.wikimedia.org
[03:00:49] I wonder if that just is needing changes added to labs/private
[03:02:22] labspuppetbackend::mysql_password is in codfw.yaml, common.yaml, and eqiad.yaml in labs/private.git
[03:02:41] I see
[03:03:29] krenair I've heard of people having to 'refresh' facts, I wonder if something like that is needed
[03:03:37] ok, so I'm at noops https://puppet-compiler.wmflabs.org/3630/
[03:03:39] um
[03:03:47] no I think there's something else
[03:03:51] (I've no idea how the puppet compiler works, so might be totally bogus)
[03:04:31] so, in operations/puppet.git, outside of hieradata/hosts/labtestcontrol2001.yaml and modules/labspuppetbackend/manifests/init.pp
[03:04:45] these are the only references to labspuppetbackend:
[03:04:46] manifests/site.pp:1258: include labspuppetbackend
[03:04:47] manifests/site.pp:1275: labspuppetbackend_horizon => {
[03:05:05] It's only on labtestcontrol2001.wikimedia.org
[03:05:18] nowhere else
[03:05:32] yeah, andrew is actively working on it I think
[03:06:46] it is valid to say "include labspuppetbackend" in puppet, not set some values in the pp, but set them in hiera, right?
[03:06:56] yeah
[03:07:01] puppet itself succeeds
[03:07:05] it's something to do with the puppet compiler
[03:07:24] (CR) Yuvipanda: [C: 2] puppetmaster: Cleanup some labs intrusions [puppet] - https://gerrit.wikimedia.org/r/303743 (owner: Yuvipanda)
[03:08:14] can you get me in here? https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet3-diffs
[03:09:14] yeah
[03:10:25] krenair done
[03:10:29] ty
[03:11:01] krenair@compiler02:/var/lib/catalog-differ/private$ git log -n 1 | grep Date
[03:11:02] Date: Thu Aug 4 16:26:14 2016 +0200
[03:11:10] (this is just labs/private.git)
[03:11:46] yep, completely up to date
[03:12:28] (PS1) Gergő Tisza: Apply mobile cookie domain fix to beta [mediawiki-config] - https://gerrit.wikimedia.org/r/303744 (https://phabricator.wikimedia.org/T49647)
[03:13:32] yuvipanda, it does apply labs/private.git to non-labs hosts, right?
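The question of whether `include labspuppetbackend` can pick up `mysql_password` from hiera rests on Puppet's automatic parameter lookup. The key `labspuppetbackend::mysql_password` is searched through the hierarchy, and the first level that defines it wins. The sketch below is a simplified first-match model in Python. It assumes a hypothetical hierarchy of host, site, and common files already loaded as dicts. Real hiera hierarchies, backends, and merge behaviour are configured in hiera.yaml and are more involved than this.

```python
from typing import Any, Dict, List, Optional


def hiera_lookup(key: str, hierarchy: List[Dict[str, Any]]) -> Optional[Any]:
    """Return the value from the first hierarchy level that defines `key`, else None."""
    for level in hierarchy:
        if key in level:
            return level[key]
    return None


def class_params(cls: str, params: List[str], hierarchy: List[Dict[str, Any]],
                 explicit: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Resolve class parameters: explicit values first, then hiera as `<class>::<param>`.

    Raises ValueError naming the missing parameter, in the spirit of Puppet's
    "Must pass <param> to Class[...]" error.
    """
    explicit = explicit or {}
    title = "::".join(part.capitalize() for part in cls.split("::"))
    resolved: Dict[str, Any] = {}
    for param in params:
        if param in explicit:
            resolved[param] = explicit[param]
            continue
        value = hiera_lookup(f"{cls}::{param}", hierarchy)
        if value is None:
            raise ValueError(f"Must pass {param} to Class[{title}]")
        resolved[param] = value
    return resolved
```

In this model the error appears only when no level defines the key. That is why the first thing checked was that labs/private.git actually carries `labspuppetbackend::mysql_password`.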
[03:13:47] yeah
[03:13:57] from my understanding of it at least, yeah
[03:14:43] I just went to look at the output file in labtestcontrol2001:/etc/uwsgi/apps-enabled/labspuppetbackend.ini and of course it's got a real password, not the one from labs/private.git
[03:15:16] right, because that comes from real private puppet
[03:15:19] yes
[03:16:13] * Krenair sighs
[03:16:16] ok, -> pm
[03:18:46] (PS1) Yuvipanda: Revert "prometheus: Add devstat collector to default set" [puppet] - https://gerrit.wikimedia.org/r/303748
[03:18:53] (PS2) Yuvipanda: Revert "prometheus: Add devstat collector to default set" [puppet] - https://gerrit.wikimedia.org/r/303748
[03:19:02] (CR) Yuvipanda: [C: 2 V: 2] Revert "prometheus: Add devstat collector to default set" [puppet] - https://gerrit.wikimedia.org/r/303748 (owner: Yuvipanda)
[04:02:28] (PS3) Dzahn: phabricator: add systemd unit file for phd service [puppet] - https://gerrit.wikimedia.org/r/303740 (https://phabricator.wikimedia.org/T137928)
[04:06:45] (PS3) Alex Monk: puppetmaster: Merge labs and production auth.conf templates [puppet] - https://gerrit.wikimedia.org/r/303742
[04:13:09] yuvipanda, http://puppet-compiler.wmflabs.org/3632/
[04:13:40] krenair awesome.
[04:14:06] (PS4) Yuvipanda: puppetmaster: Merge labs and production auth.conf templates [puppet] - https://gerrit.wikimedia.org/r/303742 (owner: Alex Monk)
[04:14:28] (CR) Yuvipanda: [C: 2] "Puppet compiler is happy!" [puppet] - https://gerrit.wikimedia.org/r/303742 (owner: Alex Monk)
[04:16:49] (PS5) Alex Monk: puppetmaster: Merge labs and production auth.conf templates [puppet] - https://gerrit.wikimedia.org/r/303742
[04:16:56] krenair I actually made some comments
[04:16:59] err, added some comments
[04:17:03] (haven't actually merged)
[04:17:51] https://gerrit.wikimedia.org/r/#/c/303742/4..5/modules/puppetmaster/manifests/init.pp is fine, +1
[04:18:09] strictly speaking I think we could move the $horizon_host part out of the if block
[04:18:14] but it shouldn't matter
[04:20:35] yeah
[04:24:45] could also change the check to be $hiera_config == "labs" || $hiera_config == "labtest"
[04:24:49] for require_package('ruby-httpclient')
[04:24:50] (CR) Yuvipanda: [C: 2] puppetmaster: Merge labs and production auth.conf templates [puppet] - https://gerrit.wikimedia.org/r/303742 (owner: Alex Monk)
[04:25:53] but the horizon and private repo things... don't think they can change much other than a rename
[04:27:02] krenair is a noop!
[04:27:10] great
[04:27:12] krenair I think next step is to get rid of is_labs_master from the template
[04:27:23] maybe $secure_private=true (false for labs), for the private repo
[04:27:42] yeah
[04:27:52] that sounds like a decent rename
[04:28:13] I think we can safely allow rhodium.eqiad.wmnet to connect to the labs puppetmasters, if it can get around the network rules prohibiting it from doing that
[04:29:14] but horizon? I don't know beyond allowing things calling the puppetmaster class to add their own extra config
[04:29:23] there's no auth.conf.d system is there?
[04:29:34] (PS3) KartikMistry: apertium-fra: Initial Debian packaging [debs/contenttranslation/apertium-fra] - https://gerrit.wikimedia.org/r/294252 (https://phabricator.wikimedia.org/T137768)
[04:30:09] (CR) jenkins-bot: [V: -1] apertium-fra: Initial Debian packaging [debs/contenttranslation/apertium-fra] - https://gerrit.wikimedia.org/r/294252 (https://phabricator.wikimedia.org/T137768) (owner: KartikMistry)
[04:31:07] Krenair for horizon, I think we can add a basic hash
[04:31:29] err, list
[04:31:32] of hashes
[04:31:34] so
[04:31:36] (PS4) KartikMistry: apertium-fra: Initial Debian packaging [debs/contenttranslation/apertium-fra] - https://gerrit.wikimedia.org/r/294252 (https://phabricator.wikimedia.org/T137768)
[04:32:13] { path => '/resource_type', auth => 'any', allow => horizon_host }
[04:32:16] this could just be passed in
[04:32:22] or we could just make it a string
[04:32:26] call it 'extra_rules'
[04:32:31] and then just pass that in
[04:32:34] I like the extra_rules one better
[04:32:38] because it is far more flexible
[04:32:42] extra_auth_conf
[04:37:13] krenair mind if I take a stab at it?
[04:37:19] I'm doing it
[04:38:15] krenair cool!
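The `extra_auth_conf` idea is to pass a list of rule hashes such as `{ path => '/resource_type', auth => 'any', allow => horizon_host }` into the puppetmaster class. That amounts to rendering each hash as a stanza in Puppet's legacy auth.conf format, with one `directive value` line per key. The sketch below is a hypothetical Python rendering. The function name, the fixed directive order, and the example host `horizon.example.org` are assumptions. The actual change may simply take a pre-rendered string instead.

```python
from typing import Dict, List

# Order in which directives are written within a stanza; "path" comes first.
DIRECTIVES = ("path", "method", "environment", "auth", "allow", "allow_ip")


def render_auth_rules(rules: List[Dict[str, str]]) -> str:
    """Render rule dicts as legacy auth.conf stanzas separated by blank lines."""
    stanzas = []
    for rule in rules:
        if "path" not in rule:
            raise ValueError(f"auth.conf rule without a path: {rule!r}")
        unknown = set(rule) - set(DIRECTIVES)
        if unknown:
            raise ValueError(f"unsupported directives: {sorted(unknown)}")
        lines = [f"{key} {rule[key]}" for key in DIRECTIVES if key in rule]
        stanzas.append("\n".join(lines))
    return "\n\n".join(stanzas) + "\n" if stanzas else ""


if __name__ == "__main__":
    print(render_auth_rules([
        {"path": "/resource_type", "auth": "any", "allow": "horizon.example.org"},
    ]), end="")
```

Rendering from structured data keeps validation in one place. The string-based variant favoured in the chat gives that up in exchange for flexibility, because any auth.conf text can be passed straight through.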
[04:40:39] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-urd-hin] - https://gerrit.wikimedia.org/r/296368 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[04:42:57] at least I'm attempting to
[04:45:00] (PS4) KartikMistry: apertium-hin: New upstream release and rebuild for Jessie [debs/contenttranslation/apertium-hin] - https://gerrit.wikimedia.org/r/296228 (https://phabricator.wikimedia.org/T107306)
[04:45:11] (CR) jenkins-bot: [V: -1] apertium-hin: New upstream release and rebuild for Jessie [debs/contenttranslation/apertium-hin] - https://gerrit.wikimedia.org/r/296228 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[04:48:09] (PS1) Alex Monk: [WIP] puppetmaster: Attempt to kill is_labs_master [puppet] - https://gerrit.wikimedia.org/r/303757
[04:48:20] yuvipanda, ^
[04:48:30] V-1 in 3..2...
[04:49:15] (CR) jenkins-bot: [V: -1] [WIP] puppetmaster: Attempt to kill is_labs_master [puppet] - https://gerrit.wikimedia.org/r/303757 (owner: Alex Monk)
[04:49:37] (CR) Yuvipanda: [WIP] puppetmaster: Attempt to kill is_labs_master (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303757 (owner: Alex Monk)
[04:49:55] krenair awesome. another suggestion (besides the comment) is that it might make sense to split the changes out into two
[04:50:19] the secure_private one is far simpler and is a noop change
[04:50:22] while the other one isn't a noop change
[04:54:50] (PS1) Alex Monk: puppetmaster: Split secure_private from is_labs_master [puppet] - https://gerrit.wikimedia.org/r/303758
[04:56:33] (PS2) Alex Monk: puppetmaster: Split secure_private from is_labs_master [puppet] - https://gerrit.wikimedia.org/r/303758
[04:58:51] (PS1) Yuvipanda: tools: Expose etcd /metrics end point to everywhere [puppet] - https://gerrit.wikimedia.org/r/303759
[05:01:35] krenair https://puppet-compiler.wmflabs.org/3633/
[05:02:13] krenair lgtm
[05:02:25] I'm splitting it in two instead
[05:02:54] the labs role puppetmaster one - is it possible to move that to hiera?
[05:02:58] given the ${horizon_host}
[05:05:23] yuvipanda,
[05:06:12] splitting it in two as in, one for the prod and one for labs?
[05:06:22] krenair yeah, I think you can refer to hiera from hiera
[05:06:40] split the changes in two
[05:07:55] (PS2) Yuvipanda: tools: Expose etcd /metrics end point to everywhere [puppet] - https://gerrit.wikimedia.org/r/303759
[05:07:55] right
[05:08:52] krenair can you verify there's no usage of is_labs_master in the templates that are called?
[05:09:01] I guess that'd have shown up in the puppet compiler
[05:09:11] git grep?
[05:09:17] (PS2) Alex Monk: puppetmaster: Split extra_auth_rules from is_labs_master [puppet] - https://gerrit.wikimedia.org/r/303757
[05:09:19] (PS1) Alex Monk: puppetmaster: Kill is_labs_master [puppet] - https://gerrit.wikimedia.org/r/303761
[05:10:29] (CR) jenkins-bot: [V: -1] puppetmaster: Split extra_auth_rules from is_labs_master [puppet] - https://gerrit.wikimedia.org/r/303757 (owner: Alex Monk)
[05:11:15] (CR) jenkins-bot: [V: -1] puppetmaster: Kill is_labs_master [puppet] - https://gerrit.wikimedia.org/r/303761 (owner: Alex Monk)
[05:12:43] (PS3) Alex Monk: puppetmaster: Split extra_auth_rules from is_labs_master [puppet] - https://gerrit.wikimedia.org/r/303757
[05:13:01] (PS2) Alex Monk: puppetmaster: Kill is_labs_master [puppet] - https://gerrit.wikimedia.org/r/303761
[05:13:45] yuvipanda, do you know how to log into phabricator as the @admin account?
[05:13:52] nope
[05:14:28] the password is in one of ops' password stores somewhere apparently
[05:14:37] Please use it to disable https://phabricator.wikimedia.org/p/Nikita13311331/
[05:18:29] (PS1) Yuvipanda: tools: Scrape etcd metrics too [puppet] - https://gerrit.wikimedia.org/r/303762
[05:19:18] krenair I'm looking at pwstore and I can't find anything phabricatory
[05:19:40] (CR) jenkins-bot: [V: -1] tools: Scrape etcd metrics too [puppet] - https://gerrit.wikimedia.org/r/303762 (owner: Yuvipanda)
[05:20:31] (PS2) Yuvipanda: tools: Scrape etcd metrics too [puppet] - https://gerrit.wikimedia.org/r/303762
[05:24:07] Brilliant job Phabricator... Double huge log entries
[05:26:17] (PS3) Yuvipanda: tools: Scrape etcd metrics too [puppet] - https://gerrit.wikimedia.org/r/303762
[05:26:25] TimStarling, around? are you able to disable https://phabricator.wikimedia.org/p/Nikita13311331/ ?
[05:28:20] done [05:30:34] thanks [05:32:48] <_joe_> !log removing stale nodes from puppet [05:32:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [05:33:00] <_joe_> Krenair: I'll take a look at the puppetmaster patches [05:39:00] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I love the patch in general, but I'd remove the rhodium special case and I'd frankly avoid using hiera for such things. Apart from that an" (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/303757 (owner: 10Alex Monk) [05:40:01] (03CR) 10Giuseppe Lavagetto: [C: 031] puppetmaster: Split secure_private from is_labs_master [puppet] - 10https://gerrit.wikimedia.org/r/303758 (owner: 10Alex Monk) [05:44:45] <_joe_> bbiab [05:49:51] (03PS1) 10Yuvipanda: prometheus: Don't require scrape config files to start with node_ [puppet] - 10https://gerrit.wikimedia.org/r/303765 [05:50:54] (03PS2) 10Gergő Tisza: Apply mobile cookie domain fix to beta [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303744 (https://phabricator.wikimedia.org/T49647) [06:01:47] PROBLEM - puppet last run on mw2177 is CRITICAL: CRITICAL: puppet fail [06:12:38] What is this error in deploy in scb? http://pastebin.com/qbRDC80d [06:13:12] !log Updated cxserver to d3c7d64 (T142340) [06:13:13] T142340: in the recently-created Tulu Wikipedia, the Tulu language doesn't show up in the Content Translation dashboard article selector - https://phabricator.wikimedia.org/T142340 [06:13:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [06:29:58] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "LGTM, but see a small comment." 
(1 comment) [puppet] - https://gerrit.wikimedia.org/r/303761 (owner: Alex Monk)
[06:31:02] RECOVERY - puppet last run on mw2177 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:45:36] (CR) Muehlenhoff: [C: 2 V: 2] druid: Limit to analytics networks [puppet] - https://gerrit.wikimedia.org/r/303144 (owner: Muehlenhoff)
[07:03:24] (PS3) Gergő Tisza: Apply mobile cookie domain fix to beta [mediawiki-config] - https://gerrit.wikimedia.org/r/303744 (https://phabricator.wikimedia.org/T49647)
[07:18:02] Operations, Wikimedia-Language-setup, Wikimedia-Site-requests, MW-1.28-release-notes, and 4 others: Create Wikipedia Tulu - https://phabricator.wikimedia.org/T140898#2535906 (Arrbee)
[07:25:33] !log installing curl security updates on Ubuntu systems
[07:25:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:29:27] (PS5) KartikMistry: apertium-nob: New upstream release [debs/contenttranslation/apertium-nob] - https://gerrit.wikimedia.org/r/269914 (https://phabricator.wikimedia.org/T124317)
[07:29:42] (CR) jenkins-bot: [V: -1] apertium-nob: New upstream release [debs/contenttranslation/apertium-nob] - https://gerrit.wikimedia.org/r/269914 (https://phabricator.wikimedia.org/T124317) (owner: KartikMistry)
[07:33:04] PROBLEM - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 10.65.0.24
[07:34:45] RECOVERY - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[07:37:36] PROBLEM - parsoid on wtp2002 is CRITICAL: Connection refused
[07:37:55] PROBLEM - puppet last run on wtp2002 is CRITICAL: CRITICAL: puppet fail
[07:42:31] !log installing php5 security updates on remaining four precise systems
[07:42:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:43:25] RECOVERY - parsoid on wtp2002 is OK: HTTP OK: HTTP/1.1
200 OK - 1514 bytes in 0.091 second response time
[07:43:45] RECOVERY - puppet last run on wtp2002 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[07:45:16] (PS6) Volans: Monitoring: Add NRPE commands to get RAID status [puppet] - https://gerrit.wikimedia.org/r/303147 (https://phabricator.wikimedia.org/T142085)
[08:05:50] !log rolling restart of hhvm in codfw to pick up curl security updates
[08:05:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:21:24] PROBLEM - Host wtp2002 is DOWN: PING CRITICAL - Packet loss = 100%
[08:23:45] RECOVERY - Host wtp2002 is UP: PING OK - Packet loss = 0%, RTA = 36.15 ms
[08:27:45] PROBLEM - configured eth on wtp2002 is CRITICAL: Connection refused by host
[08:28:05] PROBLEM - dhclient process on wtp2002 is CRITICAL: Connection refused by host
[08:28:06] PROBLEM - Check size of conntrack table on wtp2002 is CRITICAL: Connection refused by host
[08:28:15] PROBLEM - parsoid on wtp2002 is CRITICAL: Connection refused
[08:28:25] PROBLEM - DPKG on wtp2002 is CRITICAL: Connection refused by host
[08:28:34] PROBLEM - Disk space on wtp2002 is CRITICAL: Connection refused by host
[08:28:44] PROBLEM - puppet last run on wtp2002 is CRITICAL: Connection refused by host
[08:28:55] PROBLEM - salt-minion processes on wtp2002 is CRITICAL: Connection refused by host
[08:29:04] PROBLEM - MD RAID on wtp2002 is CRITICAL: Connection refused by host
[08:29:31] there is something going on with icinga
[08:30:24] some service going to soft issues that never should
[08:30:53] is anything going on with network?
[08:35:44] (CR) Alexandros Kosiaris: [C: -1] "Inline comments" (2 comments) [puppet] - https://gerrit.wikimedia.org/r/303559 (https://phabricator.wikimedia.org/T142393) (owner: Gehel)
[08:38:40] Operations, DBA: dbstore2002 stopped providing mysql service despite the process being running - https://phabricator.wikimedia.org/T142273#2536017 (jcrespo) Open>Resolved Disk replacement is handled on a separate task; no more to do here.
[08:44:04] PROBLEM - Host wtp2002 is DOWN: PING CRITICAL - Packet loss = 100%
[08:45:14] ignore wtp2002 please
[08:49:15] PROBLEM - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 10.65.0.24
[08:51:05] RECOVERY - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[09:00:36] godog: it seems you know a bit about graphite / diamond. logstash1004 reports iostat metrics for md0 but not for md1
[09:06:58] he is on holiday :)
[09:09:44] PROBLEM - Ensure NFS exports are maintained for new instances with NFS on labstore1001 is CRITICAL: CRITICAL - Expecting active but unit nfs-exports is failed
[09:20:38] !log rolling restart of hhvm in on eqiad canary trusty app servers to pick up curl security updates
[09:20:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:25:21] (PS1) Elukey: Add fake aqs Cassandra user's password [labs/private] - https://gerrit.wikimedia.org/r/303772 (https://phabricator.wikimedia.org/T142073)
[09:26:00] (CR) Elukey: [C: 2 V: 2] Add fake aqs Cassandra user's password [labs/private] - https://gerrit.wikimedia.org/r/303772 (https://phabricator.wikimedia.org/T142073) (owner: Elukey)
[09:32:46] (PS1) Elukey: Add the configuration needed to prepare a new AQS Cassandra user creation [puppet] - https://gerrit.wikimedia.org/r/303774 (https://phabricator.wikimedia.org/T142073)
[09:38:04] (PS3) Muehlenhoff: pybal_config: Use
PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/297601
[09:39:56] (CR) Muehlenhoff: [C: 2] pybal_config: Use PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/297601 (owner: Muehlenhoff)
[09:47:21] (CR) ArielGlenn: "From convo on IRC: Brion doesn't remember any OMG MUST BE 60 MINUTES reason for this setting's current value." [puppet] - https://gerrit.wikimedia.org/r/302729 (owner: ArielGlenn)
[09:47:29] (PS2) ArielGlenn: make dump run locks stale and therefore removable after 15 minutes [puppet] - https://gerrit.wikimedia.org/r/302729
[09:49:18] (CR) ArielGlenn: [C: 2] make dump run locks stale and therefore removable after 15 minutes [puppet] - https://gerrit.wikimedia.org/r/302729 (owner: ArielGlenn)
[09:55:00] (PS2) Elukey: Add the configuration needed to prepare a new AQS Cassandra user creation [puppet] - https://gerrit.wikimedia.org/r/303774 (https://phabricator.wikimedia.org/T142073)
[09:57:24] (PS2) ArielGlenn: fix up pgrep error check for rsyncs between dataset hosts [puppet] - https://gerrit.wikimedia.org/r/302910
[09:58:48] (CR) ArielGlenn: [C: 2] fix up pgrep error check for rsyncs between dataset hosts [puppet] - https://gerrit.wikimedia.org/r/302910 (owner: ArielGlenn)
[09:59:14] (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/3635/ looks good" [puppet] - https://gerrit.wikimedia.org/r/303774 (https://phabricator.wikimedia.org/T142073) (owner: Elukey)
[09:59:21] (PS3) Elukey: Add the configuration needed to prepare a new AQS Cassandra user creation [puppet] - https://gerrit.wikimedia.org/r/303774 (https://phabricator.wikimedia.org/T142073)
[10:30:16] (CR) Mobrovac: "I was thinking about it, but IMHO it would create too much confusion, because the service name ($title) is used throughout service::node.
" [puppet] - https://gerrit.wikimedia.org/r/303599 (https://phabricator.wikimedia.org/T141464) (owner: Mobrovac)
[10:31:09] (PS1) Elukey: Include the password::aqs namespace in the AQS role [puppet] - https://gerrit.wikimedia.org/r/303783 (https://phabricator.wikimedia.org/T142073)
[10:34:20] (CR) Elukey: [C: 2] Include the password::aqs namespace in the AQS role [puppet] - https://gerrit.wikimedia.org/r/303783 (https://phabricator.wikimedia.org/T142073) (owner: Elukey)
[10:41:51] (CR) Gehel: [C: 1] "LGTM! And this change made me learn a few things about systemd. It would be good if someone who actually knows systemd / upstart could hav" [puppet] - https://gerrit.wikimedia.org/r/303626 (https://phabricator.wikimedia.org/T116754) (owner: Smalyshev)
[10:49:04] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[10:52:55] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[10:53:25] PROBLEM - puppet last run on ms-be2010 is CRITICAL: CRITICAL: puppet fail
[11:13:14] (PS1) Elukey: Move the include of cassandra/aqs passwords up to solve a priority issue [puppet] - https://gerrit.wikimedia.org/r/303786 (https://phabricator.wikimedia.org/T142073)
[11:16:12] (CR) Elukey: [C: 2] "https://puppet-compiler.wmflabs.org/3638/" [puppet] - https://gerrit.wikimedia.org/r/303786 (https://phabricator.wikimedia.org/T142073) (owner: Elukey)
[11:16:46] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 80.00% of data above the critical threshold [50.0]
[11:18:45] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[11:19:14] RECOVERY - puppet last run on ms-be2010 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[11:20:18] Operations:
eqiad: Install SSD's into ganeti hosts - https://phabricator.wikimedia.org/T138414#2536183 (akosiaris) @Cmjohnson . Excellent! One left. I 've repooled ganeti1001 already and emptied ganeti1003. ganeti1003 is downtimed and powered off and awaiting SSDs.
[11:24:23] (PS1) Alexandros Kosiaris: kafka::server::monitoring: include standard [puppet/kafka] - https://gerrit.wikimedia.org/r/303787
[11:24:35] (CR) Alexandros Kosiaris: [C: 2] kafka::server::monitoring: include standard [puppet/kafka] - https://gerrit.wikimedia.org/r/303787 (owner: Alexandros Kosiaris)
[11:26:24] (PS1) Alexandros Kosiaris: Update kafka submodule to fix puppetmaster warning [puppet] - https://gerrit.wikimedia.org/r/303789
[11:26:59] (CR) Alexandros Kosiaris: [C: 2 V: 2] Update kafka submodule to fix puppetmaster warning [puppet] - https://gerrit.wikimedia.org/r/303789 (owner: Alexandros Kosiaris)
[11:29:50] (PS1) Mobrovac: Varnish: Do not unset Accept: text/html for RESTBase reqs [puppet] - https://gerrit.wikimedia.org/r/303790
[11:38:03] Operations, Ops-Access-Requests, Research-and-Data, Research-collaborations: Analytics cluster access request for ISI Foundation team - https://phabricator.wikimedia.org/T141634#2505524 (MoritzMuehlenhoff) > the c-level approved NDA has a duration of 6 months (if not extended). When the project i...
[11:39:55] (CR) Alexandros Kosiaris: [C: 2] apertium-fra: Initial Debian packaging [debs/contenttranslation/apertium-fra] - https://gerrit.wikimedia.org/r/294252 (https://phabricator.wikimedia.org/T137768) (owner: KartikMistry)
[11:50:27] Operations, Services, Services-next, Security: Productize the Electron PDF render service & create a REST API end point - https://phabricator.wikimedia.org/T142226#2536218 (faidon) Is there a timeline for finishing the MediaWiki-integration work and replacing OCG? Should we align such a timeline...
[11:56:24] !log installing fontconfig security updates on jessie systems
[11:56:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:01:57] (PS2) Giuseppe Lavagetto: service::node: Allow users to specify the logging name [puppet] - https://gerrit.wikimedia.org/r/303599 (https://phabricator.wikimedia.org/T141464) (owner: Mobrovac)
[12:05:56] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-fra_1.0.0~r65786-1+wmf1
[12:05:58] T107306: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306
[12:06:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:10:50] !log rolling restart of hhvm on remaining eqiad mw servers to pick up curl security updates
[12:10:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:19:08] (PS1) Elukey: Change the AQS restbase user from 'cassandra' to 'aqs' [puppet] - https://gerrit.wikimedia.org/r/303792 (https://phabricator.wikimedia.org/T142073)
[12:19:10] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-fra-cat] - https://gerrit.wikimedia.org/r/294425 (https://phabricator.wikimedia.org/T137768) (owner: KartikMistry)
[12:21:35] akosiaris: https://gerrit.wikimedia.org/r/294425 is good now.
[12:22:00] akosiaris: for apertium-fra, I somehow missed to push entire debian/ directory :)
[12:23:35] kart_: hence the debian/control file missing ;-)
[12:23:53] kart_: ypi '
[12:24:05] kart_: you 've missed pushing a lot of tags btw
[12:24:15] disregard the "ypi'" thing. typo..
[12:25:44] (CR) Alexandros Kosiaris: [C: 2] apertium-fra-cat: New upstream release, Rebuilt for Jessie [debs/contenttranslation/apertium-fra-cat] - https://gerrit.wikimedia.org/r/294425 (https://phabricator.wikimedia.org/T137768) (owner: KartikMistry)
[12:25:50] (PS2) Elukey: Change the AQS restbase user from 'cassandra' to 'aqs' [puppet] - https://gerrit.wikimedia.org/r/303792 (https://phabricator.wikimedia.org/T142073)
[12:26:44] !log depooling image scalers mw2086-mw2089 for reimaging with jessie
[12:26:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[12:34:34] (CR) Elukey: [C: 2] "pcc looks good: https://puppet-compiler.wmflabs.org/3640/" [puppet] - https://gerrit.wikimedia.org/r/303792 (https://phabricator.wikimedia.org/T142073) (owner: Elukey)
[12:40:11] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: Puppet has 15 failures
[12:45:11] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: Puppet has 15 failures
[12:50:11] PROBLEM - check_puppetrun on bismuth is CRITICAL: CRITICAL: Puppet has 15 failures
[12:55:11] RECOVERY - check_puppetrun on bismuth is OK: OK: Puppet is currently enabled, last run 139 seconds ago with 0 failures
[12:57:37] (PS3) ArielGlenn: If a prereq job is missing, optionally run it instead of giving up [dumps] - https://gerrit.wikimedia.org/r/302706 (https://phabricator.wikimedia.org/T141981)
[13:07:56] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-fra-cat_1.1.0~r64309-1+wmf1
[13:07:57] T107306: Package and test apertium for Jessie - https://phabricator.wikimedia.org/T107306
[13:08:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:12:21] PROBLEM - puppet last run on mw2190 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:12:45] (PS1) Volans: Admin: add my home files (volans) [puppet] - https://gerrit.wikimedia.org/r/303797
[13:13:21] (CR) KartikMistry: "recheck"
[debs/contenttranslation/apertium-isl] - https://gerrit.wikimedia.org/r/296050 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[13:14:23] (PS4) ArielGlenn: If a prereq job is missing, optionally run it instead of giving up [dumps] - https://gerrit.wikimedia.org/r/302706 (https://phabricator.wikimedia.org/T141981)
[13:15:57] (PS2) Volans: Admin: add my home files (volans) [puppet] - https://gerrit.wikimedia.org/r/303797
[13:19:51] (PS1) Elukey: Switch the AQS restbase use from 'cassandra' to aqs [puppet] - https://gerrit.wikimedia.org/r/303798 (https://phabricator.wikimedia.org/T142073)
[13:22:23] (PS2) Elukey: Switch the AQS restbase use from 'cassandra' to 'aqs' [puppet] - https://gerrit.wikimedia.org/r/303798 (https://phabricator.wikimedia.org/T142073)
[13:22:45] (PS2) Mobrovac: Varnish: Do not unset Accept: text/html for RESTBase reqs [puppet] - https://gerrit.wikimedia.org/r/303790
[13:23:06] (CR) Volans: [C: 2] Admin: add my home files (volans) [puppet] - https://gerrit.wikimedia.org/r/303797 (owner: Volans)
[13:25:18] akosiaris: https://integration.wikimedia.org/ci/job/debian-glue/527/console - why hfst not in jessie-wikimedia?
[13:25:25] (CR) Elukey: "PCC looks good: https://puppet-compiler.wmflabs.org/3641/" [puppet] - https://gerrit.wikimedia.org/r/303798 (https://phabricator.wikimedia.org/T142073) (owner: Elukey)
[13:26:10] akosiaris: for missing tags, looking at failures one by one.
[13:26:16] what do you mean. ? hsft is in jessie-wikimedia. hfst 3.10.0~r2798-1+wmf1
[13:26:51] as well as hfst-ospell 0.4.0~r4643-5+wmf1, libhfst45 3.10.0~r2798-1+wmf1 and libhfstospell5 0.4.0~r4643-5+wmf1
[13:27:43] (CR) BBlack: [C: 1] Varnish: Do not unset Accept: text/html for RESTBase reqs [puppet] - https://gerrit.wikimedia.org/r/303790 (owner: Mobrovac)
[13:28:10] RECOVERY - NTP on ganeti1002 is OK: NTP OK: Offset -0.003484010696 secs
[13:28:11] akosiaris: apertium-isl failures seems weird.
[13:29:44] Operations: eqiad: Install SSD's into ganeti hosts - https://phabricator.wikimedia.org/T138414#2536464 (Cmjohnson)
[13:29:51] /{domain}/v1/page/mobile-sections-lead/{title} (retrieve lead section of en.wp Altrincham page via mobile-sections-lead) is CRITICAL: Test retrieve lead section of en.wp Altrincham page via mobile-sections-lead returned the unexpected status 503 (expecting: 200): /{domain}/v1/page/random/title (retrieve a random article) is CRITICAL: Test retrieve a random article returned the unexpected sta
[13:29:57] tus 503 (expecting: 200)
[13:29:58] mobrovac: ^ :)
[13:30:00] morebots: ^
[13:30:00] I am a logbot running on tools-exec-1213.
[13:30:01] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log.
[13:30:01] To log a message, type !log .
[13:30:03] er
[13:30:14] kart_: ah you are looking at the piuparts part, yeah that ones fails but does not vote
[13:30:20] it's lintian that fails
[13:30:33] kart_: at N: The arch all pkg-config file contains a reference to a multi-arch path.
[13:30:43] message='E: apertium-isl: pkg-config-multi-arch-wrong-dir usr/share/pkgconfig/apertium-isl.pc full text contains architecture specific dir x86_64-linux-gnu'
[13:30:43] Operations: eqiad: Install SSD's into ganeti hosts - https://phabricator.wikimedia.org/T138414#2399490 (Cmjohnson) Open>Resolved @akosiaris ganeti1003 disks have been replaced. resolving the task
[13:30:46] Okay. Checking again. That need to fix in upstream.
[13:32:07] paravoid: caused by a 503 from the MW API
[13:32:17] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-hbs-eng] - https://gerrit.wikimedia.org/r/296049 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[13:32:28] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-hbs-slv] - https://gerrit.wikimedia.org/r/296203 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[13:32:33] paravoid: all good now on scb100x, though, so transient on the api servers side?
[13:32:59] could be
[13:33:16] !log akosiaris@palladium conftool action : set/pooled=yes; selector: wtp2001.codfw.wmnet (tags: ['dc=codfw', 'cluster=parsoid', 'service=parsoid'])
[13:33:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:33:28] akosiaris: \o/
[13:34:07] !log all of wtp2001-wtp2020 except wtp2002 have been pooled. T135176
[13:34:09] T135176: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176
[13:34:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[13:34:25] wtp2002 eludes me still ... erratic behavior
[13:35:04] Operations, ops-eqiad, fundraising-tech-ops: Rack and setup Fundraising DB - https://phabricator.wikimedia.org/T136200#2536477 (Cmjohnson) Open>Resolved The server has been racked and cabled.
[13:37:04] * Danny_B should obviously read more slowly to prevent stuff like "crack and setup fundraising db ... the server has been cracked and cabled"... ;-)
[13:39:05] (CR) Ottomata: "Ok! FYI, we will be removing this submodule from ops/puppet once the upgrade of main-eqiad kafka cluster is done. Hopefully next week." [puppet/kafka] - https://gerrit.wikimedia.org/r/303787 (owner: Alexandros Kosiaris)
[13:39:11] RECOVERY - puppet last run on mw2190 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures
[13:39:59] (CR) Alexandros Kosiaris: "sounds nice!"
[puppet/kafka] - https://gerrit.wikimedia.org/r/303787 (owner: Alexandros Kosiaris)
[13:40:11] PROBLEM - check_puppetrun on boron is CRITICAL: CRITICAL: Puppet has 1 failures
[13:41:01] (PS1) Giuseppe Lavagetto: postgresql: support SSL connections/replication [puppet] - https://gerrit.wikimedia.org/r/303800
[13:41:03] (PS1) Giuseppe Lavagetto: [WiP] puppetmaster: add role for puppetdb [puppet] - https://gerrit.wikimedia.org/r/303801
[13:42:19] mutante: I've a comment on my meta. talk page about your recent additions to wikistats:
[13:42:24] "https://wikistats.wmflabs.org/display.php?t=wp&s=images_asc - This conflicts with the listings at https://meta.wikimedia.org/wiki/List_of_Wikipedias/Table and many other places. Test wikis are not Wikipedias, they are not encyclopedias."
[13:42:36] (CR) jenkins-bot: [V: -1] [WiP] puppetmaster: add role for puppetdb [puppet] - https://gerrit.wikimedia.org/r/303801 (owner: Giuseppe Lavagetto)
[13:43:02] s/meta/mw
[13:43:10] mutante: you're welcome on my talk page if you wish to reply: https://www.mediawiki.org/wiki/Topic:T9bmi51puyfnslkc
[13:44:07] mutante: this person reports regularly stats problems like "-1 files" on wiki with no upload
[13:45:11] RECOVERY - check_puppetrun on boron is OK: OK: Puppet is currently enabled, last run 63 seconds ago with 0 failures
[13:45:31] RECOVERY - Host wtp2002 is UP: PING OK - Packet loss = 0%, RTA = 38.52 ms
[13:46:24] (CR) Giuseppe Lavagetto: [C: 2] service::node: Allow users to specify the logging name [puppet] - https://gerrit.wikimedia.org/r/303599 (https://phabricator.wikimedia.org/T141464) (owner: Mobrovac)
[13:46:29] (PS3) Giuseppe Lavagetto: service::node: Allow users to specify the logging name [puppet] - https://gerrit.wikimedia.org/r/303599 (https://phabricator.wikimedia.org/T141464) (owner: Mobrovac)
[13:49:56] <_joe_> mobrovac: merging
[13:50:02] kk
[13:55:31] Operations, ops-eqiad: Megaraid controller reset due to (what seemsa) a
faulty disk on analytics1045 - https://phabricator.wikimedia.org/T141761#2536501 (Cmjohnson) @elukey, it could be a glitch...are you seeing any more errors?
[13:56:41] PROBLEM - puppet last run on ms-be2013 is CRITICAL: CRITICAL: Puppet has 1 failures
[13:56:42] PROBLEM - puppet last run on es2016 is CRITICAL: CRITICAL: puppet fail
[13:57:02] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[13:57:40] Operations, ops-eqiad: Megaraid controller reset due to (what seemsa) a faulty disk on analytics1045 - https://phabricator.wikimedia.org/T141761#2536502 (Cmjohnson) All the disks seemed to be online cmjohnson@analytics1045:~$ sudo megacli -PDList -aALL |grep "Firmware state:" Firmware state: Online, Spu...
[13:57:42] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0]
[13:58:18] hrm
[13:58:19] (PS2) BBlack: VCL: emit X-Cache-Status response header [puppet] - https://gerrit.wikimedia.org/r/303578 (https://phabricator.wikimedia.org/T142410)
[13:58:21] something exploded
[13:58:33] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [1000.0]
[13:58:37] (CR) BBlack: [C: 2 V: 2] VCL: emit X-Cache-Status response header [puppet] - https://gerrit.wikimedia.org/r/303578 (https://phabricator.wikimedia.org/T142410) (owner: BBlack)
[13:59:52] PROBLEM - puppet last run on wtp2004 is CRITICAL: CRITICAL: puppet fail
[14:00:01] !log setting cr2-eqiad:xe-5/2/3 (link to cr2-codfw) to disable; flapping
[14:00:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[14:00:52] PROBLEM - puppet last run on mw2228 is CRITICAL: CRITICAL: Puppet has 3 failures
[14:01:13] PROBLEM - CirrusSearch codfw 95th percentile latency - more_like on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [2000.0]
[14:02:02] PROBLEM - puppet last run on ms-be2026 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:02:02] PROBLEM - puppet last run on mw2095 is CRITICAL: CRITICAL: Puppet has 11 failures
[14:02:11] Operations, ops-eqiad, hardware-requests, Patch-For-Review: reclaim or decom: cp1043 + cp1044 - https://phabricator.wikimedia.org/T133614#2536509 (Cmjohnson)
[14:02:22] PROBLEM - puppet last run on elastic2024 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:02:23] PROBLEM - puppet last run on es2018 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:02:32] PROBLEM - puppet last run on mw2077 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:02:40] codfw breakage is related to ^^^
[14:02:52] yeah 503 spike on both codfw and ulsfo caches, but not eqiad/esams
[14:02:58] Operations, ops-eqiad, hardware-requests, Patch-For-Review: reclaim or decom: cp1043 + cp1044 - https://phabricator.wikimedia.org/T133614#2237371 (Cmjohnson) p:Normal>Lowest
[14:04:02] PROBLEM - puppet last run on ms-be2021 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:04:02] PROBLEM - puppet last run on ms-be2011 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:04:17] (CR) Giuseppe Lavagetto: [C: 1] Support scaling of huge SVGs [mediawiki-config] - https://gerrit.wikimedia.org/r/303548 (https://phabricator.wikimedia.org/T111815) (owner: Muehlenhoff)
[14:04:51] PROBLEM - puppet last run on mc2013 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:04:51] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:05:12] RECOVERY - CirrusSearch codfw 95th percentile latency - more_like on graphite1001 is OK: OK: Less than 20.00% above the threshold [1200.0]
[14:06:14] (PS1) MarcoAurelio: Remove 'gather-hidelist' from CommonSettings.php [mediawiki-config] - https://gerrit.wikimedia.org/r/303803
[14:06:18] (PS1) Muehlenhoff: Assign debdeploy salt grain for labs::db::replica role [puppet] -
https://gerrit.wikimedia.org/r/303804
[14:07:08] Operations, Services, Parsoid-Tests, User-mobrovac: Use a different logging & metrics tag (name property) for Parsoid testing on ruthenium - https://phabricator.wikimedia.org/T141464#2536520 (mobrovac) Open>Resolved Ruthenium is now emitting logs as `parsoid-tests`. Resolving.
[14:07:21] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:07:31] (CR) Muehlenhoff: [C: 2] Assign debdeploy salt grain for labs::db::replica role [puppet] - https://gerrit.wikimedia.org/r/303804 (owner: Muehlenhoff)
[14:10:43] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [1000.0]
[14:11:01] RECOVERY - Disk space on wtp2002 is OK: DISK OK
[14:11:01] RECOVERY - salt-minion processes on wtp2002 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:11:02] RECOVERY - DPKG on wtp2002 is OK: All packages OK
[14:11:06] (PS3) Elukey: Switch the AQS restbase use from 'cassandra' to 'aqs' [puppet] - https://gerrit.wikimedia.org/r/303798 (https://phabricator.wikimedia.org/T142073)
[14:11:11] RECOVERY - Check size of conntrack table on wtp2002 is OK: OK: nf_conntrack is 0 % full
[14:11:11] PROBLEM - CirrusSearch codfw 95th percentile latency - more_like on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [2000.0]
[14:11:41] RECOVERY - MD RAID on wtp2002 is OK: OK: Active: 4, Working: 4, Failed: 0, Spare: 0
[14:11:52] RECOVERY - configured eth on wtp2002 is OK: OK - interfaces up
[14:12:02] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:12:12] RECOVERY - dhclient process on wtp2002 is OK: PROCS OK: 0 processes with command name dhclient
[14:12:42] PROBLEM - puppet last run on wtp2002 is CRITICAL: CRITICAL: Puppet has 1 failures
[14:12:47] (CR) KartikMistry: "recheck"
[debs/contenttranslation/apertium-kaz] - https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[14:13:21] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0]
[14:14:42] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:15:11] RECOVERY - CirrusSearch codfw 95th percentile latency - more_like on graphite1001 is OK: OK: Less than 20.00% above the threshold [1200.0]
[14:15:21] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[14:15:43] paravoid: I'd appreciate a review of https://gerrit.wikimedia.org/r/#/c/303617/ (it should look very familiar)
[14:15:50] Operations, ops-eqiad: Megaraid controller reset due to (what seemsa) a faulty disk on analytics1045 - https://phabricator.wikimedia.org/T141761#2511340 (Volans) @Cmjohnson, @elukey FYI looks like since the grep that @elukey has done there is some additional "other error count" that appeared. The non-op...
[14:16:30] Operations, Traffic: Support TLS chacha20-poly1305 AEAD ciphers - https://phabricator.wikimedia.org/T131908#2536531 (BBlack) I didn't think to link into this bug in the commitmsg (oops!) but I patched our chapoly logic further in: https://gerrit.wikimedia.org/r/#/c/303700/ . This was based on analyzing...
[14:16:32] > Given that this is a new script I'm taking the opportunity to test it in real life, hope you don't mind ;) - hello volans!
o/
[14:16:47] please keep going, thanks a lot :)
[14:17:23] thanks elukey, any feedback is appreciated
[14:17:43] (CR) Alexandros Kosiaris: [C: -1] postgresql: support SSL connections/replication (2 comments) [puppet] - https://gerrit.wikimedia.org/r/303800 (owner: Giuseppe Lavagetto)
[14:18:11] (CR) KartikMistry: "recheck" [debs/contenttranslation/apertium-kaz] - https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: KartikMistry)
[14:20:41] RECOVERY - puppet last run on wtp2002 is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[14:21:22] (CR) Mobrovac: [C: -1] beta: Use Let's Encrypt cert (2 comments) [puppet] - https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) (owner: Alex Monk)
[14:22:21] RECOVERY - parsoid on wtp2002 is OK: HTTP OK: HTTP/1.1 200 OK - 1514 bytes in 0.078 second response time
[14:22:42] RECOVERY - puppet last run on ms-be2013 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[14:22:52] RECOVERY - puppet last run on es2016 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[14:25:52] RECOVERY - puppet last run on ms-be2026 is OK: OK: Puppet is currently enabled, last run 10 seconds ago with 0 failures
[14:25:59] akosiaris: apertium-kaz has missing pristine-tar branch and due to long file name I can't create it. Let me check how I did it in Debian.
[14:26:21] akosiaris: I did manual creation, but it seems messing up branch. Not a good way too.
[14:26:36] (PS3) Mobrovac: Varnish: Do not unset Accept: text/html for RESTBase reqs [puppet] - https://gerrit.wikimedia.org/r/303790
[14:27:38] (CR) Faidon Liambotis: [C: -1] Create root passwords for labs instances and store passwords on the puppetmaster (7 comments) [puppet] - https://gerrit.wikimedia.org/r/303617 (https://phabricator.wikimedia.org/T142216) (owner: Andrew Bogott)
[14:27:42] RECOVERY - puppet last run on wtp2004 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures
[14:27:51] RECOVERY - puppet last run on ms-be2011 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[14:27:52] RECOVERY - puppet last run on mw2095 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[14:27:56] andrewbogott: done!
[14:28:02] RECOVERY - puppet last run on elastic2024 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[14:28:11] RECOVERY - puppet last run on es2018 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[14:28:22] RECOVERY - puppet last run on mw2077 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:28:41] RECOVERY - puppet last run on mw2228 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:29:51] RECOVERY - puppet last run on ms-be2021 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[14:30:42] RECOVERY - puppet last run on mc2013 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures
[14:33:12] (PS1) Ladsgroup: ores: uwsgi workers restart after 200 requests [puppet] - https://gerrit.wikimedia.org/r/303807
[14:34:40] (CR) Alexandros Kosiaris: [C: -1] [WiP] puppetmaster: add role for puppetdb (9 comments) [puppet] - https://gerrit.wikimedia.org/r/303801 (owner: Giuseppe Lavagetto)
[14:34:42] RECOVERY - Ensure NFS exports are maintained for new instances with NFS on
labstore1001 is OK: OK - nfs-exports is active [14:35:21] hello Ops, does anybody here know how to turn off emails from Tool Labs? A cron job is sending me one every hour. [14:36:13] !log akosiaris@palladium conftool action : set/pooled=yes; selector: wtp2002.codfw.wmnet (tags: ['dc=codfw', 'cluster=parsoid', 'service=parsoid']) [14:36:14] jan_drewniak: you should ask in #wikimedia-labs, but generally you can append >/dev/null 2>/dev/null to cronjobs to silence them [14:36:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:36:45] paravoid: thanks! [14:37:07] (03PS2) 10Giuseppe Lavagetto: postgresql: support SSL connections/replication [puppet] - 10https://gerrit.wikimedia.org/r/303800 [14:37:42] !log T135176 pool wtp2002 [14:37:44] T135176: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176 [14:37:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:38:03] (03CR) 10Giuseppe Lavagetto: postgresql: support SSL connections/replication (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/303800 (owner: 10Giuseppe Lavagetto) [14:39:21] 06Operations, 06Services: Move all Node.JS services to Jessie and Node 4 - https://phabricator.wikimedia.org/T124989#2536578 (10akosiaris) [14:39:24] 06Operations, 10Parsoid, 06Services, 15User-mobrovac: Migrate Parsoid cluster to Jessie / node 4.x - https://phabricator.wikimedia.org/T135176#2536575 (10akosiaris) 05Open>03Resolved a:03akosiaris Finally all hosts have been reimaged and been pooled. Resolving [14:40:01] (03CR) 10Mobrovac: [C: 031] "Tested in beta, works as advertised" [puppet] - 10https://gerrit.wikimedia.org/r/303790 (owner: 10Mobrovac) [14:40:10] (03CR) 10Alexandros Kosiaris: "ok, but why ? An explanation in the commit log or a link to a task would be nice." 
[puppet] - 10https://gerrit.wikimedia.org/r/303807 (owner: 10Ladsgroup) [14:40:49] (03CR) 10Alexandros Kosiaris: [C: 031] postgresql: support SSL connections/replication [puppet] - 10https://gerrit.wikimedia.org/r/303800 (owner: 10Giuseppe Lavagetto) [14:42:11] akosiaris, hi ... [14:42:32] subbu: ah, so I 've been looking into deploying to debug what you met yesterday [14:42:36] (03CR) 10Ladsgroup: "We are suspecting there is a memory leak in uwsgi and we want to be sure. If it wasn't the case, we'll revert this later." [puppet] - 10https://gerrit.wikimedia.org/r/303807 (owner: 10Ladsgroup) [14:42:38] but repo on tin is not clean [14:42:38] (03CR) 10BBlack: [C: 032] Varnish: Do not unset Accept: text/html for RESTBase reqs [puppet] - 10https://gerrit.wikimedia.org/r/303790 (owner: 10Mobrovac) [14:42:53] akosiaris, not clean? i did a git deploy abort [14:43:08] --- a/src [14:43:08] +++ b/src [14:43:08] @@ -1 +1 @@ [14:43:08] -Subproject commit abf396ebff800f971ff9fcebc6cbbfeb6f0dbcc3 [14:43:08] +Subproject commit a577d80e75f5b1b330b820d1ffdbc064ecc38779 [14:43:21] ah .. i guess abort doesn't reset submodules. [14:43:25] let me fix that. [14:43:25] probably [14:43:50] ok, so I 'll deploy then an already deployed sha1 so we don't cause any unwanted consequences [14:44:27] yup. reset src. clean now. [14:44:34] cool thanks [14:45:15] 06Operations, 10ops-eqiad: Megaraid controller reset due to (what seemsa) a faulty disk on analytics1045 - https://phabricator.wikimedia.org/T141761#2536587 (10elukey) @Cmjohnson I still see Media errors though.. would it be good to think about a proactive disk replacement? Otherwise we can keep using it and s... [14:47:21] 06Operations, 07Tracking: reduce amount of remaining Ubuntu 12.04 (precise) systems - https://phabricator.wikimedia.org/T123525#1948569 (10Platonides) More likely, it uses some library function that performs such queries internally. 
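The Tool Labs cron question at 14:35 has a simple cause. cron mails any output a job produces, so discarding both stdout and stderr, as suggested at 14:36:14, leaves nothing to send. Below is a minimal Python sketch of that crontab edit. `silence_cron_entry` and the example path are hypothetical, not a Tool Labs utility:

```python
# Redirect suggested in the channel: discard stdout and stderr so cron
# has no output to mail to the crontab owner.
SILENCE = ">/dev/null 2>/dev/null"


def silence_cron_entry(entry: str) -> str:
    """Return the crontab entry with the redirect appended (hypothetical helper).

    Trailing whitespace is dropped, and an entry that already ends with the
    redirect is returned unchanged, so applying the helper twice is harmless.
    """
    entry = entry.rstrip()
    if entry.endswith(SILENCE):
        return entry
    return f"{entry} {SILENCE}"


if __name__ == "__main__":
    # Hypothetical hourly job for a tool account.
    print(silence_cron_entry("0 * * * * /data/project/mytool/update.sh"))
```

Dropping stderr also hides failures. Appending only `>/dev/null` silences routine output but still mails error messages.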
[14:48:02] (PS5) ArielGlenn: If a prereq job is missing, optionally run it instead of giving up [dumps] - https://gerrit.wikimedia.org/r/302706 (https://phabricator.wikimedia.org/T141981)
[14:55:42] (CR) Ottomata: [C: 1] Switch the AQS restbase use from 'cassandra' to 'aqs' [puppet] - https://gerrit.wikimedia.org/r/303798 (https://phabricator.wikimedia.org/T142073) (owner: Elukey)
[14:55:52] (CR) Alexandros Kosiaris: [C: 2] "given the X different environments there are, we should be running this against one of those and not production but let's proceed until ht" [puppet] - https://gerrit.wikimedia.org/r/303807 (owner: Ladsgroup)
[14:55:57] (PS2) Alexandros Kosiaris: ores: uwsgi workers restart after 200 requests [puppet] - https://gerrit.wikimedia.org/r/303807 (owner: Ladsgroup)
[14:56:37] (CR) Alexandros Kosiaris: [V: 2] ores: uwsgi workers restart after 200 requests [puppet] - https://gerrit.wikimedia.org/r/303807 (owner: Ladsgroup)
[15:00:04] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160809T1500).
[15:00:04] dcausse: A patch you scheduled for Morning SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
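The ORES patch merged at 14:55-14:56 restarts uWSGI workers after 200 requests. According to the 14:42:36 review comment, the aim is to confirm a suspected memory leak. The patch body is not shown in the log. uWSGI's `max-requests` option is the usual knob for this, but treat that mapping as an assumption. The toy Python model below shows why recycling caps per-worker memory growth; the leak rate is invented:

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Toy worker that leaks a fixed amount of memory per request."""
    leak_kb: int
    handled: int = 0
    memory_kb: int = 0

    def handle(self) -> None:
        self.handled += 1
        self.memory_kb += self.leak_kb


@dataclass
class Pool:
    """Replace the worker once it has served `max_requests` requests."""
    max_requests: int
    leak_kb: int
    restarts: int = 0
    worker: Worker = field(init=False)

    def __post_init__(self) -> None:
        self.worker = Worker(self.leak_kb)

    def serve(self, n: int) -> int:
        """Serve n requests and return the peak memory seen in any worker."""
        peak = 0
        for _ in range(n):
            self.worker.handle()
            peak = max(peak, self.worker.memory_kb)
            if self.worker.handled >= self.max_requests:
                # Recycle: the fresh worker starts with no leaked memory.
                self.worker = Worker(self.leak_kb)
                self.restarts += 1
        return peak


if __name__ == "__main__":
    recycled = Pool(max_requests=200, leak_kb=5)
    print(recycled.serve(1000), recycled.restarts)
```

With these invented numbers, the recycled pool peaks at 1000 KB per worker after 5 restarts. A single worker serving all 1000 requests would reach 5000 KB. Recycling bounds the damage but does not remove the leak, which is consistent with the plan in the 14:42:36 comment to revert if the leak turns out not to be in uwsgi.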
[15:00:27] o/
[15:01:04] I can SWAT today
[15:01:33] (CR) Rush: [C: 2 V: 2] labs dnsrecursor metaldns: Don't return NXDOMAIN when we don't have a record of the right type but do recognise the domain [puppet] - https://gerrit.wikimedia.org/r/299903 (https://phabricator.wikimedia.org/T139438) (owner: Alex Monk)
[15:01:40] (PS17) Alex Monk: beta: Use Let's Encrypt cert [puppet] - https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501)
[15:01:42] (PS5) Rush: labs dnsrecursor metaldns: Don't return NXDOMAIN when we don't have a record of the right type but do recognise the domain [puppet] - https://gerrit.wikimedia.org/r/299903 (https://phabricator.wikimedia.org/T139438) (owner: Alex Monk)
[15:02:13] (CR) Giuseppe Lavagetto: [WiP] puppetmaster: add role for puppetdb (9 comments) [puppet] - https://gerrit.wikimedia.org/r/303801 (owner: Giuseppe Lavagetto)
[15:04:11] (PS18) Alex Monk: beta: Use Let's Encrypt cert [puppet] - https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501)
[15:04:33] (PS2) Giuseppe Lavagetto: [WiP] puppetmaster: add role for puppetdb [puppet] - https://gerrit.wikimedia.org/r/303801
[15:06:00] (CR) Rush: [V: 2] labs dnsrecursor metaldns: Don't return NXDOMAIN when we don't have a record of the right type but do recognise the domain [puppet] - https://gerrit.wikimedia.org/r/299903 (https://phabricator.wikimedia.org/T139438) (owner: Alex Monk)
[15:06:01] subbu: Repo: parsoid/deploy
[15:06:01] Tag: parsoid/deploy-sync-20160809-150515
[15:06:01] 43/43 minions completed fetch
[15:06:05] so I suppose I fixed it
[15:06:13] not sure how, but I did...
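The "repo on tin is not clean" exchange (14:42-14:44) came down to the `src` submodule. It was checked out at a different commit from the one recorded in the deploy repo, and `git deploy abort` apparently did not reset it. `git submodule status` prefixes such submodules with `+`, and `git submodule update` checks the recorded commit out again. Below is a small Python sketch that flags them before a deploy; the function names are hypothetical:

```python
import subprocess


def dirty_submodules(status_output: str) -> list[tuple[str, str]]:
    """Parse `git submodule status` output.

    Returns (path, checked_out_sha) for every submodule whose checkout differs
    from the commit recorded in the superproject, i.e. lines starting with '+'.
    """
    dirty = []
    for line in status_output.splitlines():
        if line.startswith("+"):
            sha, path = line[1:].split()[:2]
            dirty.append((path, sha))
    return dirty


def check_repo(repo_dir: str) -> list[tuple[str, str]]:
    """Run `git submodule status` in repo_dir and report dirty submodules."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return dirty_submodules(result.stdout)


if __name__ == "__main__":
    # Illustrative status line shaped like the state shown in the diff above.
    sample = "+a577d80e75f5b1b330b820d1ffdbc064ecc38779 src (heads/master)\n"
    print(dirty_submodules(sample))
```

A non-empty result would have caught the leftover checkout before the 14:43 diff did.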
[15:06:31] * akosiaris hates trebuchet
[15:06:44] (CR) jenkins-bot: [V: -1] beta: Use Let's Encrypt cert [puppet] - https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501) (owner: Alex Monk)
[15:07:43] (PS3) Giuseppe Lavagetto: [WiP] puppetmaster: add role for puppetdb [puppet] - https://gerrit.wikimedia.org/r/303801
[15:07:57] <_joe_> akosiaris: welcome to the club :P
[15:08:20] I 've been in the club for at least 2 years now
[15:08:28] it's just I 've been hiding in the shadows
[15:09:00] (PS19) Alex Monk: beta: Use Let's Encrypt cert [puppet] - https://gerrit.wikimedia.org/r/247587 (https://phabricator.wikimedia.org/T50501)
[15:09:12] PROBLEM - puppet last run on mw1250 is CRITICAL: CRITICAL: Puppet has 1 failures
[15:09:27] (CR) jenkins-bot: [V: -1] [WiP] puppetmaster: add role for puppetdb [puppet] - https://gerrit.wikimedia.org/r/303801 (owner: Giuseppe Lavagetto)
[15:09:47] Operations, WMF-Legal, WMF-NDA-Requests: ZhouZ needs access to WMF-NDA group - https://phabricator.wikimedia.org/T98722#2536654 (Aklapper) >>! In T98722#2534884, @ZhouZ wrote: > So everyone who has a @wikimedia.org account should be on a NDA and can be added to the WMF-NDA group. So going forward, t...
[15:10:42] akosiaris, looks like one minion is missing .. usually 44.
[15:10:57] but, thanks for fixing. :)
[15:12:11] ah, looks like wtp2006 is down .. https://ganglia.wikimedia.org/latest/?r=day&cs=&ce=&c=Parsoid+codfw&h=&tab=m&vn=&hide-hf=false&m=cpu_report&sh=1&z=small&hc=4&host_regex=&max_graphs=0&s=by+name
[15:12:17] it is not in that list.
[15:12:32] dcausse: https://gerrit.wikimedia.org/r/#/c/303597/ is live on mw1099, check if possible please
[15:13:19] Operations, Services, User-mobrovac: Move all Node.JS services to Jessie and Node 4 - https://phabricator.wikimedia.org/T124989#2536668 (mobrovac)
[15:13:24] thcipriani: sure, with X-Wikimedia-Debug header that's right?
[15:13:40] dcausse: yup, that's correct
[15:14:04] https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug
[15:15:38] Operations, Services, Services-next, Security: Productize the Electron PDF render service & create a REST API end point - https://phabricator.wikimedia.org/T142226#2536674 (Lea_WMDE) I’m not sure if we are ready for a timeline just yet. At this point, what we know is this: - There is one communit...
[15:16:29] thcipriani: it works
[15:16:40] dcausse: ack, rolling out everywhere
[15:16:42] https://test2.wikipedia.org/w/index.php?search=find+areas section link is correct
[15:18:46] !log thcipriani@tin Synchronized php-1.28.0-wmf.13/extensions/CirrusSearch/includes/Search/Result.php: SWAT: [[gerrit:303597|Use createFragmentTarget instead of setFragment (T142297)]] (duration: 00m 54s)
[15:18:48] T142297: Search results discard first letter of fragment identifiers - https://phabricator.wikimedia.org/T142297
[15:18:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[15:18:54] ^ dcausse live everywhere now
[15:19:28] thcipriani: awesome, it works as expected without the header
[15:19:30] thanks!
[15:19:42] awesome :) thanks for checking!
[15:19:55] (double-checking :))
[15:19:59] :)
[15:20:11] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 19 failures
[15:21:41] (PS4) Andrew Bogott: Labs: Generate/store root passwords for instances [puppet] - https://gerrit.wikimedia.org/r/303617 (https://phabricator.wikimedia.org/T142216)
[15:23:19] (CR) Andrew Bogott: Labs: Generate/store root passwords for instances (6 comments) [puppet] - https://gerrit.wikimedia.org/r/303617 (https://phabricator.wikimedia.org/T142216) (owner: Andrew Bogott)
[15:23:39] paravoid: revised! (although yuvi refactored some stuff underneath me in the meantime)
[15:23:56] https://gerrit.wikimedia.org/r/#/c/303617/
[15:25:12] PROBLEM - check_puppetrun on payments2001 is CRITICAL: CRITICAL: Puppet has 19 failures
[15:28:05] Operations, Services, Services-next, Security: Productize the Electron PDF render service & create a REST API end point - https://phabricator.wikimedia.org/T142226#2536727 (ssastry) Another thing to factor in is that often, the discussion is focused on the 'PDF rendering' and less on the fact tha...
[15:30:11] RECOVERY - check_puppetrun on payments2001 is OK: OK: Puppet is currently enabled, last run 121 seconds ago with 0 failures
[15:31:36] (PS6) ArielGlenn: If a prereq job is missing, optionally run it instead of giving up [dumps] - https://gerrit.wikimedia.org/r/302706 (https://phabricator.wikimedia.org/T141981)
[15:32:45] Operations, Analytics, Traffic: Correct cache_status field on webrequest dataset - https://phabricator.wikimedia.org/T142410#2536733 (Nuria) @elukey : If I understand things right this changeset will emit a new header that we need to publish via varnishkafka. The header value should replace whatever...
[15:34:33] RECOVERY - puppet last run on mw1250 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[15:35:35] (CR) ArielGlenn: [C: 2] If a prereq job is missing, optionally run it instead of giving up [dumps] - https://gerrit.wikimedia.org/r/302706 (https://phabricator.wikimedia.org/T141981) (owner: ArielGlenn)
[15:36:00] (PS2) BBlack: vk webrequest: use X-Cache-Status for cache_status [puppet] - https://gerrit.wikimedia.org/r/303579 (https://phabricator.wikimedia.org/T142410)
[15:36:34] (CR) BBlack: [C: 2 V: 2] vk webrequest: use X-Cache-Status for cache_status [puppet] - https://gerrit.wikimedia.org/r/303579 (https://phabricator.wikimedia.org/T142410) (owner: BBlack)
[15:37:42] Operations, Analytics, Traffic: Correct cache_status field on webrequest dataset - https://phabricator.wikimedia.org/T142410#2533998 (BBlack) @Nuria + @elukey - the second patch just merged above takes care of the varnishkafka part. So this should gradually go live over the next ~30 minutes.
[15:45:45] andrewbogott: I don't see yuvi's changes
[15:46:22] paravoid: I just mean https://gerrit.wikimedia.org/r/#/c/303743/
[15:46:34] it eliminates one of the files that was changed in my patch
[15:47:08] ah
[15:47:09] so now I have a role class that uses a file source inside a different module. I hate that, but I don't really know what the proper approach is
[15:47:27] Operations, hardware-requests: reclaim and return all cisco servers - https://phabricator.wikimedia.org/T128821#2536817 (Papaul)
[15:47:29] Operations, ops-codfw, hardware-requests: codfw: audit cisco servers for return/decom - https://phabricator.wikimedia.org/T140787#2536815 (Papaul) Open>Resolved complete
[15:48:50] yeah, that's not great
[15:49:01] why not moving it under the right module?
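The SWAT check at 15:12-15:19 can be scripted. It tests the patch on mw1099 via the X-Wikimedia-Debug header, then re-checks without the header after the full sync. Below is a minimal sketch using Python's standard `urllib.request`. The `backend=<host>` header value and the `mw1099.eqiad.wmnet` hostname are assumptions; https://wikitech.wikimedia.org/wiki/X-Wikimedia-Debug documents the values actually accepted:

```python
import urllib.request
from typing import Optional


def build_request(url: str, debug_backend: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a GET request, optionally routed to a debug backend.

    Assumption: the header value takes the form "backend=<hostname>".
    """
    headers = {}
    if debug_backend is not None:
        headers["X-Wikimedia-Debug"] = f"backend={debug_backend}"
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    url = "https://test2.wikipedia.org/w/index.php?search=find+areas"
    debug = build_request(url, "mw1099.eqiad.wmnet")  # before the full sync
    normal = build_request(url)                        # after the full sync
    for req in (debug, normal):
        # urllib stores header names capitalized, e.g. "X-wikimedia-debug".
        print(req.full_url, req.header_items())
        # urllib.request.urlopen(req) would actually send the request.
```

Fetching the page with both requests and comparing the section links mirrors the before/after check dcausse did by hand.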
[15:49:05] May 2014, the US 4 × 100 metres relayteam member Tyson Gay received a one-year suspension for anabolic steroid use and was stripped of his medals after 15 July 2012 when he first used.[30] In May 2015, the IOC wrote to US Olympic Committee telling them to collect the medals from teammates Trell Kimmons, Justin Gatlin, Ryan Bailey, Jeffery Demps and Darvis Patton.[31] Two of Gay's teammates who ran with him in the final, Kimmons and Bai
[15:49:16] Officially, IOC stripped the silver medal but has not yet confirmed the redistribution of the medals in this discipline.[33]
[15:49:17] On 17 August 2015, the Court of Arbitration for Sport says it approved a settlement agreed to by Turkish athlete Aslı Çakır Alptekin and the IAAF. Alptekin has agreed to forfeit her 1500 metres Olympic title and serve an eight-year ban for blood doping.[34]IOC has not yet confirmed the redistribution of the medals in this event.[35] On
[15:49:20] paravoid: the 'roles' module?
[15:49:24] oh
[15:49:25] right
[15:49:32] November 9, 2015, the Independent Commission Investigation of the World Anti-Doping Agency asks for a lifetime ban for doping vs Mariya Savinova the Russian gold medalist in the Women's 800 metres and her Russian teammate, bronze medalist Ekaterina Poistogova..[38] A third Russian in the event,Yelena Arzhakova in fifth place, was already disqualified in 2013. Redistribution of medals has not been announced but in the likely case all three R
[15:49:45] March 2016, the Court of Arbitration for Sport has issued decision that all competitive results obtained by Olga Kaniskina from 15 August 2009 to 15 October 2012 are disqualified.[42] IOC has not yet confirmed the Olga Kaniskina's deprivation of her silver medal in women's 20km walk and redistribution of the medals
[15:49:51] cat287: stop it
[15:49:58] Ok
[15:50:02] Sorry
[15:50:14] Operations, ops-codfw, netops: audit network ports in a4-codfw - https://phabricator.wikimedia.org/T140935#2536841 (Papaul) a:Papaul>RobH @RobH This is complete let me know if you have any questions
[15:50:32] andrewbogott: then the resource should be moved elsewhere really :/
[15:50:56] roles aren't for the most part for provisioning files, they are for stitching classes together
[15:51:37] revealed to also have abnormalities in her athlete "passport" that could also disqualify her from the event.[36] 4th place finisher Russian Tatyana Tomashova has a previous doping violation and fifth place EthiopianAbeba Aregawi, later representing Sweden was suspended for doping violation on February 29, 2016.[37] When reallocating medals, the IOC has previously elected not to advance athletes with a history of doping violations.
[15:51:41] revealed to also have abnormalities in her athlete "passport" that could also disqualify her from the event.[36] 4th place finisher Russian Tatyana Tomashova has a previous doping violation and fifth place EthiopianAbeba Aregawi, later representing Sweden was suspended for doping violation on February 29, 2016.[37] When reallocating medals, the IOC has previously elected not to advance athletes with a history of doping violations.
[15:51:47] Hiidoodod
[15:52:02] November 9, 2015, the Independent Commission Investigation of the World Anti-Doping Agency asks for a lifetime ban for doping vs Mariya Savinova the Russian gold medalist in the Women's 800 metres and her Russian teammate, bronze medalist Ekaterina Poistogova..[38] A third Russian in the event,Yelena Arzhakova in fifth place, was already disqualified in 2013. Redistribution of medals has not been announced but in the likely case all three R
[15:52:15] announced that Yuliya Kalina of Ukraine has been disqualified from the 2012 Summer Olympics and ordered to return the bronze medal from the 58 kg weightlifting event. Reanalysis of Kalina's samples from London 2012 resulted in a positive test for the prohibited substancedehydrochlormethyltestosterone(turinabol).[44] The positions were adjusted accordingly.
[15:52:27] March 2016, the Court of Arbitration for Sport has issued decision that all competitive results obtained by Sergey Kirdyapkin from 20 August 2009 to 15 October 2012 are disqualified.[39] IOC has not yet confirmed the Sergey Kirdyapkin's deprivation of his gold medal in men's 50km walk and redistribution of the medals in this event.[40] On 17 June 2016, the Court of Arbitration for Sport over turned the Russian decision in March. As a result,
[15:52:42] August 2015, the Court of Arbitration for Sport says it approved a settlement agreed to by Turkish athlete Aslı Çakır Alptekin and the IAAF. Alptekin has agreed to forfeit her 1500 metres Olympic title and serve an eight-year ban for blood doping.[34]IOC has not yet confirmed the redistribution of the medals in this event.[35] On June 1, 2016, Turkish silver medalist Gamze Bulutwas revealed to also have
[15:52:48] paravoid: can you kick please?
[15:52:54] August 2015, the Court of Arbitration for Sport says it approved a settlement agreed to by Turkish athlete Aslı Çakır Alptekin and the IAAF. Alptekin has agreed to forfeit her 1500 metres Olympic title and serve an eight-year ban for blood doping.[34]IOC has not yet confirmed the redistribution of the medals in this event.[35] On June 1, 2016, Turkish silver medalist Gamze Bulutwas revealed to also have
[15:52:54] yeah...
[15:52:59] Hi
[15:53:05] (CR) Faidon Liambotis: [C: -1] "Minor nits. Should we also have a separate /usr/local/sbin script to retrieve the password for a project? (or even instance?)" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/303617 (https://phabricator.wikimedia.org/T142216) (owner: Andrew Bogott)
[15:53:16] Gay received a one-year suspension for anabolic steroid use and was stripped of his medals after 15 July 2012 when he first used.[30] In May 2015, the IOC wrote to US Olympic Committee telling them to collect the medals from teammates Trell Kimmons, Justin Gatlin, Ryan Bailey, Jeffery Demps and Darvis Patton.[31] Two of Gay's teammates who ran with him in the final, Kimmons and Bailey, had previously also served suspensions. If the medals
[15:53:31] thx
[15:53:32] thx
[15:53:49] paravoid: better +b *!*@185.135.157.*
[15:53:50] seriously
[15:54:07] they see me op and telling them to stop, what did they expect really
[15:54:09] Do we think that cat287 was a bot that was smart enough to apologize when scolded? If so I'm a bit impressed
[15:54:19] haha
[15:54:45] (PS7) ArielGlenn: Make scheduler hupable. [dumps] - https://gerrit.wikimedia.org/r/302831
[15:55:06] (CR) Eevans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/303798 (https://phabricator.wikimedia.org/T142073) (owner: Elukey)
[15:55:38] Not a bot. Just a troll.
[15:55:56] They've been trolling across a large number of WMF channels.
[15:57:20] paravoid: so, you think a new module/class just for labs puppetmaster? (I liked it just fine the way it was before :/ )
[15:57:32] or how it was before indeed
[15:58:23] (PS4) Elukey: Switch the AQS restbase use from 'cassandra' to 'aqs' [puppet] - https://gerrit.wikimedia.org/r/303798 (https://phabricator.wikimedia.org/T142073)
[15:58:39] yuvipanda: I sort of want to revert https://gerrit.wikimedia.org/r/#/c/303743/ — it messes with my current work on the labs puppetmaster.
[15:59:50] !log switching restbase/cassandra user on aqs100[123] to aqs (T142073) - https://gerrit.wikimedia.org/r/303798 will be applied to one node at the time with depool/pool
[15:59:50] (PS1) ArielGlenn: turn on checkpointing for dumps of jawiki, commonswiki [puppet] - https://gerrit.wikimedia.org/r/303813
[15:59:51] T142073: Improve user management for AQS - https://phabricator.wikimedia.org/T142073
[15:59:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:01:14] (CR) Elukey: [C: 2] Switch the AQS restbase use from 'cassandra' to 'aqs' [puppet] - https://gerrit.wikimedia.org/r/303798 (https://phabricator.wikimedia.org/T142073) (owner: Elukey)
[16:05:09] (PS2) ArielGlenn: turn on checkpointing for dumps of jawiki, commonswiki [puppet] - https://gerrit.wikimedia.org/r/303813
[16:07:44] (CR) ArielGlenn: [C: 2] turn on checkpointing for dumps of jawiki, commonswiki [puppet] - https://gerrit.wikimedia.org/r/303813 (owner: ArielGlenn)
[16:09:22] (PS5) Andrew Bogott: Labs: Generate/store root passwords for instances [puppet] - https://gerrit.wikimedia.org/r/303617 (https://phabricator.wikimedia.org/T142216)
[16:11:02] paravoid: once more ^
[16:15:13] (CR) Glaisher: "This didn't update $wgAddGroups and $wgRemoveGroups and so causes T142492." [mediawiki-config] - https://gerrit.wikimedia.org/r/302987 (owner: MarcoAurelio)
[16:21:34] !log restbase deploy start of b800d343
[16:21:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:34:26] (PS1) Gehel: maps - osm-initial-import fixes [puppet] - https://gerrit.wikimedia.org/r/303816 (https://phabricator.wikimedia.org/T138092)
[16:41:22] andrewbogott: in a meeting, but from a quick glance I don't see how puppetmaster::labsrootpass is an improvement over puppetmaster::labs that we had before
[16:41:49] !log restbase deploy end of b800d343
[16:41:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[16:42:13] paravoid: only because it does a specific thing rather than representing a kind of puppetmaster.
[16:42:30] Also the older code included a switch in the top-level puppetmaster class, I think
[16:42:40] mostly I'm just trying to avoid wheel-warring with yuvi :)
[16:43:03] heh
[16:43:22] we should figure out our strategy here, but I'm fine if you want to merge this as-is and figure it out later
[16:43:27] but not much later, hopefully :)
[16:44:01] ok, thanks
[16:44:11] because it feels like we're going back and forth and ending up with an inferior result than we originally had at the end :)
[16:44:39] I don't personally see anything wrong with a class representing a kind of puppetmaster that collects all those special things that the labs puppetmaster does
[16:44:53] but ymmv (or yuvi's)
[16:45:11] so I'm not sure how yuvi envisions this, we should ask him :)
[16:45:57] (CR) MaxSem: maps - osm-initial-import fixes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303816 (https://phabricator.wikimedia.org/T138092) (owner: Gehel)
[16:47:47] (CR) Gehel: maps - osm-initial-import fixes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303816 (https://phabricator.wikimedia.org/T138092) (owner: Gehel)
[16:48:37] :facepalm:
[16:49:21] MaxSem: a specific reason for the facepalm?
[16:49:34] I went to look for that change
[16:49:42] to see what the facepalm was about
[16:49:52] I still don't know, but an erb generating a shell script, eww :)
[16:50:19] well
[16:50:44] (CR) Dzahn: "@Alex i think we should just go back to the simple change that removes those lines. doing all this seems to overcomplicate it since it wil" [puppet] - https://gerrit.wikimedia.org/r/297727 (https://phabricator.wikimedia.org/T132661) (owner: Dzahn)
[16:50:53] now that we have more than one script, we might want to move all the erby parts into one variables script to rule them all :)
[16:52:58] (CR) 20after4: [C: 2] Apply mobile cookie domain fix to beta [mediawiki-config] - https://gerrit.wikimedia.org/r/303744 (https://phabricator.wikimedia.org/T49647) (owner: Gergő Tisza)
[16:53:23] (Merged) jenkins-bot: Apply mobile cookie domain fix to beta [mediawiki-config] - https://gerrit.wikimedia.org/r/303744 (https://phabricator.wikimedia.org/T49647) (owner: Gergő Tisza)
[16:53:33] (PS6) Andrew Bogott: Labs: Generate/store root passwords for instances [puppet] - https://gerrit.wikimedia.org/r/303617 (https://phabricator.wikimedia.org/T142216)
[16:54:06] is strontium going to come back or decom?
[16:54:15] since the disk issue
[16:55:18] (CR) Andrew Bogott: [C: 2] Labs: Generate/store root passwords for instances [puppet] - https://gerrit.wikimedia.org/r/303617 (https://phabricator.wikimedia.org/T142216) (owner: Andrew Bogott)
[16:56:03] For arguments that are constant for a specific host, erb generation seems fine to me...
[17:00:04] yurik, gwicke, cscott, arlolra, subbu, halfak, and Amir1: Dear anthropoid, the time has come. Please deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160809T1700).
[17:00:22] Nothing to do for ORES. Soon though. SOON :)
[17:00:43] (CR) Dzahn: "@ErikZachte" [puppet] - https://gerrit.wikimedia.org/r/231284 (https://phabricator.wikimedia.org/T84777) (owner: Dzahn)
[17:01:32] PROBLEM - Puppet catalogue fetch on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/labs-puppetmaster/eqiad - 185 bytes in 1.282 second response time
[17:01:56] !log starting parsoid deploy
[17:02:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:02:22] (CR) GWicke: "This doesn't cover /api/rest_v1/?doc, which was the old canonical doc URL. I'll send a follow-up patch." [puppet] - https://gerrit.wikimedia.org/r/303790 (owner: Mobrovac)
[17:03:25] (PS2) Gehel: maps - osm-initial-import fixes [puppet] - https://gerrit.wikimedia.org/r/303816 (https://phabricator.wikimedia.org/T138092)
[17:03:45] !log synced new code; restarted parsoid on wtp1001 as a canary
[17:03:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:05:45] (PS1) GWicke: Forward Accept also for /api/rest_v1/?doc [puppet] - https://gerrit.wikimedia.org/r/303825
[17:07:45] (CR) Mobrovac: [C: -1] "LGTM, one minor issue in-lined" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303825 (owner: GWicke)
[17:08:12] Operations, ops-codfw, netops: audit network ports in a4-codfw - https://phabricator.wikimedia.org/T140935#2537105 (RobH) Open>Resolved ports ge-4/0/0 through ge-4/0/11 were all labeled wrong on the switch port description. also removed the description on ge-4/0/39. >>! In T140935#2482666,...
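The ban suggested at 15:53:49 uses the hostmask `*!*@185.135.157.*`. It matches any nick and username connecting from a host string that begins with `185.135.157.`, so changing nick does not evade it. Below is a rough Python sketch of how such a mask is matched against a `nick!user@host` prefix. It treats only `*` and `?` as wildcards and ignores IRC's special case mapping of characters such as `[]\~`, so it approximates ircd behaviour rather than reproducing any server's exact algorithm. The example prefixes are hypothetical:

```python
import re


def mask_to_regex(mask: str) -> "re.Pattern[str]":
    """Compile an IRC hostmask (nick!user@host) into a case-insensitive regex.

    '*' matches any run of characters and '?' exactly one; everything else,
    including '.' and '[', is matched literally.
    """
    parts = []
    for ch in mask:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def matches_ban(prefix: str, mask: str) -> bool:
    """True if the whole nick!user@host prefix matches the mask."""
    return mask_to_regex(mask).fullmatch(prefix) is not None


if __name__ == "__main__":
    ban = "*!*@185.135.157.*"
    print(matches_ban("somenick!~someuser@185.135.157.20", ban))  # same range
    print(matches_ban("somenick!~someuser@185.135.158.20", ban))  # other range
```

Masking on the host rather than the nick is what makes the ban stick across reconnects. It also catches anyone else who connects from that range.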
[17:10:26] (CR) GWicke: Forward Accept also for /api/rest_v1/?doc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303825 (owner: GWicke)
[17:10:28] (PS1) Andrew Bogott: Include 'whois' package for mkpasswd [puppet] - https://gerrit.wikimedia.org/r/303826 (https://phabricator.wikimedia.org/T142216)
[17:10:40] (CR) Jcrespo: [C: -1] "Why would you want automatic code deployment? That looks like a security problem." [puppet] - https://gerrit.wikimedia.org/r/303719 (owner: Dzahn)
[17:11:07] (CR) GWicke: Forward Accept also for /api/rest_v1/?doc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303825 (owner: GWicke)
[17:11:32] PROBLEM - Unmerged changes on repository mediawiki_config on tin is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[17:11:32] PROBLEM - Unmerged changes on repository mediawiki_config on mira is CRITICAL: There is one unmerged change in mediawiki_config (dir /srv/mediawiki-staging/, ref HEAD..readonly/master).
[17:13:24] !log finished deploying parsoid sha a577d80e
[17:13:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[17:14:32] (PS2) GWicke: Forward Accept also for /api/rest_v1/?doc [puppet] - https://gerrit.wikimedia.org/r/303825
[17:14:40] (CR) Mobrovac: Forward Accept also for /api/rest_v1/?doc (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303825 (owner: GWicke)
[17:15:32] (PS2) Andrew Bogott: Include 'whois' package for mkpasswd [puppet] - https://gerrit.wikimedia.org/r/303826 (https://phabricator.wikimedia.org/T142216)
[17:15:34] (PS1) Andrew Bogott: Temporarily remove the code to generate labs root passwords. [puppet] - https://gerrit.wikimedia.org/r/303828 (https://phabricator.wikimedia.org/T142216)
[17:16:17] (CR) Andrew Bogott: [C: 2 V: 2] Temporarily remove the code to generate labs root passwords. [puppet] - https://gerrit.wikimedia.org/r/303828 (https://phabricator.wikimedia.org/T142216) (owner: Andrew Bogott)
[17:17:21] PROBLEM - puppet last run on mw2064 is CRITICAL: CRITICAL: puppet fail
[17:17:51] Operations, WMF-Legal, WMF-NDA-Requests: ZhouZ needs access to WMF-NDA group - https://phabricator.wikimedia.org/T98722#2537149 (ZhouZ) > @ZhouZ: What type of account is a "@wikimedia.org account"? Also, Phabricator does not expose the email addresses of user accounts. I am just going off based on Q...
[17:19:02] RECOVERY - Puppet catalogue fetch on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 2.534 second response time
[17:20:47] (CR) GWicke: "The 404 is coming from Apache, as we only rewrite /api/rest_v1/ to the RB backend in Varnish. We can set up a redirect for /api/rest_v1 to" [puppet] - https://gerrit.wikimedia.org/r/303825 (owner: GWicke)
[17:20:50] (CR) Jcrespo: "Looking at https://phabricator.wikimedia.org/diffusion/OSDB/ I think you did something incorrect on merging- the code on dbtree was clearl" [puppet] - https://gerrit.wikimedia.org/r/303719 (owner: Dzahn)
[17:22:27] Operations, WMF-Legal, WMF-NDA-Requests: ZhouZ needs access to WMF-NDA group - https://phabricator.wikimedia.org/T98722#1275510 (AlexMonk-WMF) @wikimedia.org isn't just managed by OIT, non-staff run addresses there, for example anything going through OTRS.
[17:22:40] (03PS3) 10Andrew Bogott: Include 'whois' package for mkpasswd [puppet] - 10https://gerrit.wikimedia.org/r/303826 (https://phabricator.wikimedia.org/T142216) [17:22:42] (03PS1) 10Andrew Bogott: Don't install root password if has_admin = false [puppet] - 10https://gerrit.wikimedia.org/r/303829 (https://phabricator.wikimedia.org/T142216) [17:23:33] (03PS1) 10Ottomata: 1.3.1 release [debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/303830 [17:23:40] paravoid, here is a thing that you will hate: https://gerrit.wikimedia.org/r/#/c/303829/ [17:24:54] andrewbogott: I don't remember what that class does [17:25:03] I don't _think_ it needs to be private anymore though [17:25:04] (03CR) 10Mobrovac: "> We can set up a redirect for /api/rest_v1 to /api/rest_v1/ in Apache or Varnish if we care about this." [puppet] - 10https://gerrit.wikimedia.org/r/303825 (owner: 10GWicke) [17:25:04] it installs the root password [17:25:17] what root password? :) [17:25:22] there is no prod root password anymore [17:25:37] oh there is I suppose [17:26:33] (03CR) 10GWicke: "Lets get the canonical API docs fixed, first." [puppet] - 10https://gerrit.wikimedia.org/r/303825 (owner: 10GWicke) [17:26:59] (03CR) 10Jcrespo: "It doesn't have to be *you*, anyone with deployment rights could have messed up the repo." [puppet] - 10https://gerrit.wikimedia.org/r/303719 (owner: 10Dzahn) [17:32:08] (03CR) 10Andrew Bogott: "I don't love this, but if we're not using realm checks then I don't know of a more graceful way to exclude a class from Labs hosts." 
[puppet] - 10https://gerrit.wikimedia.org/r/303829 (https://phabricator.wikimedia.org/T142216) (owner: 10Andrew Bogott) [17:33:01] (03PS1) 10Alex Monk: labs dnsrecursor metaldns: Change hook to ensure SOA records get passed properly but with NOERROR instead of NXDOMAIN [puppet] - 10https://gerrit.wikimedia.org/r/303833 (https://phabricator.wikimedia.org/T139438) [17:33:46] I don't understand why realm checks are so inherently bad honestly [17:33:51] but anyway [17:33:53] dinner time :) [17:34:21] paravoid: me either, but I'm pretty sure if I add one someone will revert :) [17:35:48] there's nothing left in prod using has_admin=false right? [17:35:57] wasn't it labstore that used it or something [17:36:19] Krenair: this is the first time I've seen that flag [17:36:19] but yeah, probably labstore [17:37:44] https://gerrit.wikimedia.org/r/#/c/255118/ [17:39:22] 06Operations, 10Ops-Access-Requests: Requesting access to stat1003, stat1002 and bast1001 for ovasileva - https://phabricator.wikimedia.org/T142502#2537219 (10ovasileva) [17:42:12] 06Operations, 10Wikimedia-Logstash, 03Discovery-Search-Sprint: Elasticsearch restarts are failing in the logstash cluster - https://phabricator.wikimedia.org/T142357#2537252 (10debt) p:05Triage>03High [17:42:35] 06Operations, 06WMF-Legal, 06WMF-NDA-Requests: ZhouZ needs access to WMF-NDA group - https://phabricator.wikimedia.org/T98722#2537255 (10AlexMonk-WMF) That said I think everyone using OTRS has probably got some sort of NDA at this point?
[17:43:58] (03CR) 10Ottomata: [C: 032] 1.3.1 release [debs/python-kafka] (debian) - 10https://gerrit.wikimedia.org/r/303830 (owner: 10Ottomata) [17:44:16] (03PS1) 10Muehlenhoff: ipsec_allow: Restrict to domain networks [puppet] - 10https://gerrit.wikimedia.org/r/303837 [17:44:42] RECOVERY - puppet last run on mw2064 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:46:45] robh: do anything about some varnish machines that are coming down/have come down, in esams? [17:51:14] !log restarting kafka broker on kafka1013 to test eventlogging leader changes [17:51:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:52:35] (03PS1) 10Muehlenhoff: openldap::labtest: Restrict to production/labs networks [puppet] - 10https://gerrit.wikimedia.org/r/303839 [17:53:25] (03CR) 10Alex Monk: puppetmaster: Split extra_auth_rules from is_labs_master (032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/303757 (owner: 10Alex Monk) [17:56:05] (03CR) 10Andrew Bogott: [C: 031] openldap::labtest: Restrict to production/labs networks [puppet] - 10https://gerrit.wikimedia.org/r/303839 (owner: 10Muehlenhoff) [17:57:17] (03PS3) 10Alex Monk: puppetmaster: Kill is_labs_master [puppet] - 10https://gerrit.wikimedia.org/r/303761 [17:57:19] (03PS4) 10Alex Monk: puppetmaster: Split extra_auth_rules from is_labs_master [puppet] - 10https://gerrit.wikimedia.org/r/303757 [17:57:21] (03PS1) 10Alex Monk: puppetmaster: Remove rhodium.eqiad.wmnet auth rule [puppet] - 10https://gerrit.wikimedia.org/r/303842 [18:01:25] whoooop, european swats! :o [18:02:52] MatmaRex: :) [18:06:08] urandom: sorry what? [18:06:10] (03PS1) 10Eevans: Install config owned by dedicated user [puppet] - 10https://gerrit.wikimedia.org/r/303846 [18:07:26] robh: apparently there are some varnish servers that are coming down, or may have already come down, in esams. 
They're old, and out of warranty, but it has been suggested that we might use them for restbase staging instead of purchasing new machines. I was curious if you knew anything about them. [18:08:07] I don't, sorry. I recall you asking that on a task but I didn't know the answer [18:08:19] robh: no worries; thanks [18:08:24] I imagine either bblack or mark would know about esams decoms. [18:08:38] k [18:08:40] this is for staging cluster right? [18:08:44] yes [18:08:46] It seemed odd to put staging in esams. [18:08:56] we have more limited rack space and it's a caching site [18:09:11] ¯\_(ツ)_/¯ [18:09:17] keeping out of warranty hardware in the racks, contrary to popular belief, does have an ongoing cost as well. =] [18:09:20] it's only a suggestion so far [18:09:23] yep [18:09:45] esams is larger than ulsfo (other caching site) so it has over twice the rackspace though [18:09:54] so more a mar_k call than anyone =] [18:10:03] (03PS10) 10BBlack: VCL backends 2/N: sort misc req_handling [puppet] - 10https://gerrit.wikimedia.org/r/300579 (https://phabricator.wikimedia.org/T110717) [18:10:05] (03PS11) 10BBlack: VCL backends 5/N: use for all clusters [puppet] - 10https://gerrit.wikimedia.org/r/300656 [18:10:07] (03PS10) 10BBlack: VCL backends 3/N: add force-pass support [puppet] - 10https://gerrit.wikimedia.org/r/300581 (https://phabricator.wikimedia.org/T110717) [18:10:09] (03PS11) 10BBlack: VCL backends 4/N: subpaths and defaulting [puppet] - 10https://gerrit.wikimedia.org/r/300655 [18:10:11] (03PS10) 10BBlack: VCL backends 1/N [WIP] [puppet] - 10https://gerrit.wikimedia.org/r/300574 (https://phabricator.wikimedia.org/T110717) [18:10:12] robh: it was mark that made the suggestion, fwiw :) [18:10:23] (03PS3) 10BBlack: Forward Accept also for /api/rest_v1/?doc [puppet] - 10https://gerrit.wikimedia.org/r/303825 (owner: 10GWicke) [18:10:24] worth a lot actually, heh [18:10:34] but yeah, I dunno about esams decoms, sorry (just the US based ones) [18:10:44] but I don't recall
that he knew the specs offhand when the subject last came up [18:10:53] there isn't any tracking of esams spares right now, typically because we have none, heh [18:11:01] gotcha [18:11:39] (03PS4) 10BBlack: Forward Accept also for /api/rest_v1/?doc [puppet] - 10https://gerrit.wikimedia.org/r/303825 (owner: 10GWicke) [18:13:12] (03CR) 10Andrew Bogott: [C: 032] Don't install root password if has_admin = false [puppet] - 10https://gerrit.wikimedia.org/r/303829 (https://phabricator.wikimedia.org/T142216) (owner: 10Andrew Bogott) [18:14:08] (03CR) 10Andrew Bogott: [C: 032] Include 'whois' package for mkpasswd [puppet] - 10https://gerrit.wikimedia.org/r/303826 (https://phabricator.wikimedia.org/T142216) (owner: 10Andrew Bogott) [18:14:32] (03PS4) 10Andrew Bogott: Fixups for labs password generation: [puppet] - 10https://gerrit.wikimedia.org/r/303826 (https://phabricator.wikimedia.org/T142216) [18:15:02] (03CR) 10BBlack: [C: 032] Forward Accept also for /api/rest_v1/?doc [puppet] - 10https://gerrit.wikimedia.org/r/303825 (owner: 10GWicke) [18:15:15] (03PS5) 10BBlack: Forward Accept also for /api/rest_v1/?doc [puppet] - 10https://gerrit.wikimedia.org/r/303825 (owner: 10GWicke) [18:15:29] (03CR) 10BBlack: [V: 032] Forward Accept also for /api/rest_v1/?doc [puppet] - 10https://gerrit.wikimedia.org/r/303825 (owner: 10GWicke) [18:16:07] (03PS1) 10Andrew Bogott: Revert "Temporarily remove the code to generate labs root passwords." [puppet] - 10https://gerrit.wikimedia.org/r/303847 [18:16:23] (03CR) 10Andrew Bogott: [C: 032] Fixups for labs password generation: [puppet] - 10https://gerrit.wikimedia.org/r/303826 (https://phabricator.wikimedia.org/T142216) (owner: 10Andrew Bogott) [18:16:31] (03PS5) 10Andrew Bogott: Fixups for labs password generation: [puppet] - 10https://gerrit.wikimedia.org/r/303826 (https://phabricator.wikimedia.org/T142216) [18:16:41] (03PS2) 10Andrew Bogott: Revert "Temporarily remove the code to generate labs root passwords."
[puppet] - 10https://gerrit.wikimedia.org/r/303847 [18:18:36] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2537457 (10Eevans) From IRC: ```lang=irc 14:07 < urandom> robh: apparently there are some varnish servers that are coming down, or may have already come dow... [18:19:22] PROBLEM - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 10.65.0.24 [18:20:15] 06Operations, 10Cassandra, 06Services, 10hardware-requests: 9x or 15x additional Cassandra/RESTBase nodes - https://phabricator.wikimedia.org/T139961#2537461 (10RobH) a:05RobH>03mark It seems this was @mark's idea to use these systems, so I'm assigning this request to him to field the question about th... [18:20:52] (03CR) 10Andrew Bogott: [C: 032] Revert "Temporarily remove the code to generate labs root passwords." [puppet] - 10https://gerrit.wikimedia.org/r/303847 (owner: 10Andrew Bogott) [18:21:12] RECOVERY - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms [18:21:21] (03CR) 10Alex Monk: "Tested on labs-dnsrecursor-test.openstack.eqiad.wmflabs and labtestservices2001.wikimedia.org" [puppet] - 10https://gerrit.wikimedia.org/r/303833 (https://phabricator.wikimedia.org/T139438) (owner: 10Alex Monk) [18:21:33] (03CR) 10Eevans: "This seems to do the right thing on current production machines, but I suspect it doesn't take into account the move to scap3. @mobrovac," [puppet] - 10https://gerrit.wikimedia.org/r/303846 (owner: 10Eevans) [18:23:04] ostriches: around? [18:23:16] (03PS2) 10Eevans: Install config owned by dedicated user [puppet] - 10https://gerrit.wikimedia.org/r/303846 [18:25:40] (03CR) 10Chad: "You should restore this, see my inline comment." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/303355 (owner: 10Paladox) [18:26:05] Danny_B: Yeah, I'm catching up. 
I ended up pretty sick so I was out yesterday. [18:26:20] I [18:26:20] I'll get you that data today :) [18:28:21] (03PS1) 10Andrew Bogott: Revert "Revert "Temporarily remove the code to generate labs root passwords."" [puppet] - 10https://gerrit.wikimedia.org/r/303848 [18:29:33] (03CR) 10Andrew Bogott: [C: 032 V: 032] Revert "Revert "Temporarily remove the code to generate labs root passwords."" [puppet] - 10https://gerrit.wikimedia.org/r/303848 (owner: 10Andrew Bogott) [18:30:08] (03Restored) 10Paladox: Gerrit: allows us to choose our auth type since production ldap does not work in labs [puppet] - 10https://gerrit.wikimedia.org/r/303355 (owner: 10Paladox) [18:31:06] muchas gracias ostriches [18:32:30] spanish now :) [18:34:29] hi andrewbogott [18:34:31] (03PS1) 10BBlack: remove mobile-lb comments about re-use [dns] - 10https://gerrit.wikimedia.org/r/303849 [18:34:38] sorry about the refactors - me and krenair fell into it yesterday night. [18:34:41] what's the problem with it now? [18:34:53] yuvipanda: I've mostly worked around it [18:35:05] but we actually need classes that are specific to the labs puppetmaster [18:35:09] so I just made a new one with a different name [18:35:13] (03CR) 10BBlack: [C: 032] remove mobile-lb comments about re-use [dns] - 10https://gerrit.wikimedia.org/r/303849 (owner: 10BBlack) [18:35:25] andrewbogott that should just be in the role [18:35:31] no [18:35:44] can you show me the patches? [18:35:49] https://gerrit.wikimedia.org/r/#/c/303617/ [18:35:57] where would I put the file resource in that case? [18:36:13] role::labs::puppetmaster? [18:36:36] I mean, the actual file [18:36:45] it should be in the puppetmaster module [18:36:47] not in the role module [18:36:50] module/role/files/labs/? [18:36:50] as far as I know [18:37:09] why? it isn't anything puppetmaster specific, only labs uses it. 
[18:37:22] it's totally puppetmaster specific [18:37:29] it's labs puppetmaster specific [18:37:30] but I don't care about this enough to wage a holy war [18:37:33] rather than 'puppetmaster' [18:37:38] yes, and the labs puppetmaster is a kind of thing! [18:37:43] Which needs classes and files and such [18:37:48] But, ok [18:37:50] 06Operations, 10Ops-Access-Requests: Requesting access to stat1003, stat1002 and bast1001 for ovasileva - https://phabricator.wikimedia.org/T142502#2537219 (10JKatzWMF) just adding approval for this access, thanks! [18:37:59] I don't care, I'm just echoing what faidon said, I don't care [18:38:01] if you think the role is too restricting, you can totally have another module [18:38:03] I see. [18:38:11] if you want to rearrange my stuff again, please do, just ask him to review it not me [18:38:15] what I ultimately want to do is to kill the puppetmaster self role. [18:38:31] and having the puppetmaster module have role-ish things in it makes that difficult. [18:38:49] ok, sorry if you felt this was invasive of your work, andrewbogott :| didn't mean it to be. [18:39:00] not your fault — just [18:39:08] we need to agree ops-wide what our style guide is [18:39:12] otherwise it becomes impossible to change anything [18:39:39] I agree, I think these are all floating in the ether and not written down anywhere [18:39:41] esp. role vs module stuff [18:39:49] and where realm branches are acceptable and not [18:39:54] probably a topic for the offsite [18:40:24] yeah [18:41:57] andrewbogott for the password script, I think a param to puppetmaster called 'generate_passwords' would also be fine I think. That's what we did for the git differences. [18:42:19] basically each difference should be made descriptive and specific, and not be just a realm branch... [18:42:28] and then you can set that to true from the labs role, and false from the prod role. 
[18:43:01] krenair paravoid ^ [18:44:18] andrewbogott I see there's already a segment on this on the etherpad! I'll expand on it and stuff soon [18:56:03] twentyafterfour, could i do a quick scap3 service depl before you? [18:56:10] or it could go at the same time [18:56:24] It would yurik go for it [18:56:32] !log krinkle@tin Synchronized php-1.28.0-wmf.13/extensions/WikimediaEvents/modules/ext.wikimediaEvents.deprecate.js: Log RL splitRequest (duration: 02m 08s) [18:56:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:56:38] yurik: I can wait [19:00:04] twentyafterfour: Respected human, time to deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160809T1900). Please do the needful. [19:05:15] !log deployed/restarted kartotherian https://gerrit.wikimedia.org/r/#/c/303841/ and tilerator https://gerrit.wikimedia.org/r/#/c/303840/ [19:05:18] twentyafterfour, done [19:05:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:05:28] cc: MaxSem gehel ^ [19:05:32] thanks [19:12:07] (03CR) 10Dzahn: "eh.. does this mean you want it to have that additional manual "git pull" step on the server and you ran that after each merge?" [puppet] - 10https://gerrit.wikimedia.org/r/303719 (owner: 10Dzahn) [19:15:48] !log twentyafterfour@tin Started scap: testwiki to 1.28.0-wmf.14 refs T139217 [19:15:49] T139217: MW-1.28.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T139217 [19:15:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:17:53] (03CR) 1020after4: "sounds to me like it should be deployed with scap if you don't want auto-deploy with puppet. just my $0.0002" [puppet] - 10https://gerrit.wikimedia.org/r/303719 (owner: 10Dzahn) [19:20:51] (03CR) 10Dzahn: "sounds like that would make everything a lot more complex..
but for what advantage exactly" [puppet] - 10https://gerrit.wikimedia.org/r/303719 (owner: 10Dzahn) [19:40:27] !log restarting rabbitmq-server on labcontrol1001 [19:40:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:45:11] 06Operations, 06Services, 06Services-next, 05Security: Productize the Electron PDF render service & create a REST API end point - https://phabricator.wikimedia.org/T142226#2537844 (10GWicke) At our discussion at Wikimania, I made the case that the wikibooks use cases might be better served by a more book-s... [19:48:25] (03PS1) 10MarcoAurelio: Fix wgAddGroups/wgRemoveGroups for mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303856 (https://phabricator.wikimedia.org/T142492) [19:53:45] (03CR) 10MarcoAurelio: [C: 04-1] "This won't merge unless rebased." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303548 (https://phabricator.wikimedia.org/T111815) (owner: 10Muehlenhoff) [19:54:16] (03CR) 10MarcoAurelio: Support scaling of huge SVGs [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303548 (https://phabricator.wikimedia.org/T111815) (owner: 10Muehlenhoff) [20:05:33] !log twentyafterfour@tin Finished scap: testwiki to 1.28.0-wmf.14 refs T139217 (duration: 49m 45s) [20:05:34] T139217: MW-1.28.0-wmf.14 deployment blockers - https://phabricator.wikimedia.org/T139217 [20:05:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:08:24] !log cr[12]-eqiad: added term relforge for instances->relforge100[12] in labs-support [20:08:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:10:08] (03PS1) 10Addshore: Add new ssh key for Addshore [puppet] - 10https://gerrit.wikimedia.org/r/303863 [20:10:25] do I need to verify that in some way or is it just generally good to go ^^ ? [20:13:06] I've often wondered that myself [20:13:17] is gerrit patch good enough or...? 
[20:13:47] 06Operations, 06Discovery, 10Elasticsearch, 10netops, 03Discovery-Search-Sprint: Enable access to relforge clusters from virtual machines running on labs - https://phabricator.wikimedia.org/T142211#2537985 (10BBlack) 05Open>03Resolved My comment above was under the false assumption that relforge100[1... [20:14:05] 06Operations, 06Discovery, 10Elasticsearch, 10netops, 03Discovery-Search-Sprint: Enable access to relforge clusters from virtual machines running on labs - https://phabricator.wikimedia.org/T142211#2537993 (10Gehel) Thanks to @BBlack, routers are now configured. I tested from `cirrus-browser-bot` and con... [20:15:03] error: insufficient permission for adding an object to repository database .git/objects [20:15:05] fatal: failed to write commit object [20:15:18] (03CR) 10Addshore: [C: 04-1] "regenerating...." [puppet] - 10https://gerrit.wikimedia.org/r/303863 (owner: 10Addshore) [20:15:42] ... can someone with root fix the permissions? of /srv/mediawiki-staging on tin? [20:16:17] actually I don't get it, the permissions look ok [20:16:52] ohhh there are files owned by root [20:17:07] plz can roots not commit to that repo as root? :-/ [20:17:40] !log root@tin:/srv/mediawiki-staging# find . -uid 0 -exec chown mwdeploy:wikidev {} \; [20:17:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:17:50] mutante: thank you [20:17:50] twentyafterfour: we were looking at that a little bit last week, it's not clear that it's a root doing it :/ [20:17:53] twentyafterfour: i dont think a person does that, it happens every day [20:18:01] oh [20:18:04] it used to happen but not that often [20:18:05] greg-g: how could non-root do it? [20:18:06] next assumption is puppet [20:18:26] (03PS2) 10Addshore: Add new ssh key for Addshore [puppet] - 10https://gerrit.wikimedia.org/r/303863 [20:18:34] hmm does puppet do any operations on that repo? 
[20:18:43] https://phabricator.wikimedia.org/T127093#2527467 [20:18:48] * greg-g shrugs [20:19:27] !log aaron@tin Synchronized php-1.28.0-wmf.14/extensions/FlaggedRevs: 52f54661a84f2066bc4c3a13e603ff8c6e5db357 (duration: 00m 55s) [20:19:27] no alerts for it again? [20:19:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:19:59] apergos: there are alerts [20:20:17] I mean, I don't see one in the backscroll [20:20:18] https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=tin&service=Improperly+owned+-0%3A0-+files+in+%2Fsrv%2Fmediawiki-staging [20:20:24] it was probably only 'soft' [20:20:33] (03PS1) 1020after4: group0 to 1.28.0-wmf.14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303866 [20:20:37] yes, that is it [20:20:40] Current Attempt:2/10 (SOFT state) [20:20:48] meh [20:20:50] it is configured to only trigger after 10 attempts [20:20:52] (03PS2) 1020after4: group0 to 1.28.0-wmf.14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303866 [20:21:55] so we really need to watch that for I dunno a 12 hour period and see if/when it goes off [20:21:57] bleah [20:22:12] also "unmerged change on tin" [20:22:19] (03CR) 1020after4: [C: 032] group0 to 1.28.0-wmf.14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303866 (owner: 1020after4) [20:22:49] (03Merged) 10jenkins-bot: group0 to 1.28.0-wmf.14 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303866 (owner: 1020after4) [20:23:03] we have alert history [20:23:05] unmerged change on tin is normal during the train deployment [20:23:06] https://icinga.wikimedia.org/cgi-bin/icinga/history.cgi?host=tin&service=Improperly+owned+-0%3A0-+files+in+%2Fsrv%2Fmediawiki-staging [20:23:19] https://icinga.wikimedia.org/cgi-bin/icinga/history.cgi?host=tin [20:23:28] ah, sorry but I can't read here and there at the same time [20:23:44] (03PS1) 10Andrew Bogott: make-labs-root-password: Avoid writing a newline [puppet] - 10https://gerrit.wikimedia.org/r/303867 
[20:23:52] doesnt tell us anything because it forgets the history [20:23:54] https://icinga.wikimedia.org/cgi-bin/icinga/histogram.cgi?host=tin&service=Improperly+owned+-0%3A0-+files+in+%2Fsrv%2Fmediawiki-staging&t1=1470009600&t2=1470774113&assumestateretention=yes [20:23:56] yup [20:24:27] 2016-08-09 20:17:51 that's the only one from yesterday and today [20:24:30] https://icinga.wikimedia.org/cgi-bin/icinga/trends.cgi?t1=1470687680&t2=1470774080&host=tin&service=Improperly+owned+-0%3A0-+files+in+%2Fsrv%2Fmediawiki-staging&assumeinitialstates=yes&assumestateretention=yes&assumestatesduringnotrunning=yes&includesoftstates=no&initialassumedhoststate=0&initialassumedservicestate=0&backtrack=4&timeperiod=thismonth&zoom=4 [20:24:39] so it's clearly not something puppet is setting off [20:24:44] there are specific timestamps ^^ [20:25:07] !log twentyafterfour@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to 1.28.0-wmf.14 [20:25:09] maybe a root cronjob? [20:25:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:26:05] (03CR) 10Jhobs: [C: 031] Promote new language switcher to all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303003 (https://phabricator.wikimedia.org/T129505) (owner: 10Jdlrobson) [20:26:38] just a single root cronjob. 
and it's deleting stuff in /tmp [20:27:30] (03CR) 10Andrew Bogott: [C: 032] make-labs-root-password: Avoid writing a newline [puppet] - 10https://gerrit.wikimedia.org/r/303867 (owner: 10Andrew Bogott) [20:27:31] !log aaron@tin Synchronized php-1.28.0-wmf.13/extensions/FlaggedRevs: 3acc2bd85544072b7fd55abd2829f2bdde7aeef8 (duration: 00m 53s) [20:27:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:27:45] yeah it's not going to be one of those [20:30:06] (03PS1) 10Sbisson: Revert "Temporarily disable thank-you-edit notifications" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303890 [20:30:18] (03CR) 10jenkins-bot: [V: 04-1] Revert "Temporarily disable thank-you-edit notifications" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303890 (owner: 10Sbisson) [20:30:23] (03Abandoned) 10Sbisson: Revert "Temporarily disable thank-you-edit notifications" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303890 (owner: 10Sbisson) [20:32:29] legoktm: ping [20:32:36] 06Operations, 06Services, 06Services-next, 05Security: Productize the Electron PDF render service & create a REST API end point - https://phabricator.wikimedia.org/T142226#2538081 (10ssastry) >>! In T142226#2537844, @GWicke wrote: > At our discussion at Wikimania, I made the case that the wikibooks use cas... [20:32:39] mafk: hi [20:32:57] legoktm: hi, if I bribe you gently could you please take care of https://gerrit.wikimedia.org/r/#/c/303856/ ? [20:33:36] what am I being bribed with? [20:33:39] jouncebot: next [20:33:39] In 2 hour(s) and 26 minute(s): Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160809T2300) [20:33:54] twentyafterfour: are you done with the train? [20:34:04] legoktm: wiki-stroopwaffels [20:34:38] I don't see anything in logs from tin or mira that looks likely [20:34:41] legoktm: yes, at least I'm done deploying.
I still need to clean up old branches and update the deploy notes [20:34:53] puppet ran at a different time, [20:35:10] twentyafterfour: is it okay if I sync something out or would you rather I wait? [20:35:17] legoktm: go for it [20:35:26] (03PS2) 10Legoktm: Fix wgAddGroups/wgRemoveGroups for mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303856 (https://phabricator.wikimedia.org/T142492) (owner: 10MarcoAurelio) [20:35:30] (03CR) 10Legoktm: [C: 032] Fix wgAddGroups/wgRemoveGroups for mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303856 (https://phabricator.wikimedia.org/T142492) (owner: 10MarcoAurelio) [20:35:57] (03Merged) 10jenkins-bot: Fix wgAddGroups/wgRemoveGroups for mediawikiwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/303856 (https://phabricator.wikimedia.org/T142492) (owner: 10MarcoAurelio) [20:36:25] (03PS3) 10Dzahn: Gerrit: make auth_type configurable for labs [puppet] - 10https://gerrit.wikimedia.org/r/303355 (owner: 10Paladox) [20:36:35] well there was a puppet run on mira at that time but it says nothing [20:36:46] (03PS1) 10Andrew Bogott: Replace code to generate labs root passwords. [puppet] - 10https://gerrit.wikimedia.org/r/303908 [20:37:35] twentyafterfour: hmm, git log HEAD...origin/master has 2 "group0 to 1.28.0-wmf.14" commits [20:38:08] (03CR) 10Andrew Bogott: [C: 032] Replace code to generate labs root passwords. [puppet] - 10https://gerrit.wikimedia.org/r/303908 (owner: 10Andrew Bogott) [20:38:22] am I blind? [20:38:29] apergos: twentyafterfour: mira has some root-owned files too, and i know it did not a couple days ago, and we did not use it for deployment.. hmm [20:38:30] oh I see [20:38:50] twentyafterfour: nvm, all good. [20:38:51] that's interesting [20:39:05] can it be scap itself somehow? 
[20:39:06] mutante: scap syncs files between deployment masters [20:39:07] maybe it can be tracked from the times on those [20:39:12] RECOVERY - Unmerged changes on repository mediawiki_config on tin is OK: No changes to merge. [20:39:16] yes there was the tin -> mira rsync [20:39:20] twentyafterfour: it actually does look like it synced them from tin to mira then, yea [20:39:35] scap-sync-file [20:39:39] do it fast because the next rsync will "fix" it [20:40:08] August 9, 20:37 [20:40:15] 3 minutes ago :p [20:40:18] uh, I'm syncing now [20:40:20] :-D [20:40:27] well if there is an rsync in progress [20:40:31] so maybe rsync is creating new files as root instead of syncing the ownership metadata? [20:40:39] then you have to wait for it to complete, some files will be root-owned when it's in process [20:40:41] !log legoktm@tin Synchronized wmf-config/InitialiseSettings.php: Fix wgAddGroups/wgRemoveGroups for mediawikiwiki - T142492 (duration: 00m 50s) [20:40:42] T142492: Can't add to autochecked group on mediawiki.org - https://phabricator.wikimedia.org/T142492 [20:40:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:40:47] mafk: ^ [20:40:49] (03PS6) 10Ori.livneh: graphite: Set xFilesFactor to 0 for sum/count. [puppet] - 10https://gerrit.wikimedia.org/r/300911 (owner: 10Krinkle) [20:40:51] Nemo_bis: fixed [20:40:51] but then it would have to do that _just_ for some files in .git/objects and nothing else [20:40:58] legoktm: can't thank you enough [20:40:59] i think it's always been in .git [20:41:02] and never outside that [20:41:09] mafk: no problem [20:41:11] RECOVERY - Unmerged changes on repository mediawiki_config on mira is OK: No changes to merge. [20:41:16] it could die before completing but that would have to be mira->tin [20:41:18] wouldn't it? [20:41:30] (03CR) 10Ori.livneh: [C: 032] graphite: Set xFilesFactor to 0 for sum/count. 
[puppet] - 10https://gerrit.wikimedia.org/r/300911 (owner: 10Krinkle) [20:41:37] it syncs both ways doesn't it? [20:41:43] it does? [20:41:48] or do we only ever sync from tin->mira? [20:41:49] that sounds.... problematic [20:41:51] I'm not sure [20:42:07] it does sound problematic, maybe I'm imagining it [20:42:14] someone could either look at the syslogs, or the cron jobs on the hosts, or puppet [20:42:21] someone with slightly more brain left [20:42:45] Aug 9 20:00:01 mira CRON[28276]: (root) CMD (/usr/bin/rsync -avz --delete tin.eqiad.wmnet::trebuchet_server /srv/deployment > /dev/null 2>&1) [20:42:55] yes. tin -> mira [20:43:04] rsync does not use "p" [20:43:19] it uses -a [20:43:21] apergos: look for what? [20:43:43] ori: you might know off the top of your head, what sets up the sync between tin and mira [20:43:47] oh, look at this: [20:43:55] tin php: #012Fatal error: Uncaught exception 'Exception' with message 'Problem with command: git log --format=medium --cherry-pick --right-only --no-merges wmf.... [20:43:58] scap does the rsync [20:44:11] problem with git command? well... [20:44:28] LIKELY CAUSE: you need to run 'git fetch --all' in a sub directory' [20:44:33] yes, scap keeps them in sync [20:44:34] (03PS4) 10Paladox: Gerrit: make auth_type configurable for labs [puppet] - 10https://gerrit.wikimedia.org/r/303355 (owner: 10Paladox) [20:45:08] I like the look of that, mutante, sounds like a possibility [20:45:13] ori: we are trying to find out why we get some root owned files in .git/objects on a regular basis [20:45:14] timestamp? [20:45:31] Aug 9 20:40:42 [20:45:49] but that was the deployment that 20after4 just ran [20:46:07] so are there files left over on tin owned wrongly? [20:46:09] ..after we fixed the root-owned files [20:46:09] is there an example file you can point me to? [20:46:10] mutante: that's probably from me running update-deploy-notes [20:46:11] right now?? [20:46:14] scap opens an ssh connection from the deploy server to the co-master.
A script to rsync /srv/mediawiki-staging is run on the co-master with the deploy server as the origin [20:46:34] bd808: so only tin->mira, this is what I expected. thanks, also thanks twentyafterfour and ori [20:46:34] ^ [20:46:48] mm, /srv/mediawiki-staging/.git/objects/40/c4f938d57e5d5bcaf36006dd4f3feef57f62d1 [20:46:51] so we can leave out rsyncs as being at all related [20:47:00] now? root owned? [20:47:09] apergos: right, just "where you type scap" to "the other co-master" [20:47:10] -r--r--r-- 1 root root 221 Aug 9 20:23 /srv/mediawiki-staging/.git/objects/40/c4f938d57e5d5bcaf36006dd4f3feef57f62d1 [20:47:13] ori: it's always a handful of seemingly random files in .git/objects/ [20:47:14] ooohhhh that's goood [20:47:15] (03PS1) 10Andrew Bogott: Make labs root passwords longer, but alphanumeric [puppet] - 10https://gerrit.wikimedia.org/r/303911 [20:47:40] ori: but afaict it's never outside .git [20:47:42] on tin, right? [20:47:50] it happens on both tin and mira [20:47:51] not mira, I want to ignore mira for the moment [20:47:54] ok [20:48:00] 06Operations, 06Services, 06Services-next, 05Security: Productize the Electron PDF render service & create a REST API end point - https://phabricator.wikimedia.org/T142226#2538145 (10ssastry) Just to be clear, I am not opposed to the new service. The output looks good and we've always known that browser-ba... 
[20:48:11] fwiw I didn't run any operations on mediawiki-staging since deploying the train [20:48:41] which was 20 minutes ago [20:49:03] and I shouldn't have any way to make things owned by root [20:49:44] puppet runs "/Deployment::Deployment_server/Exec[deployment_server_sync_all" [20:49:44] mwdeploy can run scap-master-sync as root [20:50:17] (03PS2) 10Andrew Bogott: Make labs root passwords longer, but alphanumeric [puppet] - 10https://gerrit.wikimedia.org/r/303911 [20:50:24] per modules/scap/manifests/master.pp [20:50:29] eh, yeah, scap-master-sync is running rsync -avz which should preserve group owner and permissions. [20:50:39] 68 # Allow rsync of common module to mediawiki-staging as root. [20:50:39] 69 # This is for master-master sync of /srv/mediawiki-staging [20:50:39] (03CR) 10Andrew Bogott: [C: 032 V: 032] Make labs root passwords longer, but alphanumeric [puppet] - 10https://gerrit.wikimedia.org/r/303911 (owner: 10Andrew Bogott) [20:52:27] there could be something there that we didn't consider with the rsync, but that also doesn't account for anecdotal recent increase in objects owned by root. Deployment_server_sync_all should just be a salt thing /srv/deployment, IIRC. [20:52:36] ^ I bet ori's onto something [20:53:01] hmnm [20:53:08] preserving file ownership is famously hairy: you can run into issues when the uid / names mismatch [20:53:23] preserving file ownership with rsync, I mean [20:53:31] but ordinarily you'd get a numeric uid on the target [20:53:44] you'd more likely get root:root if something died/was interrupted [20:53:50] ah, yeah, I suppose we did run into problems with mwdeploy not having the same uuid on some servers. [20:53:51] don't we periodically have trouble with uids/gids due to ldap flakiness? [20:53:55] er uid [20:54:06] yea, when that UID part fails you'd usually get a random number [20:54:10] I've really only seen that on labs....
[20:54:18] but not 0
[20:54:56] most likely cause, IMO, is rsync creates the file as root and then fails to sync the ownership bits.
[20:55:05] not sure HOW that happens but I bet that's what happens
[20:55:20] that's my best guess but I really want to catch the job doing it
[20:55:22] well
[20:55:26] I want _someone_ to catch it :-D
[20:55:31] :)
[20:55:38] fwiw, wikidev does have the same gid on tin and mira.
[20:56:27] yep, also kind of remember we made sure when creating mira
[20:56:34] there is that "UID" page on wikitech
[20:56:46] to "reserve" them
[20:57:44] somewhere on phab there is an epic length explanation of why and how scap's rsync uses uids rather than names
[20:58:11] where does scap-master-sync run?
[20:58:18] on tin, or mira, or both?
[20:58:43] typically on mira. technically on the co-master that scap is not being run from
[20:59:08] 68 # Allow rsync of common module to mediawiki-staging as root.
[20:59:43] here's the gory details about the numeric-id thing -- https://phabricator.wikimedia.org/T119165#1825437
[21:00:54] i think that is another issue and we fixed it
[21:01:06] since the l10nupdate user meanwhile has the same UID
[21:02:08] !log cleaned up stale branch php-1.28.0-wmf.7
[21:02:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:03:18] it is probably relevant that the object that was owned by root was the commit that twentyafterfour made to bump group0. Also maybe relevant that I don't think sync-wikiversions runs sync-masters-rsync
[21:04:27] outstanding find on the object contents thcipriani
[21:05:18] and that .git/objects/40/c4f938d57e5d5bcaf36006dd4f3feef57f62d1 is still owned by root on mira
[21:05:45] (PS1) Legoktm: Configure 'sourceUrl' for ExtensionDistributor [mediawiki-config] - https://gerrit.wikimedia.org/r/303914
[21:06:07] both on mira and tin. nice
[21:06:29] so: possible that there is some process on mira to autoupdate the non-active scap master that then runs scap-master-sync from mira?
[21:06:44] just sayin, there are 1,000+ pages tagged for speedy deletion in foundationwiki...
[21:07:44] (CR) Legoktm: [C: 2] Configure 'sourceUrl' for ExtensionDistributor [mediawiki-config] - https://gerrit.wikimedia.org/r/303914 (owner: Legoktm)
[21:07:49] thcipriani: I thought there was a cron job or something in puppet to do that
[21:07:57] * twentyafterfour can't remember exactly
[21:08:10] (Merged) jenkins-bot: Configure 'sourceUrl' for ExtensionDistributor [mediawiki-config] - https://gerrit.wikimedia.org/r/303914 (owner: Legoktm)
[21:08:16] l10nupdate
[21:08:49] l10nupdate never runs as root and uses scap to sync changes now
[21:09:19] (PS3) Reedy: Load WikimediaMessages via wfLoadExtension [mediawiki-config] - https://gerrit.wikimedia.org/r/303014 (https://phabricator.wikimedia.org/T140852)
[21:09:21] (PS1) Reedy: 6 more to extension.json in extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/303915 (https://phabricator.wikimedia.org/T139800)
[21:09:43] (CR) Reedy: [C: -1] ".14 needs to be everywhere for this one and it's now dependant :)" [mediawiki-config] - https://gerrit.wikimedia.org/r/303915 (https://phabricator.wikimedia.org/T139800) (owner: Reedy)
[21:09:54] !log legoktm@tin Synchronized wmf-config/CommonSettings.php: Configure 'sourceUrl' for ExtensionDistributor (duration: 00m 50s)
[21:09:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:17:07] I wonder why, if that file is still wrongly owned, I don't see a soft alert out of icinga
[21:17:20] (PS2) Reedy: 8 more to extension.json in extension-list [mediawiki-config] - https://gerrit.wikimedia.org/r/303915 (https://phabricator.wikimedia.org/T139800)
[21:17:21] does it only check tin?
[21:17:22] (PS4) Reedy: Load WikimediaMessages via wfLoadExtension [mediawiki-config] - https://gerrit.wikimedia.org/r/303014 (https://phabricator.wikimedia.org/T140852)
[21:17:28] well it was wrongly owned on both
[21:17:33] ah, nvm me :)
[21:17:42] * greg-g goes back to managery things :P
[21:17:55] I'm gonna go back to after-midnight things
[21:18:02] it's just my ocd getting me
[21:19:00] apergos: you mean the file called "readonly"?
[21:19:10] that is because i explicitly made the check ignore that one
[21:19:24] mutante: no the
[21:19:26] /srv/mediawiki-staging/.git/objects/40/c4f938d57e5d5bcaf36006dd4f3feef57f62d1
[21:19:28] that one
[21:19:43] uhm...ok
[21:20:13] still root:root right now
[21:20:14] Operations, Labs: Enable root passwords on Labs VMs - https://phabricator.wikimedia.org/T142216#2538272 (Andrew) Open>Resolved
[21:20:21] I got no brain cells left for it though
[21:20:50] lol
[21:20:53] next time I do and I'm around for a deployment maybe I'll check the state of objects after every command
[21:20:55] (PS1) Andrew Bogott: Disable instance consoles in Horizon. [puppet] - https://gerrit.wikimedia.org/r/303916
[21:21:00] at least track it down to which thing sets it off
[21:22:16] (PS1) Andrew Bogott: Add labspice hostname, directed to misc-web [dns] - https://gerrit.wikimedia.org/r/303917
[21:23:53] (CR) Andrew Bogott: [C: 2] Disable instance consoles in Horizon. [puppet] - https://gerrit.wikimedia.org/r/303916 (owner: Andrew Bogott)
[21:25:04] Operations, Services, Services-next, Security: Productize the Electron PDF render service & create a REST API end point - https://phabricator.wikimedia.org/T142226#2538295 (GWicke) > I don't know who will make the call for turning off OCG in favour of this new thing (unless it comes from users th...
[21:26:11] hmm only other thing owned by root is the "readonly" remote.
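apergos's idea of checking the state of the objects after every command can be sketched as a before/after snapshot. This is a minimal sketch under assumptions. The helper names `list_owned_by` and `new_files_owned_by` are hypothetical and are not part of scap or any Icinga plugin. The sketch also assumes GNU find, which accepts a numeric uid for `-user`, plus coreutils `comm` and `mktemp`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of scap or any Icinga plugin) for catching
# which command leaves files owned by an unexpected user in a git object
# store such as /srv/mediawiki-staging/.git/objects. Assumes GNU find.

# Print, sorted, every regular file under directory $1 owned by numeric uid $2.
list_owned_by() {
    local dir="$1" uid="$2"
    find "$dir" -type f -user "$uid" -print | sort
}

# Snapshot the files under $1 owned by uid $2, run the command given in the
# remaining arguments (its stdout is sent to stderr so it does not mix with
# the result), snapshot again, and print only the files that gained that
# owner while the command ran.
new_files_owned_by() {
    local dir="$1" uid="$2"
    shift 2
    local before after
    before="$(mktemp)"
    after="$(mktemp)"
    list_owned_by "$dir" "$uid" > "$before"
    "$@" >&2
    list_owned_by "$dir" "$uid" > "$after"
    # comm -13 drops lines unique to "before" and lines common to both,
    # leaving only lines unique to "after".
    comm -13 "$before" "$after"
    rm -f "$before" "$after"
}
```

For example, `new_files_owned_by /srv/mediawiki-staging/.git/objects 0 <next deploy command>` would list any root-owned objects that the step created. The approach only isolates the culprit if nothing else, such as a periodic job, writes to the tree while the command runs.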
[21:27:01] so just that one commit object
[21:27:09] dammit why am I still reading along in here
[21:27:19] * apergos checks out for real this time (sorry)
[21:28:08] hmm https://github.com/wikimedia/operations-puppet/blob/production/modules/role/manifests/deployment/server.pp#L66-L71
[21:28:47] (PS1) Andrew Bogott: Set up 'labspice' in misc-web [puppet] - https://gerrit.wikimedia.org/r/303918
[21:28:53] thcipriani: the "readonly" remote file has an exception in the icinga check because it seems like that will always be there
[21:29:49] scap does not work for sync. tool labs tools, right?
[21:31:11] i think it might be /usr/local/lib/nagios/plugins/check_mediawiki_config-needs-merge
[21:31:15] hrm so https://github.com/wikimedia/operations-puppet/blob/production/modules/monitoring/templates/check_git-needs-merge.erb runs as root
[21:31:20] heh
[21:31:21] nice
[21:31:28] i was first!
[21:31:35] su - $git_user -c "cd ${basedir} && git remote update readonly > /dev/null 2>&1"
[21:31:38] :P
[21:31:41] where git_user=root
[21:31:56] yeah, objects would be the same even though remote is different
[21:32:02] yes
[21:32:09] that's your culprit
[21:32:16] ^5
[21:32:26] (high-five)
[21:32:30] oh I got it.
[21:33:30] icinga checks should not have such side effects
[21:33:33] (CR) Andrew Bogott: [C: 2] Add labspice hostname, directed to misc-web [dns] - https://gerrit.wikimedia.org/r/303917 (owner: Andrew Bogott)
[21:33:52] i would just get rid of the check, tbh.
[21:34:18] (CR) Andrew Bogott: [C: 2] Set up 'labspice' in misc-web [puppet] - https://gerrit.wikimedia.org/r/303918 (owner: Andrew Bogott)
[21:35:40] (PS1) Andrew Bogott: Turn on spiceproxy and consoleauth on the labs controller. [puppet] - https://gerrit.wikimedia.org/r/303921
[21:35:44] mafk: correct. there is no deploy server to use scap from in tool labs
[21:36:33] mafk: but tool labs uses NFS, so scap's distributed sync mechanism isn't really too useful there
[21:37:28] check context: https://phabricator.wikimedia.org/T83854
[21:37:30] (CR) Andrew Bogott: [C: 2] Turn on spiceproxy and consoleauth on the labs controller. [puppet] - https://gerrit.wikimedia.org/r/303921 (owner: Andrew Bogott)
[21:38:20] why is git_user not mwdeploy?
[21:38:29] thcipriani: that's right, you can blame me :)
[21:39:42] :)
[21:40:05] way ahead of you, boss ;)
[21:40:59] next week's season finale: root will find out they are not the father of trebuchet's baby and scap3 is adopted
[21:41:02] I still don't like the idea of an icinga check making changes to .git in a deployment directory
[21:41:33] +1 to that
[21:41:34] lol
[21:41:39] (yes I'm still lurking in here dammit)
[21:41:48] apergos: I was worried you missed the exciting conclusion by 4 minutes
[21:41:50] git ls-remote should be able to get you the remote head sha1 without making any changes locally
[21:41:59] nah couldn't help myself
[21:42:43] this is true. Although, I wonder why the setgid permission thing on /srv/mediawiki-staging isn't working in this instance?
[21:43:11] apergos, lol, it is only 22:43pm bst for me :)
[21:43:13] bd808: right, thanks
[21:44:19] It is possible to use scap/scap3 in a Labs project by setting up a deploy server
[21:44:43] I even wrote a wiki page about it recently -- https://wikitech.wikimedia.org/wiki/User:BryanDavis/Scap3_in_a_Labs_project
[21:45:24] what setgid thing?
[21:46:13] This URL returns an error https://en.wikipedia.org/wiki/?diff=733743327
[21:46:22] permissions for the /srv/mediawiki-staging dir are drwxrwsr-x mwdeploy wikidev
[21:46:23] Looks like a performance issue
[21:47:19] halfak: error? not sure I want to click if it is a bad perf issue :)
[21:47:44] greg-g, ;) Fair point. I'm looking into what's up with that revisions via quarry
[21:47:50] ES?
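bd808's `git ls-remote` suggestion can be sketched as a read-only check. This is a minimal sketch under assumptions, not the real check_git-needs-merge plugin. The function name `check_needs_merge` is hypothetical. It also compares against the remote's `HEAD`, whereas the real check may track a specific branch of the `readonly` remote. Because `ls-remote` only asks the remote for its ref list and fetches no objects, running this check as root cannot leave root-owned files in `.git/objects`.

```shell
#!/usr/bin/env bash
# Hypothetical read-only "needs merge" check (not the real
# check_git-needs-merge plugin). It compares the local HEAD with the HEAD
# advertised by a remote via `git ls-remote`, which only lists refs and
# never fetches, so nothing is written under .git/objects.
#
# Usage: check_needs_merge <repo-dir> <remote-name>
# Prints "up-to-date" or "needs-merge"; returns 2 if either sha is missing.
check_needs_merge() {
    local repo="$1" remote="$2"
    local local_sha remote_sha
    local_sha="$(git -C "$repo" rev-parse HEAD 2>/dev/null)"
    # ls-remote prints "<sha><TAB><ref>"; keep the sha of the ref named exactly HEAD.
    remote_sha="$(git -C "$repo" ls-remote "$remote" HEAD 2>/dev/null | awk '$2 == "HEAD" { print $1 }')"
    if [ -z "$local_sha" ] || [ -z "$remote_sha" ]; then
        return 2
    fi
    if [ "$local_sha" = "$remote_sha" ]; then
        echo "up-to-date"
    else
        echo "needs-merge"
    fi
}
```

Like any local-versus-remote comparison, this check also reports `needs-merge` when the local checkout has commits that the remote lacks.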
[21:48:00] I mean, what error do you get? varnish timeout?
[21:48:43] My random bet it's a wikidiff2 edge case problem
[21:48:51] Looks like that revision has 1m chars in it
[21:48:57] eek
[21:48:58] Varnish 503
[21:49:16] ListeriaBot "Wikidata list updated"
[21:49:57] It's this page: https://en.wikipedia.org/wiki/User:Jane023/Female_RKDartists
[21:51:06] (PS11) BBlack: nginx (1.11.3-1+wmf2) jessie-wikimedia; urgency=medium [software/nginx] (wmf-1.11.3) - https://gerrit.wikimedia.org/r/303170
[21:51:19] OMG it loaded!
[21:51:27] It is a giant table
[21:57:59] (PS1) BryanDavis: l10nupdate: aquire scap lock before changing files [puppet] - https://gerrit.wikimedia.org/r/303923 (https://phabricator.wikimedia.org/T72752)
[22:00:43] (CR) BryanDavis: l10nupdate: aquire scap lock before changing files (1 comment) [puppet] - https://gerrit.wikimedia.org/r/303923 (https://phabricator.wikimedia.org/T72752) (owner: BryanDavis)
[22:01:04] (PS1) Andrew Bogott: Revert "Replace code to generate labs root passwords." [puppet] - https://gerrit.wikimedia.org/r/303924
[22:02:23] (CR) Andrew Bogott: [C: 2] Revert "Replace code to generate labs root passwords." [puppet] - https://gerrit.wikimedia.org/r/303924 (owner: Andrew Bogott)
[22:04:37] Operations, Labs: Enable root passwords on Labs VMs - https://phabricator.wikimedia.org/T142216#2538411 (Andrew) Resolved>Open
[22:04:52] Operations, Labs: Enable root passwords on Labs VMs - https://phabricator.wikimedia.org/T142216#2527493 (Andrew)
[22:09:31] Operations, Security-Team, vm-requests: provide ganeti VM for security team sectools - https://phabricator.wikimedia.org/T138650#2538429 (Dzahn)
[22:21:21] (PS1) Ladsgroup: ores: puppetize number of celery workers [puppet] - https://gerrit.wikimedia.org/r/303928
[22:24:43] Operations, Community-Tech, wikidiff2, Patch-For-Review: Deploy new version of wikidiff2 package - https://phabricator.wikimedia.org/T140443#2538520 (Legoktm) I poked ops-l about this today.
[22:29:43] (PS9) Mattflaschen: Change login cookies (for 'Remember me') to a one year expiry. [mediawiki-config] - https://gerrit.wikimedia.org/r/230954 (https://phabricator.wikimedia.org/T68699)
[22:34:31] mutante: Hey, it would be great if you review this: https://gerrit.wikimedia.org/r/303928 if you have some free time
[22:34:38] not in a rush
[22:38:08] Operations, Community-Tech, hardware-requests: Acquire new hardware for hosting cross-wiki watchlist database - https://phabricator.wikimedia.org/T142538#2538578 (kaldari)
[22:42:08] PROBLEM - puppet last run on mw2192 is CRITICAL: CRITICAL: puppet fail
[22:49:35] !log re-enabling cr2-eqiad:xe-5/2/3 (link to cr2-codfw); vendor reported it as fixed
[22:49:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:55:03] Operations, Community-Tech, MediaWiki-CrossWikiWatchlist, hardware-requests, Crosswiki: Acquire new hardware for hosting cross-wiki watchlist database - https://phabricator.wikimedia.org/T142538#2538613 (kaldari)
[23:00:04] RoanKattouw, ostriches, MaxSem, and Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160809T2300).
[23:00:04] Jdlrobson: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:01:24] hm, guess I can do it...
[23:01:53] jdlrobson, yt?
[23:02:00] \o
[23:02:28] MaxSem: yup
[23:02:53] waaait, dafuq is going on in prod?
[23:03:09] Warning: Creating default object from empty value in /srv/mediawiki/wmf-config/CommonSettings.php on line 686
[23:03:15] gotta fix that first
[23:03:41] Amir1: i can do that now. are you here to check
[23:03:44] MaxSem: legoktm merged something recently
[23:03:51] to fix mediawiki permissions
[23:03:52] mutante: yes
[23:03:52] .org
[23:03:59] thanks
[23:04:21] https://gerrit.wikimedia.org/r/#/c/303003/ but that's IS only
[23:04:23] (CR) Dzahn: [C: 2] ores: puppetize number of celery workers [puppet] - https://gerrit.wikimedia.org/r/303928 (owner: Ladsgroup)
[23:04:40] er sorry that's the jdlrobson patch url
[23:04:44] Amir1: it's active on the puppet master now
[23:04:47] (PS2) Jdlrobson: Promote new language switcher to all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/303003 (https://phabricator.wikimedia.org/T129505)
[23:04:58] (CR) jenkins-bot: [V: -1] Promote new language switcher to all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/303003 (https://phabricator.wikimedia.org/T129505) (owner: Jdlrobson)
[23:05:03] https://gerrit.wikimedia.org/r/#/c/303856/
[23:05:09] thanks
[23:05:15] I check right now
[23:05:18] (PS3) Jdlrobson: Promote new language switcher to all wikis [mediawiki-config] - https://gerrit.wikimedia.org/r/303003 (https://phabricator.wikimedia.org/T129505)
[23:06:58] mutante: I'm in scb1001 but I don't have access to run puppet agent there
[23:07:14] I think I should request for that access
[23:07:26] oh, hmm
[23:07:28] i am running it
[23:07:46] + CELERYD_CONCURRENCY: 24
[23:07:51] Reedy: you recently switched timeline to wfLoadExtension?
[23:08:10] Amir1: it doesnt look like puppet restarted a service
[23:08:15] should it?
[23:08:24] Dereckson: Yeah, it was merged for .14
[23:08:38] Reedy: Warning: Creating default object from empty value in /srv/mediawiki/wmf-config/CommonSettings.php on line 686
[23:08:46] okay, I restart it manually
[23:08:57] It won't matter since it's the old number
[23:09:02] Dereckson: It might be worth just suppressing it
[23:09:07] RECOVERY - puppet last run on mw2192 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[23:09:19] Amir1: scb1002 ?
[23:09:20] Dereckson: https://gerrit.wikimedia.org/r/#/c/303303/ needs to be deployed when .14 is everywhere
[23:09:22] delete all the $wgTimelineSettings or create it as a stdClass ?
[23:09:32] in scb1001 right now
[23:09:39] I'll probably do both
[23:09:49] Dereckson: Unless we backport the fixes to .13
[23:10:00] it's either @, or stdClass
[23:10:06] Amir1: i ran puppet on scb1002 but the change must have been applied already, nothing happened
[23:10:16] also we have 2001 and 2002, doing them next
[23:10:20] I'm not sure if it's worth backporting 3 changes
[23:10:23] https://github.com/wikimedia/mediawiki-extensions-timeline/commits/master
[23:10:44] the codfw doesn't really matter, doesn't get traffic
[23:11:05] !log sudo service celery-ores-worker restart in scb1001
[23:11:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:11:49] nope, I'm just reverting
[23:11:49] !log ladsgroup@scb1002:~$ sudo service celery-ores-worker restart
[23:11:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:11:56] Amir1: we should still do it. the combination of puppet not restarting it but applying config changes.. some day in the future it will get traffic and be restarted and then there is a surprise change
[23:12:22] let's make puppet do the service restart
[23:12:39] (PS1) Dereckson: Fix $wgTimelineSettings issue [mediawiki-config] - https://gerrit.wikimedia.org/r/303936
[23:12:40] Reedy: MaxSem: ^
[23:12:51] okay, I will restart the celery worker in codfw nodes too
[23:12:52] :)
[23:12:54] thanks
[23:13:44] !log ladsgroup@scb2001:~$ sudo service celery-ores-worker restart
[23:13:44] ok, cool. the changes have been applied
[23:13:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:13:49] (CR) MaxSem: [C: 2] Fix $wgTimelineSettings issue [mediawiki-config] - https://gerrit.wikimedia.org/r/303936 (owner: Dereckson)
[23:13:58] let's test it
[23:14:16] (Merged) jenkins-bot: Fix $wgTimelineSettings issue [mediawiki-config] - https://gerrit.wikimedia.org/r/303936 (owner: Dereckson)
[23:14:44] MaxSem: Why revert Timeline -> timeline?
[23:15:01] to revert the previous commit ;)
[23:15:20] I'll abandon if Dereckson's fix works
[23:15:33] !log ladsgroup@scb2002:~$ sudo service celery-ores-worker restart
[23:15:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:19:17] PROBLEM - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 10.65.0.24
[23:21:06] RECOVERY - Juniper alarms on asw-d-eqiad.mgmt.eqiad.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[23:21:54] MaxSem: did you work it out?
[23:22:07] still doing
[23:22:14] k anything I can help with?
[23:22:17] nope
[23:22:36] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/303936/ (duration: 00m 49s)
[23:22:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:23:47] PROBLEM - MediaWiki exceptions and fatals per minute on graphite1001 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [50.0]
[23:24:08] for fuck's sake...
[23:24:29] Notice: Undefined variable: wgContactConfig in /srv/mediawiki/wmf-config/CommonSettings.php on line 968
[23:25:44] because of the merge
[23:25:47] RECOVERY - MediaWiki exceptions and fatals per minute on graphite1001 is OK: OK: Less than 1.00% above the threshold [25.0]
[23:26:03] so... you deployed stuff
[23:26:12] have you looked at errors?
[23:26:45] That's nothing to do with my deploy
[23:26:48] greg-g, we've got multiple extensions broken ^^^
[23:26:51] Neither of those are
[23:26:59] It's from the train deploy
[23:27:25] okay, merges
[23:27:29] (PS2) Gergő Tisza: Remove AbuseFilter B/C config [mediawiki-config] - https://gerrit.wikimedia.org/r/293109
[23:27:32] https://gerrit.wikimedia.org/r/#/c/298084/ breaks prod
[23:28:01] Due to crappy way the config is built up
[23:28:17] Easy to fix
[23:28:33] yeah... so shit broke for the second time... https://gerrit.wikimedia.org/r/#/c/228602/
[23:28:59] Different reasons
[23:29:41] Warning: Creating default object from empty value in /srv/mediawiki/wmf-config/CommonSettings.php on line 686 is still a thing btw
[23:29:53] (PS1) Alex Monk: dynamicproxy: Add nginx config to redirect www.wmflabs.org/wmflabs.org to wikitech [puppet] - https://gerrit.wikimedia.org/r/303938 (https://phabricator.wikimedia.org/T38885)
[23:30:05] Notice: Undefined property: stdClass::$timelineFile in /srv/mediawiki/php-1.28.0-wmf.13/extensions/timeline/Timeline.php on line 147
[23:30:23] Backport the extension registration
[23:30:32] And the removal of the stupid TimelineSettings class
[23:32:33] and for ContactPage? $wgContactConfig['default'] = array_merge( $wgContactConfig['default'], $wmgContactPageConf );
[23:32:42] Wait
[23:33:45] not the ideal to have in one wmf branch a way to configure the extension, in another one another way
[23:34:12] * MaxSem reads on reverting wmf branches
[23:34:30] I guess the answer is tough shit
[23:34:33] Do both
[23:34:36] Or put in a thing
[23:34:47] Operations, Ops-Access-Requests, Patch-For-Review: Add marktraceur to statistics-privatedata-users for access to stat1002 - https://phabricator.wikimedia.org/T140132#2538734 (Dzahn) Open>stalled
[23:34:48] perhaps for the next extension transitions, would it be a good idea to make CS aware of the version, so we can use an if
[23:35:07] (PS1) Gergő Tisza: Remove $wgDisableAuthManager [mediawiki-config] - https://gerrit.wikimedia.org/r/303939
[23:35:14] (PS1) Reedy: Define default ContactPage config until it's updated properly [mediawiki-config] - https://gerrit.wikimedia.org/r/303940
[23:35:25] ^ I don't see why people see solving problems as so difficult
[23:35:26] It's not
[23:35:42] (CR) jenkins-bot: [V: -1] Define default ContactPage config until it's updated properly [mediawiki-config] - https://gerrit.wikimedia.org/r/303940 (owner: Reedy)
[23:35:52] Well, if I didn't syntax fail
[23:36:10] (PS2) Gergő Tisza: Remove $wgDisableAuthManager [mediawiki-config] - https://gerrit.wikimedia.org/r/303939
[23:36:17] also, you're still not creating wgContactConfig fully
[23:36:26] (PS2) Reedy: Define default ContactPage config until it's updated properly [mediawiki-config] - https://gerrit.wikimedia.org/r/303940
[23:37:34] (CR) MaxSem: [C: -1] "Please set a default value for $wgContactConfig" [mediawiki-config] - https://gerrit.wikimedia.org/r/303940 (owner: Reedy)
[23:38:10] (PS3) Reedy: Define default ContactPage config until it's updated properly [mediawiki-config] - https://gerrit.wikimedia.org/r/303940
[23:40:47] (CR) MaxSem: [C: 2] Define default ContactPage config until it's updated properly [mediawiki-config] - https://gerrit.wikimedia.org/r/303940 (owner: Reedy)
[23:41:13] (Merged) jenkins-bot: Define default ContactPage config until it's updated properly [mediawiki-config] - https://gerrit.wikimedia.org/r/303940 (owner: Reedy)
[23:42:48] Niharika: do you have more than one wikitech user?
[23:42:58] like a work and a volunteer or so?
[23:43:06] (PS3) Gergő Tisza: Remove $wgDisableAuthManager [mediawiki-config] - https://gerrit.wikimedia.org/r/303939
[23:43:53] is stuff still broken?
[23:43:59] / how can I help?
[23:44:11] (CR) Dzahn: "oh, hmm.. there are 2 users. niharika and niharika29, same email address, different users" [puppet] - https://gerrit.wikimedia.org/r/303543 (https://phabricator.wikimedia.org/T141593) (owner: Ema)
[23:44:12] MaxSem: ?
[23:44:23] !log maxsem@tin Synchronized wmf-config/CommonSettings.php: https://gerrit.wikimedia.org/r/#/c/303940/ (duration: 00m 49s)
[23:44:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:44:47] legoktm, multiple errors due to wfLoadExtension migration
[23:45:00] (CR) Dzahn: "[terbium:~] $ ldaplist -l passwd niharika | grep uid" [puppet] - https://gerrit.wikimedia.org/r/303543 (https://phabricator.wikimedia.org/T141593) (owner: Ema)
[23:46:13] now to backport timeline....
[23:47:06] delete the branch in gerrit, cherry pick the commit on top of master, push the commit branch back in, update the mw core branch
[23:47:50] look down, enjoy feet shot off... :P
[23:47:52] Operations, Deployment-Systems: error on tin:/srv/mediawiki-staging: insufficient permission for adding an object to repository database .git/objects - https://phabricator.wikimedia.org/T127093#2538766 (Dzahn) @Ori and @thcipriani figured it out 14:37 < ori> i think it might be /usr/local/lib/nagios/plu...
[23:48:14] or just checkout the .14 commit :P
[23:49:38] or press cherry pick in gerrit n times
[23:50:00] grrr, git pull updates submodules....
[23:59:10] Reedy, Dereckson, legoktm: https://gerrit.wikimedia.org/r/303947