[09:05:21] Ruh roh - deleting anything in the user space on en-wiki is giving me database errors...
[09:05:43] [Wsx@EApAAD0AAHyF31cAAADM] 2018-04-10 09:05:16: Fatal exception of type "Wikimedia\Rdbms\DBQueryTimeoutError"
[09:06:29] what page were you checking and how?
[09:06:57] https://en.wikipedia.org/wiki/User:Grepthorsoft
[09:07:03] Just the standard page deletion
[09:07:17] jynus: Use U5 as the reason if you're trying to reproduce it
[09:07:31] I don't have delete rights on enwiki
[09:07:33] It did it to me on another user page just a few minutes ago as well...
[09:07:39] Well shit
[09:07:52] how many edits were on that page?
[09:07:57] (revisions)
[09:08:07] Just one
[09:08:13] that is strange
[09:08:22] Same with the other I was trying to delete, too
[09:08:25] I've seen it failing on thousands of revisions
[09:08:31] not with just 1
[09:08:43] Yeah, if there's a lot of them
[09:08:50] Sure, it could time out or cause some hold-up
[09:09:02] But yeah, this is why I brought it up here
[09:09:09] Cause it shouldn't be giving this problem
[09:09:23] Especially with two different user pages I've tried to delete and with so few revs
[09:10:37] did you click delete multiple times? I can see it retrying many times
[09:13:11] Yes
[09:13:24] I'm trying to delete 'Decentraland' now
[09:13:29] Got the same error
[09:13:31] ok, so I have just been told
[09:13:42] there is some maintenance ongoing on archive
[09:14:06] it will be finished in ~5 minutes
[09:14:15] Okay, I'll try the deletions then
[09:14:20] Thanks for the info jynus
[09:14:23] this should not happen anyway
[09:14:27] so I will monitor it
[09:14:51] kk
[09:14:54] I was wondering that too
[09:15:05] sorry about that, I will check it as a priority
[09:15:15] Whether this "archive maintenance" you're speaking of would even involve page deletion?
[09:15:21] jynus: No apologies needed
[09:15:24] You're helping me out a lot
[09:15:29] yeah, I will explain myself better
[09:15:35] Np
[09:15:45] deleted revisions are moved to "archive"
[09:15:50] aha
[09:15:51] Okay
[09:15:53] That makes sense
[09:15:54] if they cannot be moved, the delete fails
[09:16:02] And that db is under maintenance
[09:16:13] yup, I'm on the same page now ;-)
[09:16:15] right now there seem to be some problems there, but the maintenance should normally not affect that
[09:16:20] so this is strange
[09:16:24] I see
[09:16:30] this is better than losing data
[09:16:34] but far from ideal
[09:16:35] ^
[09:16:39] I agree
[09:16:47] Yes, especially since I can't delete anything
[09:17:07] attack pages, etc. would remain public until fixed
[09:17:25] I can't even move without redirect, since that deletes the source page and creates the target
[09:17:46] (Sorry, thinking of temporary solutions aloud in case this happens...)
[09:20:33] Oshwah: Want me to try?
[09:20:51] can you check now?
[09:20:55] maintenance is over
[09:20:56] jynus: One sec
[09:21:10] Vermont: Nah, let's have one person do the testing so nobody gets confused ;-)
[09:21:15] A'ight
[09:21:27] jynus: Yup, works now
[09:21:35] The maintenance must've been what caused it
[09:22:06] sorry about that, I will make sure I create an outage report
[09:22:16] jynus: Not your fault; no need to apologize
[09:22:28] And thanks for responding and keeping me updated :-D
[09:22:31] we are responsible for making sure this doesn't happen
[09:22:35] Interesting issue...
[09:22:49] jynus: Sure, but you don't have to apologize like it's your fault.
[09:22:54] I'm a software engineer myself
[09:22:57] Things happen
[09:23:10] if we had known this was going to happen, we would have given notice in advance
[09:23:25] I've seen you guys do that before
[09:23:31] With db maintenance and locking
[09:23:38] But hey, nobody knew
[09:23:40] apparently some process got stuck
[09:23:47] Ah
[09:23:56] I was gonna say, it'll be interesting to see what you find...
[09:25:52] yes, followup at https://phabricator.wikimedia.org/T191875
[12:11:32] are cloud instances doing package updates automatically?
[12:11:56] since my check alerted me several times that there are some, but when I took a look, there was nothing to do
[12:13:34] VPS instances?
[12:14:09] Sagan: on what project was that?
[12:26:10] chicocvenancio: rcm
[12:26:22] chicocvenancio: it has happened multiple times already, that's why I'm asking
[12:26:48] it doesn't happen with all upgrades, but with a lot of them
[12:27:21] chicocvenancio: see here: https://phabricator.wikimedia.org/F16897639
[12:28:25] it has also happened in cases where more than one check said there were upgrades
[12:29:37] I mean that's not a problem, I'm just wondering
[12:32:53] * chicocvenancio investigates a bit
[12:35:48] Sagan: yes, if you look at modules/apt/manifests/unattendedupgrades.pp in ops/puppet, updates across distro/wmf packages/security are automatic
[12:35:50] chicocvenancio: ^
[12:35:58] you can turn off wmf and distro updates but not security
[12:36:31] chasemp: I understood those to not be security updates
[12:36:43] they are not, but all are on by default
[12:36:53] security updates are just the only ones you can't turn off
[12:36:59] if you set a key like profile::base::labs::unattended_wmf: 'absent'
[12:36:59] chasemp: ah, ok, thx
[12:37:07] wmf packages are not managed by unattended
[12:37:12] chasemp: how often are those processed?
[12:37:19] I think it's nightly?
[12:37:29] fairly sure it's a daily cron
[12:38:35] hm, ok
[12:50:10] hmmm
[12:50:47] does something about this home dir on a newly created instance look weird?
[12:50:49] https://usercontent.irccloud-cdn.com/file/s6RlIILl/image.png
[12:50:51] or is that expected?
[12:51:46] addshore: hm, on my new instance (created 3 days ago) only my user is in that home dir
[12:52:21] addshore: sure you got the right instance? (what if you ssh to login-test and there is one in a second project as well?)
[12:52:43] login-test is not a good name for a cloud-vps instance
[12:52:54] it could have been cached
[12:52:55] Why are some of the dirs owned only by uids, not by users with names? O_o Also, why are the permissions all different? (I guess those are my 2 questions)
[12:53:02] from the last time it was created with that name
[12:53:05] the names do collide across projects
[12:53:14] ^ that's what I meant
[12:53:59] so login-test is probably an instance in another project
[12:54:17] So, this is specifically login-test.catgraph.eqiad.wmflabs
[12:54:19] addshore: what if you ssh with the full name, login-test.projectname.eqiad.wmflabs?
[12:54:34] addshore: DNS is not respected
[12:55:17] And the issue is "johannesk" can't log in to any instances he is creating
[12:56:17] addshore: they can't log in to login-test, or to other instances as well?
[12:56:23] other instances
[12:56:26] only new instances
[12:56:31] they can still log into old instances
[12:56:37] can other users log in?
[12:57:43] I only see login-test.catgraph.eqiad.wmflabs named `login-test` across Cloud VPS, so this might be from an old instance with the same name
[12:57:58] addshore: in the catgraph VPS project?
[12:59:53] chicocvenancio: indeed
[13:13:23] addshore: is that the shell user?
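Because short instance names such as `login-test` can collide across projects, the exchange above settles on the full `<instance>.<project>.eqiad.wmflabs` form. A small sketch of both ideas follows; the helper names, the `site` parameter and the sample project names are hypothetical, and only the hostname pattern comes from the log.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def instance_fqdn(instance: str, project: str, site: str = "eqiad") -> str:
    """Return a fully qualified Cloud VPS hostname, e.g. login-test.catgraph.eqiad.wmflabs."""
    return f"{instance}.{project}.{site}.wmflabs"


def ambiguous_short_names(instances: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each short instance name used by more than one project to the sorted list of those projects."""
    projects = defaultdict(set)
    for name, project in instances:
        projects[name].add(project)
    return {name: sorted(ps) for name, ps in projects.items() if len(ps) > 1}
```

For example, `ssh jkroll@` followed by `instance_fqdn("login-test", "catgraph")` targets exactly one machine, whereas the bare short name may resolve to whichever project's instance happens to answer.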
[13:13:56] "johannesk" is the home dir, so I assume it is also the shell user
[13:13:59] I'm in another meeting now :)
[13:14:25] there is no user with that shell name
[13:16:17] and I find nothing in LDAP for that cname or uid
[13:16:31] maybe it was a user that was renamed at one point?
[13:16:33] sorry, the shell username is jkroll
[13:26:23] addshore: can you ask them to join here or contact me? It seems to be working and I don't see errors in the logs
[13:27:47] also, the home dirs are like that because `/mnt/nfs/labstore-secondary-home` is mounted and symlinked to `/home`
[13:53:30] chasemp: bd808: Checking to make sure https://phabricator.wikimedia.org/T189542 hasn't dropped off-radar. Seems like that may be broken right now following the silver transition.
[13:53:43] that = MediaWiki cronjobs
[13:54:44] Krinkle: I believe andrewbogott has a meeting to talk about that today or tomorrow, but I'll let him confirm
[13:54:54] thx
[13:55:51] Krinkle: if you have a suggested fix, that would be welcome. I don't really understand what those are :/
[13:56:10] me neither, I understand the mw side, but this is too much puppet for me.
[13:56:18] I suspect bd808 would know.
[13:56:37] also with regards to what the actual end-user impact is
[13:56:38] Krinkle: It is on my radar
[13:56:44] the name is wrong
[13:56:57] but it is non-trivial to explain to them why
[13:57:07] Krinkle: nothing user-facing is broken
[13:57:11] there is silver.dblist
[13:57:18] that is why there is no hurry
[13:57:21] jynus: ah, it's not trying to connect to a host called silver
[13:57:21] but it is not right
[13:57:29] nope, it is a dblist reference
[13:57:33] It's using the dblist indirection, which still has the old name
[13:57:34] got it
[13:57:35] Thanks :)
[13:57:38] but it should not be called silver
[13:57:43] Yeah, definitely
[13:57:44] that is what I want to discuss with them
[13:58:09] so we are already pushing to get it right and non-confusing :-)
[13:58:19] chicocvenancio: they are joining now!
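On the earlier question of why some home dirs show a bare uid instead of a user name: the log attributes the layout to the shared NFS home mount, and, as general Unix behaviour, tools like `ls -l` fall back to printing the number when the uid has no passwd/NSS (here, LDAP) entry on that host. A minimal sketch of that lookup, with hypothetical helper names:

```python
import os
import pwd


def label_for_uid(uid: int) -> str:
    """Return the user name for `uid`, or the bare number if NSS (passwd/LDAP) has no entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def owner_label(path: str) -> str:
    """Return the owner of `path` as shown in the owner column of `ls -l` (follows symlinks)."""
    return label_for_uid(os.stat(path).st_uid)
```

This only explains the display; it does not by itself say why a given uid is missing from LDAP on a particular instance.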
[13:58:54] Krinkle: inrelated, I sent you and Aaron an email, to see if you are ok with it
[13:58:58] *un
[13:59:04] chicocvenancio: >> JohannesK_WMDE
[14:03:49] JohannesK_WMDE: are you having trouble logging in to an instance?
[14:04:52] hi chicocvenancio. I can log in to my old instances, but not to several I created recently (1 week ago to 1 day ago)
[14:05:27] $ ssh jkroll@wmde-wikidiff2-mobile-dev.eqiad.wmflabs
[14:05:27] Permission denied (publickey).
[14:07:39] ahhh wikidiff2
[14:08:17] * chicocvenancio checking
[14:12:57] JohannesK_WMDE: are you logging in to other instances with that key?
[14:15:16] JohannesK_WMDE: are you logging in to other instances with that key?
[14:15:16] I see an error stating that your key type is `ssh-dss` and that is not supported
[14:16:09] and indeed I only see an `ssh-dss` key for your LDAP user
[14:50:28] I will let you know when I see JohannesK_WMDE and I will deliver that message to them
[14:50:28] @notify JohannesK_WMDE it seems your ssh key is `ssh-dss` and that is not supported on stretch. I recommend you create a new key and associate it with your account.
[15:02:54] addshore: the problem is the user's key type. Did you see the previous message about the home dir?
[15:53:32] JohannesK_WMDE: got the message? If you have any questions I can help
[15:56:41] hey chicocvenancio: got the message. That makes sense. Thanks a lot!!
[19:37:41] !log video depool v2ccelery on encoding03
[19:37:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Video/SAL
[23:13:52] Was there some sudden labs maintenance? Many bots disappeared simultaneously?
[23:17:07] Matthew_: Networking issues (for prod as well). Recovered by now.
[23:17:15] Awesome. Thanks.
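The login failure traced above came down to the key type: OpenSSH disabled `ssh-dss` (DSA) user keys by default starting with 7.0, and Debian stretch ships OpenSSH 7.4. The sketch below checks the type field of a public key line in the `id_*.pub` format; the accepted set is an illustrative assumption rather than a complete list, and the helper names are hypothetical.

```python
# Illustrative, not exhaustive: key types a default OpenSSH 7.x server still accepts.
# ssh-dss is deliberately absent, since OpenSSH 7.0 disabled it by default.
ACCEPTED_KEY_TYPES = frozenset({
    "ssh-rsa",
    "ssh-ed25519",
    "ecdsa-sha2-nistp256",
    "ecdsa-sha2-nistp384",
    "ecdsa-sha2-nistp521",
})


def key_type(pubkey_line: str) -> str:
    """Return the type field of a public key line: '<type> <base64-key> [comment]'."""
    fields = pubkey_line.split()
    if len(fields) < 2:
        raise ValueError("expected '<type> <base64-key> [comment]'")
    return fields[0]


def is_accepted(pubkey_line: str) -> bool:
    """True if the key's type is in the (assumed) default-accepted set."""
    return key_type(pubkey_line) in ACCEPTED_KEY_TYPES
```

If the check fails, generating a replacement key (for example with `ssh-keygen -t ed25519`) and associating it with the account is the fix suggested in the log.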