[00:02:27] SMalyshev, James_F: logstash-beta being empty looks like a bug to me. Probably something to poke godog about related to his logging pipeline changes
[00:03:02] bd808: Ah, yeah, new logging changes happened last week, good point.
[00:03:44] if you open the 30 day view you can see several stairstep drops in volume and then noting after the 2nd
[00:03:51] *nothing
[00:05:45] bd808: T213129
[00:05:46] T213129: Beta Cluster ("deployment-prep") logstash has gone silent since 2019-01-02Z01:14:14 - https://phabricator.wikimedia.org/T213129
[00:06:28] thanks James_F :)
[00:06:44] Any time.
[00:08:50] James_F: thanks. Yeah no logs looks like a problem, and since fluorine gets them it's probably something missing in the pipeline I assume?
[00:10:01] Yeah, I only know enough to parrot the phrase "migrate to a new system" and then scurry away. :-)
[00:36:36] !help
[00:36:36] Danny_B|webchat: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[00:36:58] what's up Danny_B|webchat?
[00:37:24] bd808: since the move of bastion i can't no longer connect to my instance
[00:37:52] bd808: using the process of https://wikitech.wikimedia.org/wiki/Help:Access_to_Cloud_VPS_instances_with_PuTTY_and_WinSCP
[00:38:08] i am getting the following message>
[00:38:25] proxy: FATAL ERROR: Disconnected: Server protocol violation: unexpected SSH2_MSG_UNIMPLEMENTED packet
[00:38:34] hmm...
[00:39:28] that sounds like an encryption algorithm order problem
[00:39:31] i was connected earlier today. then disconnected. then tried to connect and no luck. changed nothing in any config
[00:40:05] the new bastions are a different OS and version of openssh which could have changed something
[00:40:20] Danny_B|webchat: what's the instance name? Let me see if I can get in as a starting point
[00:40:24] i can connect to bastion itself
[00:40:35] i can connect to login.tools
[00:41:02] i can't just proxy via bastion using the process described on the link
[00:41:18] bd808: wildcat/dannyb
[00:42:27] Danny_B|webchat: there are 2 instances -- https://tools.wmflabs.org/openstack-browser/project/wildcat -- which exact one are you trying?
[00:43:03] bd808: the one i wrote - no dash. debian
[00:44:16] ok. I can connect to that one, but not the older instance. Let me see if I can find anything useful in the bastion ssh logs
[00:45:21] i didn't try the old one (stretch), i can try it. in any case that one is scheduled to be deleted anyway
[00:46:20] same issue
[00:48:36] Danny_B|webchat: the auth.log on the bastion has that error message in it: "Server protocol violation: unexpected SSH2_MSG_UNIMPLEMENTED packet [preauth]" and then a disconnect. Nothing more informative.
[00:48:53] that's what i have as well
[00:49:42] Danny_B|webchat: I'm seeing some google hits that suggest upgrading PuTTY -- https://forums.freebsd.org/threads/putty-and-10-3.56668/
[00:50:44] And a blog post suggesting changing the preferred KEX cipher -- https://fcbrossard.net/blog/unexpected-SSH2-msg-unimplemented-packet
[00:53:37] https://globalroot.wordpress.com/2018/01/18/putty-error-unexpected-ssh2_msg_unimplemented-packet/
[00:56:29] using `ssh -vvv` I see this offered by the server -- KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1
[00:59:28] * bd808 -> afk
[01:01:05] bd808: i use kitty actually, which is better fork of putty. anyway, i updated to the most recent and getting the same error msg
[01:03:22] Danny_B|webchat: the problem is pretty certainly encryption cipher support and probably specifically that your client is trying to use an older Diffie-Hellman cipher
[01:07:13] bd808: https://fcbrossard.net/blog/unexpected-SSH2-msg-unimplemented-packet this didn't help
[01:07:54] The "preferred" KEX ciphers are curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha25 per my local config (trying to find docs on wikitech for that)
[01:08:55] https://globalroot.wordpress.com/2018/01/18/putty-error-unexpected-ssh2_msg_unimplemented-packet/ this didn't help as well
[01:09:26] well, how about config bastion back to accessible way?
[01:17:18] i have diffie hellman 14 as first and still no luck
[01:19:35] Danny_B|webchat: I can see that the config flags that used to allow weaker KEX and MAC algorithms are explicitly turned off. I don't want to turn them back on without knowing why/when that changed.
[01:20:23] bd808: by what you said about ssh -vvv it seems it should allow dh14
[01:20:44] as it ends up with ...,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1
[01:22:29] bd808: weird is that i can successfully log in into bastion. i can't use the proxy
[01:23:36] oh... hmmm. so the failure only comes when you try to use the bastion as a jump host?
[01:29:39] Danny_B|webchat: sorry I didn't understand that part before. That makes me think the failure is in the config for plink.exe or whatever you use with kitty as the proxy command.
[01:33:51] Danny_B|webchat: I have to go deal with life things. If you can't figure out how to beat your proxy command into submission please open a bug in phab and I'll try to look some more later tonight
[01:34:37] bd808: yes, plink
[01:34:54] or actually klink which is just the clone
[01:34:58] from kitty suite
[01:40:15] bd808: we set a privileged-level password policy for everyone on wikitech (and testwikitech), but wfGetPrivilegedGroups does not mark users as privileged (that's mostly used for logging potentially suspicious things, like failed login attempts)
[01:40:19] any thoughts about that?
[01:44:57] context is https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/481115#message-e174834bd7a0760f4e44dc1c557e7b3e920e9653
[02:30:39] tgr: so $user->getGroups doesn't work somehow on wikitech?
[02:31:17] bd808: that's unrelated
[02:31:36] we had some rules like fishbowl => [ 'user' ]
[02:32:00] which didn't work because getGroups() doesn't include 'user', you need getEffectiveGroups() for that
[02:32:03] wikitech is a non-SUL wiki so that function boils down to array_intersect( $wmgPrivilegedGroups, $user->getGroups() )
[02:32:20] but wikitech does not have such a rule in the first place
[02:33:30] so for say officewiki, the '+private' => ['user'] row will be merged with the default, making everyone privileged
[02:33:42] well, once the getGroups bug is fixed
[02:34:09] for wikitech it will just use the default, ie. only sysop and upwards is privileged
[02:34:25] I'm just wondering if we want to change that
[02:36:10] who is "we" here :) I don't want to get yelled at by MZ for flagging their password as weak. I already get yelled at enough for Cloud VPS project adminship requiring 2FA
[02:37:28] * bd808 re-reads tgr's first message and finally sees the subtle bit he missed
[02:38:22] I think it vaguely makes sense to have "priv'ed" requiements for all wikitechwiki users.
[02:38:50] But bd808's point is well made. :-)
[02:39:32] password policy is set at https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/481115/11/wmf-config/CommonSettings.php#451
[02:39:38] so that shouldn't be affected
[02:39:39] wikitech is weird here because what you really want to protect is 'important' developer accounts
[02:40:00] SUL wikis confusingly set password policies in three different ways, I think wikitech only has that one
[02:40:04] shell accounts?
[02:40:11] the only scary things that can be done on wikitech itself are covered by the existing config
[02:40:14] Err. Prod shell accounts?
[02:40:28] James_F: gerrit +2
[02:40:38] so on wikitech wfGetPrivilegedGroups should not be used for password policies
[02:40:39] mostly I would guess
[02:40:41] Isn't that configured through ldap and on-gerrit only?
[02:41:02] yes, but wikitech is the only place that tgr can make a fancy password policy
[02:41:08] it's used for a bunch of auditing though, like counting login failures
[02:41:14] and the only place that a user would be told about it
[02:41:18] :-)
[02:41:43] tgr: you can count login failures in the ldap directory
[02:41:55] but that won't feed into the same dashboards
[02:42:00] do we, though?
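The distinction tgr draws at 02:31 to 02:34 is between a user's explicitly assigned groups (`getGroups()`) and their effective groups (`getEffectiveGroups()`), which also contain implicit groups such as `user`. The Python model below is a hypothetical sketch of that logic, not MediaWiki's PHP code: the function names, the two-entry default privileged list, and the assumption that every logged-in account implicitly has `*` and `user` are simplifications for illustration. It shows why a `'user'` entry in the privileged list had no effect while only `getGroups()` was consulted, and why the same entry would make everyone privileged once effective groups are checked.

```python
# Hypothetical model of the group checks described in the log; not MediaWiki code.
IMPLICIT_GROUPS = ["*", "user"]  # assumption: every logged-in account implicitly has these


def get_groups(explicit):
    """Like getGroups(): only the groups explicitly assigned to the account."""
    return list(explicit)


def get_effective_groups(explicit):
    """Like getEffectiveGroups(): implicit groups plus explicit ones, without duplicates."""
    return list(dict.fromkeys(IMPLICIT_GROUPS + list(explicit)))


def privileged_groups(privileged, user_groups):
    """Like array_intersect($privileged, $groups): entries of `privileged` the user has."""
    return [g for g in privileged if g in user_groups]


DEFAULT_PRIVILEGED = ["sysop", "bureaucrat"]           # hypothetical "sysop and upwards" default
OFFICEWIKI_PRIVILEGED = DEFAULT_PRIVILEGED + ["user"]  # default merged with '+private' => ['user']

ordinary = []       # a registered account with no explicit groups
admin = ["sysop"]   # an account explicitly in the sysop group

print(privileged_groups(OFFICEWIKI_PRIVILEGED, get_groups(ordinary)))            # [] -> 'user' rule silently ignored
print(privileged_groups(OFFICEWIKI_PRIVILEGED, get_effective_groups(ordinary)))  # ['user'] -> everyone privileged
print(privileged_groups(DEFAULT_PRIVILEGED, get_effective_groups(ordinary)))     # [] -> wikitech-style default
print(privileged_groups(DEFAULT_PRIVILEGED, get_groups(admin)))                  # ['sysop']
```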
[02:42:12] probably not
[02:42:37] there is plenty of small stuff - email changes, loading javascript of a different user
[02:43:13] so the question is do we consider the average wikitech account sort of sensitive, or would this just spam the audit logs into pointlessness
[02:44:12] tgr: honestly I think that a very small number of the total wikitech accounts are "sensitive"
[02:44:25] I have https://gerrit.wikimedia.org/r/c/mediawiki/core/+/482589 in the works, that should enable an extension to use more granular logic
[02:44:39] which sounds like it would be the better approach here
[02:45:15] Off the top of my head I would say the nda, wmf, and ops ldap group members are the ones you actually care about
[02:45:15] I'll leave it as it is for now then, thanks
[04:06:50] bd808: any thoughts about T213016? I'm running into it on multiple boxes so it's probably not that rare
[04:06:51] T213016: Sometimes Redis does not work on MediaWiki-Vagrant with PHP 7.2 - https://phabricator.wikimedia.org/T213016
[04:09:40] never mind, this is actually a different redis error
[04:11:26] tgr: adding php-igbinary if needed seems fine with me. I've not seen redis meltdowns myself, but my use of MediaWiki on MediaWiki-Vagrant deploys is limited
[04:12:04] oh, huh. redis wasn't even installed in this box somehow.
[04:12:22] how did that happen? it worked in the past and AFAIK we always required redis
[04:13:08] *shrug* there was a bug in the last few weeks about the php module I think, but not the service itself
[04:15:02] sorry, I meant php-redis, not redis
[04:17:34] we were requiring the php7.2-redis package and I think that was doing some weird things. It was at least causing noise on repeated Puppet runs. I fixed that in https://github.com/wikimedia/mediawiki-vagrant/commit/7b9908eeccab55f2cff1b28f6c9988e6d3967da2
[04:21:10] I was using current master, I could see the package { 'php-redis': line, but somehow the package did not get installed through multiple provisioning attempts
[04:21:13] weird
[04:37:36] xdebug is also not installed
[04:38:09] for a change, now the package is there but the mods-available entry is only for 7.0 not 7.2
[11:21:50] Danny_B: could you show what you have in putty's proxy settings?
[11:26:31] Danny_B: I noticed your server is showing "Did not receive identification string from" in /var/log/auth.log -- that usually means whatever got control of the connection is not speaking the SSH protocol properly.. I don't have much experience with plink but I'd try to increase verbosity and see what it complains about
[11:32:40] gtirloni: i updated klink to the most recent version and it works now. however it is quite annoying because avast antivirus considers the new klink a virus so moves it into chest and i have to manually take it off the chest every single time i run the kitty connection to labs which requires klink to go through bastion as i can't connect directly to my instance
[11:33:27] (notice with the file to the av team sent)
[11:34:07] that's terrible. maybe add an exception if you haven't done so already? https://www.getavast.net/support/managing-exceptions
[11:37:26] gtirloni: thanks for suggestion. added
[11:39:06] bd808: gtirloni i think it would be handy if relevant pages on wikitech were upgraded in a way of what is newly required for connecting
[12:01:09] Danny_B: that's a good idea, I've added a warning here https://wikitech.wikimedia.org/wiki/Help:Access_to_Cloud_VPS_instances_with_PuTTY_and_WinSCP
[14:22:04] Nemo_bis: Your VM, dumps-stats.dumps.eqiad.wmflabs, is due for shutdown. Can you please respond on T204503 about what your plan is? Or if you don't care I can just delete it.
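bd808's first hypothesis during the bastion troubleshooting (00:39 to 01:03) was a key-exchange mismatch. Under RFC 4253 section 7.1, the negotiated KEX method is the first algorithm on the client's list that the server also supports, and the connection fails if there is none. The sketch below applies that rule in simplified form (it ignores the RFC's host-key compatibility conditions and the guessed-packet mechanism) to the server list bd808 copied from `ssh -vvv` at 00:56:29. The client preference lists are hypothetical. The `dh14_first` case mirrors Danny_B's point at 01:20 that the server does offer group14, which is why the diagnosis moved on to the klink proxy command; the log does not confirm that KEX was the root cause, only that upgrading klink fixed the connection.

```python
# Server-offered KEX algorithms, copied from the `ssh -vvv` output quoted in the log.
SERVER_KEX = [
    "curve25519-sha256",
    "curve25519-sha256@libssh.org",
    "ecdh-sha2-nistp256",
    "ecdh-sha2-nistp384",
    "ecdh-sha2-nistp521",
    "diffie-hellman-group-exchange-sha256",
    "diffie-hellman-group16-sha512",
    "diffie-hellman-group18-sha512",
    "diffie-hellman-group14-sha256",
    "diffie-hellman-group14-sha1",
]


def negotiate_kex(client, server):
    """Return the first client-preferred algorithm the server also supports, or None.

    Simplified RFC 4253 section 7.1 rule: host-key compatibility checks are ignored.
    """
    server_set = set(server)
    for alg in client:
        if alg in server_set:
            return alg
    return None  # no common algorithm: the handshake fails


# Hypothetical client preference lists, for illustration only.
old_client = ["diffie-hellman-group1-sha1", "diffie-hellman-group-exchange-sha1"]
dh14_first = ["diffie-hellman-group14-sha1", "diffie-hellman-group1-sha1"]
modern_client = ["curve25519-sha256@libssh.org", "diffie-hellman-group-exchange-sha256"]

print(negotiate_kex(old_client, SERVER_KEX))     # None
print(negotiate_kex(dh14_first, SERVER_KEX))     # diffie-hellman-group14-sha1
print(negotiate_kex(modern_client, SERVER_KEX))  # curve25519-sha256@libssh.org
```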
[14:22:04] T204503: cloudvps: dumps project trusty deprecation - https://phabricator.wikimedia.org/T204503
[14:48:18] andrewbogott: sorry, I had misunderstood that this instance was fine for this round; replying
[14:48:30] thanks!
[14:49:19] Nemo_bis: The typical solution for this kind of thing is to create a new Stretch VM in the new region and then migrate files and services over
[18:58:52] !log admin Deleted LDAP user uid=novaadmin,ou=people,dc=wikimedia,dc=org
[18:58:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:59:02] errr.. no
[18:59:13] !log admin Deleted LDAP user uid=neutron,ou=people,dc=wikimedia,dc=org
[18:59:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[18:59:56] !log admin Definately did NOT delete uid=novaadmin,ou=people,dc=wikimedia,dc=org
[18:59:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[19:53:42] !log deployment-prep adjusting puppet config on deployment-mwmaint01. remove "mediawiki_maintenance" role from "other classes" section and apply "mediawiki::maintenance" instead after role rename in gerrit:479131 for consistency with other mediawiki:: roles
[19:53:44] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Deployment-prep/SAL
[20:40:52] !help I created a new python file in a new subdirectory of a tool but whenever I attempt to submit it to the job queue it immediately returns the error "not an executable file"? I am confused as it runs perfectly fine for me
[20:40:53] TheSandDoctor: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[20:41:21] TheSandDoctor: which chmod is the file?
[20:41:56] good point. I never set one. What should it be to run on tools @Hauskatze?
[20:42:19] TheSandDoctor: iirc they suggest to use "take
[20:42:30] the <> can be omited
[20:42:40] TheSandDoctor: what command are you using to submit it to the job queue?
[20:43:17] jsub
[20:43:46] TheSandDoctor: the full command
[20:44:06] jsub -v PYTHONIOENCODING=UTF-8 -N $1 $2
[20:44:14] have it in a bash script
[20:44:20] sh
[20:44:21] *
[20:44:29] ok, can you share the script?
[20:44:42] the more information you give, the faster and easier we'll be able to help you
[20:45:11] which tool is it?
[20:45:33] its the same submit script I've used in the past. Tool is thesandbot. Python script I will PM
[20:45:51] please don't pm me
[20:45:57] k
[20:46:01] just paste it here
[20:46:10] [Tue Jan 8 20:43:44 2019] /mnt/nfs/labstore-secondary-tools-project/thesandbot/testing/getter3.py: not an executable file
[20:46:29] is in the .err
[20:46:44] you still haven't shared exactly what command you're running to queue the job
[20:47:00] https://www.irccloud.com/pastebin/fZNDSebH/
[20:47:30] https://www.irccloud.com/pastebin/j5pxPnin/start.sh
[20:47:59] TheSandDoctor: you need to `chmod a+x /mnt/nfs/labstore-secondary-tools-project/thesandbot/testing/getter3.py`
[20:48:07] TheSandDoctor: can you ls -la ...
[20:48:09] that file is not marked as executable
[20:48:18] what I suspected from start
[20:48:32] bd808: chmod 775 right?
[20:49:00] actually got the text it was submitted this time.
[20:49:16] what is take supposed to do?
[20:49:20] Hauskatze: well at least u+x
[20:49:28] TheSandDoctor: in addition to the chmod, I would suggest having your script submit something like /data/project/.../venv/bin/python3 path/to/script.py
[20:49:31] 5 is read and execute iirc
[20:50:39] "path/to/script" is relative I take it @legoktm
[20:50:41] ?
[20:50:52] if everything is absolute it just makes it easier imo
[20:53:42] Hauskatze: yeah 1 == exec, 2 == write, 4 == read so 5 == read+exec. Symbolic names are easier :)
[20:54:27] :)
[20:55:00] thanks for your help everyone
[20:57:49] TheSandDoctor: did you managed to get it running now?
[20:57:52] yes
[20:57:56] great :)
[20:58:00] :)
[20:58:06] bd808: did you saw the task on +23h replag on s5?
[20:58:27] I've just logged in and my PC wants me to reboot. How frustrating :|
[20:59:02] Hauskatze: no, I haven't seen that. I do have a guess as to why though...
[20:59:21] a misbehaving bot?
[20:59:28] or flagged revs
[20:59:42] no, crash of one of the replica servers that we pull from
[20:59:54] well, not one of the usual suspects then :)
[21:00:01] I'll try to fetch the task for you
[21:00:12] replag is caused by problems between prod and the wiki replicas you see, not by load on the wiki replicas
[21:00:16] I was going to tag cloud-services-team but refrained for some reason, I had fear
[21:00:55] Just working on some page data crunching for a bot task update. Currently generating a table of page IDs for the pages in a category for easier working with later and don't have the time to run it locally as I have to head in a sec and can't get my desktop crunching it rn. Thought I may as well send it to the cloud.
[21:00:57] T213108 -- s5 sanitarium was on the server that crashed. DBAs are working on the replacement server
[21:00:58] T213108: db1082 power loss resulted on mysql crash - https://phabricator.wikimedia.org/T213108
[21:01:02] https://phabricator.wikimedia.org/T213175
[21:01:07] @Hauskatze ^
[21:01:26] update/amendment*
[21:01:46] TheSandDoctor: I'm just a regular botop on cloud, I'm not sure I can answer that question
[21:01:54] it isn't a question
[21:01:59] just telling you what that's doing
[21:02:01] * Hauskatze re-reads
[21:02:02] ;)
[21:02:03] true
[21:02:09] * Hauskatze auto-trouts
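The `not an executable file` error at 20:46:10 went away once the script's execute bit was set (20:47:59). The sketch below uses only Python's standard `os`, `stat` and `tempfile` modules to work through the octal arithmetic from 20:53:42 (4 = read, 2 = write, 1 = execute, one digit each for user, group and other), apply the equivalent of `chmod a+x` to a temporary file, and check the result with `os.access(..., os.X_OK)`. Whether jsub performs exactly this check is an assumption; the log only shows its error message.

```python
import os
import stat
import tempfile

READ, WRITE, EXEC = 4, 2, 1  # weights of the bits within each octal digit


def describe(mode):
    """Return the permission bits of `mode` as (octal string, ls-style string)."""
    perms = stat.S_IMODE(mode) & 0o777
    return format(perms, "o"), stat.filemode(stat.S_IFREG | perms)


print(READ + EXEC)      # 5: read + execute
print(describe(0o775))  # ('775', '-rwxrwxr-x'): 7 = 4+2+1 for user and group, 5 for other

# Symbolic "a+x": the execute bit for user, group and other (0o111).
A_PLUS_X = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH

with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as f:
    f.write(b"#!/usr/bin/env python3\nprint('hello')\n")
    path = f.name

try:
    os.chmod(path, 0o644)                          # rw-r--r--: readable but not executable
    before = os.access(path, os.X_OK)              # False on POSIX: no execute bit set
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | A_PLUS_X)             # same effect as `chmod a+x path`
    after = os.access(path, os.X_OK)               # True on POSIX unless the temp dir is mounted noexec
    print(before, after, describe(os.stat(path).st_mode))
finally:
    os.remove(path)
```

Using `stat.S_IMODE` before OR-ing in the execute bits keeps the file-type bits out of the value passed to `os.chmod`, which mirrors how `chmod a+x` only adds bits to the existing permissions.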