[01:36:10] does anyone know if it's possible to change your instance shell name
[01:42:06] Computer_Fizz, https://wikitech.wikimedia.org/wiki/LDAP/Renaming_users suggests that has never been done? at least not with those instructions
[01:42:43] i only want to change my shell name
[01:42:45] ("instance shell name" = uid, wikitech/horizon username = cn/sn/displayName)
[01:42:46] not my wiki name
[01:42:50] yes
[01:43:03] hence the "This has only been tested changing the cn, sn LDAP attributes. It has not been tested with change of the uid ldap attribute and would almost certainly not work"
[01:43:07] except that my shell name is different from my wiki name :/
[01:43:13] yes, so is mine
[01:44:34] https://phabricator.wikimedia.org/T133968#2599177 said no too
[01:51:19] I was looking at the WMF Cloud VPS and keep hearing quota mentioned, but can’t seem to find what the actual quotas are?
[01:51:45] * TheSandDoctor seems to remember reading them *somewhere* months - or possibly a year or more - ago
[01:52:44] TheSandDoctor, they're per-project and show up in the overview panel in Horizon
[01:53:05] Oh - I thought they were just standard levels @Krenair
[01:53:14] standard levels?
[01:55:17] TheSandDoctor?
[01:57:47] ”
[01:57:56] Had me confused into thinking there were standard sizes
[01:58:05] https://phabricator.wikimedia.org/project/profile/2880/
[01:58:22] * TheSandDoctor was reading about WMF Cloud VPSs today and was wondering
[01:58:28] @Krenair:
[01:58:31] Thanks
[01:58:52] so your project will have a quota for the number of instances, a quota for the amount of VCPUs its instances can have in total, a quota for the amount of RAM its instances can have in total, a quota for floating IPs
[01:59:25] (and some obscure stuff you won't run into very often outside the biggest projects, like numbers around security groups and their rules IIRC)
[02:00:32] if you write that you want enough of a quota bump for a large instance, they'll look at the m1.large definition (in terms of VCPUs and RAM) and figure out how much extra room your quota needs
[02:03:41] TheSandDoctor, does that explain it?
[02:04:27] Yes that does, thanks! What is the m1.large definition?
[02:04:30] @Krenair:
[02:08:56] 4 VCPUs, 8192 MB RAM, 80 GB disk
[02:17:20] Thanks @Krenair for your answers!
[02:17:40] I don’t have one yet - just wondering/looking as an upcoming bot task might be suited to VPS
[02:17:42] Not sure yet
[02:17:55] So looking to see requirements/definitions etc.
[02:17:57] :)
[02:44:10] TheSandDoctor, pulled all the flavour info from the nova API: https://phabricator.wikimedia.org/P9546
[02:45:00] Thanks @Krenair
[02:45:03] (I don't know why bigdisk is available to the observer project, that project shouldn't have any instances, much less bigdisk ones...)
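For reference, the flavour and quota numbers quoted above come straight out of OpenStack, so they can also be pulled with the standard openstack CLI. A minimal sketch, assuming you have credentials on a host where the client is configured; the project name is just a placeholder:

    # list every flavour definition (VCPUs, RAM, disk)
    openstack flavor list
    # details for one flavour; per the log, m1.large is 4 VCPUs, 8192 MB RAM, 80 GB disk
    openstack flavor show m1.large
    # per-project limits: instances, cores, RAM, floating IPs, security groups, ...
    openstack quota show some-project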
[02:45:08] Went above & beyond and I appreciate it
[02:45:10] :)
[02:46:05] well I realised this info is not particularly well advertised
[02:46:15] it's available if you know where to go but it's a PITA to just reference
[02:48:11] might consider trying to get it into openstack-browser, which reminds me I still have open changes in Differential for that project since May :(
[09:53:09] !log project-proxy replacing SSL cert for star.wmflabs.org - for real this time (T237066)
[09:53:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Project-proxy/SAL
[09:53:12] T237066: Push renewed *.wmflabs.org certificate and new private key to cluster (expires 2019-11-16) - https://phabricator.wikimedia.org/T237066
[11:43:08] !log tools create puppet prefix `tools-k8s-haproxy` T236826
[11:43:11] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:43:12] T236826: Toolforge: new k8s: initial build of the new kubernetes cluster - https://phabricator.wikimedia.org/T236826
[11:43:25] !log tools create VMs `tools-k8s-haproxy-[1,2]` T236826
[11:43:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:54:10] !log tools point `k8s.tools.eqiad1.wikimedia.cloud` to tools-k8s-haproxy-1 T236826
[11:54:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:54:15] T236826: Toolforge: new k8s: initial build of the new kubernetes cluster - https://phabricator.wikimedia.org/T236826
[12:57:05] !log tools increasing project quota T237633
[12:57:09] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[12:57:09] T237633: Request increased quota for tools Cloud VPS project - https://phabricator.wikimedia.org/T237633
[13:01:01] !log tools creating puppet prefix `tools-k8s-worker` and a couple of VMs `tools-k8s-worker-[1,2]` T236826
[13:01:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:01:04] T236826: Toolforge: new k8s: initial build of the new kubernetes cluster - https://phabricator.wikimedia.org/T236826
[13:27:16] !log tools deployed registry-admission-webhook and ingress-admission-controller into the new k8s cluster (T236826)
[13:27:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[13:27:20] T236826: Toolforge: new k8s: initial build of the new kubernetes cluster - https://phabricator.wikimedia.org/T236826
[18:40:02] !log git rebuilding gerrit-test5 as gerrit-test6 (with buster) to re-test T200739
[18:40:06] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL
[18:40:06] T200739: Upgrade to Gerrit 2.16.12 - https://phabricator.wikimedia.org/T200739
[20:17:40] if i am on a cumin server in Cloud VPS, e.g. deployment-cumin02, i only get facts from within that project. like "which instances are using puppet class X" works fine within the scope of deployment-prep. i assume there is no global cumin server that could search all projects, right?
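For reference, a "which instances use puppet class X" query looks roughly like the sketch below, whether run on a project-local cumin host or on the global host suggested next. This assumes the PuppetDB backend is the default in the local cumin configuration; the class name is illustrative and the exact query grammar should be checked against the Cumin documentation on Wikitech:

    # resolve the hosts that have the class applied, then run a trivial command on them
    sudo cumin 'C:profile::base::puppet' 'hostname'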
[20:18:40] i use the openstack-browser tool and it's nice but cumin would be command line
[20:19:23] mutante: try cloud-cumin-01.cloudinfra.eqiad.wmflabs
[20:19:36] andrewbogott: !:) thanks
[20:19:47] if you don't have a login there yet I'll add you
[20:21:10] andrewbogott: i can log in (as root)
[20:21:22] I don't think cumin likes you to be root though
[20:21:38] yea, i worked around it on the other cumin by becoming my normal user
[20:21:45] and then it asks to run it with sudo
[20:21:51] mutante: your shell name is 'dzahn'?
[20:22:36] andrewbogott: yea
[20:22:41] ok, added you
[20:22:53] * bd808 wonders if he removed mutante from the tools admins list like he was asked to do...
[20:23:45] andrewbogott: thanks! i think there was a general issue with my key since i was temporarily deactivated
[20:31:28] andrewbogott: the issue is i can't log in with dzahn on any instance while root works. in different projects and before anything related to cloudinfra.
[20:32:25] the key in Wikitech and in LDAP is what i have loaded, and it works via the root authorized keys though
[20:33:25] mutante: hmmm... I only see one ssh key for you in ldap. It has the label "mutante@seaotter". Your developer account does not seem otherwise locked. Maybe you just need to set a new key via https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack ?
[20:33:55] mutante: can you try bastion-restricted-eqiad1-01.bastion.eqiad.wmflabs
[20:34:25] bd808: ack, thanks! that is the @seaotter key though
[20:34:27] andrewbogott: trying
[20:35:44] andrewbogott: works as root, getting asked for a password as dzahn
[20:35:50] root is unrelated
[20:35:52] I see "Failed publickey for dzahn from"
[20:36:03] yes, i know. but i am pointing it out because that's the same key
[20:36:10] so i have that one loaded
[20:36:42] ooh.. i see a possible issue in my config.
[20:37:22] dzahn@primary.bastion.wmflabs.org: Permission denied (publickey).
[20:37:35] is there something wrong with the bastion name?
[20:38:51] mutante: it's likely that you're not allowed to use that bastion, and instead only the one above
[20:38:58] yea, so i am setting "User dzahn" and using dzahn@primary.bastion in ProxyCommand
[20:38:59] I was hoping to get you to try connecting without a proxy
[20:39:35] andrewbogott: how would i do that to a .wmflabs name?
[20:39:49] oh, good point. Let me get the public name
[20:39:51] ah. is that .org
[20:40:32] try bastion-restricted.wmflabs.org
[20:41:47] ok, so it's not the proxy command, it just doesn't like your username and key
[20:42:05] I'm not 100% sure if you should be using primary.bastion.wmflabs.org or bastion-restricted.wmflabs.org
[20:42:14] but it seems like atm both are failing with the same error
[20:42:43] yea, the behaviour seems the same
[20:44:53] when i directly look at /etc/ssh/userkeys/root on bastion-restricted-eqiad-01 and my key in there
[20:46:25] and then look at my LDAP entry from mwmaint1002 ..
[20:46:34] that is the same key to me.. just a different comment
[20:47:09] i think this is something about account deactivation/reactivation
[20:47:22] the keys in /etc/ssh/userkeys/root allow you to ssh with root@ but not your own username
[20:47:30] yes, i know
[20:47:48] but the thing is that root@ works and dzahn@ does not like the key
[20:48:04] and the key used for dzahn@ should be what is in LDAP, right
[20:48:36] dzahn@ would be using the LDAP key(s). Let me see if I can get a better error message by trying to look up your key the same way that PAM/sshd does
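A minimal sketch of that kind of lookup, run on a bastion; the username and the local key path are illustrative. If the key stored in LDAP has been mangled (stray whitespace, line breaks), ssh-keygen will either refuse to parse it or print a different fingerprint than the local public key:

    # what sshd resolves from LDAP for this user
    /usr/sbin/ssh-key-ldap-lookup dzahn > /tmp/ldap-key
    # fingerprint of the stored key (errors out if the key line is malformed)
    ssh-keygen -lf /tmp/ldap-key
    # fingerprint of the key actually being offered by the client
    ssh-keygen -lf ~/.ssh/id_rsa.pub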
[20:51:06] mutante: I think I figured it out. Your ssh key in LDAP has whitespace in it (line breaks from screen display, it looks like)
[20:51:43] arrg. thanks bd808. let me try re-adding that in wikitech
[20:51:43] so it is not matching when sshd compares signatures
[20:51:53] i compared like a part of the string to match
[20:52:23] it's hard to see the whitespace until I fetched it using `/usr/sbin/ssh-key-ldap-lookup dzahn` on a bastion
[20:53:33] also... this is something we could make both the input and the lookup smarter about. toolsadmin.wikimedia.org (Striker) would have actually caught this when you provided the key
[20:53:40] bd808: thank you. deleted key in wikitech. and it is fixed!
[20:53:46] w00t
[20:53:51] i mean re-added of course
[20:55:47] i am also on cloud-cumin-01. cool. i am getting a totally unrelated error there but that's another story
[20:57:53] something about "HTTPSConnectionPool(host='localhost', port=443): Max retries exceeded with url: /pdb/query/v4/resources" but i can report that via ticket
[21:42:21] !log phabricator disassociating floating IP from phabricator-10
[21:42:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL
[21:42:42] !log phabricator deleting instance phabricator-10
[21:42:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Phabricator/SAL
[21:52:57] bd808 hi, i'm wondering what should i do if NFS times out when starting up? I just created a brand new instance and i'm trying to get mw-vagrant up on buster, but NFS times out.
[21:53:18] The following SSH command responded with a non-zero exit status.
[21:53:19] Vagrant assumes that this means the command failed!
[21:53:19] mount -o vers=3,udp,noatime,rsize=32767,wsize=32767,async 192.168.122.1:/srv/mediawiki-vagrant /vagrant
[21:54:18] paladox: *shrug* keep trying? The 5-6 mw-vagrant instances I have set up on Buster have had really good luck in not hitting that particular bug, but obviously you are not getting that luck.
[21:54:30] oh
[21:54:33] ok
[21:54:35] thanks!
[21:54:42] If there was a magic fix that I knew of I would have documented it :)
[21:55:15] !log toolsbeta killed pods for ingress admission controller to upgrade to new image T215531
[21:55:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Toolsbeta/SAL
[21:55:18] T215531: Deploy upgraded Kubernetes to toolsbeta - https://phabricator.wikimedia.org/T215531
[21:57:00] ok :)
[21:57:19] paladox: does this work for you? https://puppet-compiler.wmflabs.org/
[21:57:28] nope
[21:58:38] paladox: seems like somebody was working on it but for me it's like the domainproxy entry is gone
[21:58:55] oh
[22:01:07] mutante: I don't see it on https://tools.wmflabs.org/openstack-browser/proxy/ -- Do you know which project used to "own" that proxy?
[22:03:45] bd808: i think puppet-diffs
[22:03:47] https://tools.wmflabs.org/openstack-browser/project/puppet-diffs
[22:04:47] yea, there is no webproxy in the UI that i can see
[22:04:51] i pinged jbond
[22:05:22] since https://integration.wikimedia.org/ci/job/operations-puppet-catalog-compiler/ still links you there to get the results
[22:05:31] looks like all of the instances there are new today. My guess is that jbond42 rebuilt them (yay!) but did not realize that the proxy would disappear when he deleted the older instances.
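Going back to the mw-vagrant NFS mount timeout earlier in this log: since the advice given is essentially "keep trying", a small retry wrapper is about all there is. A minimal sketch using standard Vagrant commands; the retry loop itself is illustrative, not an established fix:

    # the NFS mount race is intermittent, so just re-run the bring-up a few times
    for i in 1 2 3; do
        vagrant up --provision && break
        vagrant halt
    done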
[22:05:47] ack, that sounds likely
[22:06:41] hi all and yes bd808 you are spot on
[22:06:45] let me see if 1001 is the one with the webserver
[22:06:49] ah, hi
[22:07:03] should the webproxy point to the first instance?
[22:07:49] tbh i'm not sure, let me take a look at what is configured now
[22:32:41] jbond42: when you sort that out, can you help me understand what's happening on my freshly-built puppetmaster?
[22:32:43] "Error: Could not retrieve catalog from remote server: Error 500 on SERVER: Server Error: Evaluation Error: Error while evaluating a Resource Statement, Duplicate declaration: File[/var/lib/puppet/ssl/certs/ca.pem] is already declared at (file: /etc/puppet/modules/base/manifests/puppet.pp, line: 56); cannot redeclare (file: /etc/puppet/modules/profile/manifests/puppetmaster/frontend.pp, line: 30, column: 9) on node labtestpuppetmaster2001.wikimedia.org"
[22:33:23] Even though I see the file being excluded in one of those cases by "if $manage_ca_file"
[22:36:22] andrewbogott: i'm not sure, if you have manage_ca_file: false it shouldn't cause a problem. however there is a bug there if manage_ca_file: true and the localcacert is the same value in the master and main section, can you raise a phab task for that
[22:37:07] as to the compiler thing i'm not sure of the issue, the backends all seem fine if i contact them directly
[22:37:27] jbond42: I don't have any hiera overrides for this host — it should be all defaults
[22:37:39] this is labtestpuppetmaster2001, I'm trying to rebuild it with stretch
[22:38:04] andrewbogott: that means it's on the ops network right?
[22:38:11] ops/production
[22:38:14] public network
[22:38:22] labtestpuppetmaster2001.wikimedia.org
[22:38:29] but of course you can't reach it because… no puppet :)
[22:38:32] sorry, ops realm
[22:38:37] yes
[22:39:00] yes, then that is a bug
[22:39:28] labtestpuppetmaster2001.wikimedia.org doesn't pick up the hiera values from labs.yaml afaik
[22:39:38] so it means manage_ca: true
[22:39:44] right, it shouldn't
[22:39:50] but I can override if you think that's the right fix
[22:40:39] sorry, I'm not sure I understand enough yet to file a bug :)
[22:41:16] sure, i'll back up
[22:42:14] regarding the automatic "jessie deprecation" tickets: looks like the live report looks at the image name an instance was created from, and we have cases where they are reported as jessie but are actually already stretch (because they were upgraded in place)
[22:42:23] in a standard puppet configuration the ssldir value is the same for both the main and master sections of the config, however in the production puppet config we use separate directories
[22:42:39] off the top of my head i'm not sure where that is configured
[22:43:15] further, in the production puppet we manage the localcacert for both ssldirs
[22:44:21] in the production puppet this is fine as they are separate directories, however in a default puppet install these dirs would be the same, and the way the policy is written today means that we get a duplicate resource definition
[22:45:23] are we only talking about masters here, or all puppetized hosts?
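A quick way to check whether the main and master sections really resolve to the same paths on a given host is Puppet's own config printer. A minimal sketch; the setting names are standard Puppet, run it on whichever puppetmaster is being debugged:

    # compare the two sections; if ssldir/localcacert match, the two File
    # declarations of .../certs/ca.pem collide as described above
    sudo puppet config print ssldir localcacert --section main
    sudo puppet config print ssldir localcacert --section master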
[22:45:25] i'm guessing somewhere between how the production puppetmasters are defined and how the labtestpuppetmasters are defined, the ssldir is different
[22:45:32] just masters
[22:46:04] and only the frontends at that
[22:46:04] ok
[22:46:11] I will hack around this for now as best I can :)
[22:46:46] if you set manage_ca: false in hieradata/host/testpuppetmaster.yaml it should solve it
[22:47:02] cool, trying that now
[23:14:35] mutante: yes. problem #1 is that the image is all that I can query from OpenStack. problem #2 is that historically we have very very strongly discouraged in-place upgrades as they are a signal of "pets, not cattle" instance management
[23:15:49] bd808 hmm, it seems to have done it 3+ times now :(
[23:15:51] bd808: i agree with "pets, not cattle", except for puppetmasters it's really a pain to set them up instead of just upgrading in place, which is pretty easy
[23:16:56] i have been thinking maybe lsb_release -c via cumin
[23:17:33] mutante: a tool running in Toolforge is not going to be able to ssh as root to 800 instances :)
[23:17:57] bd808: hence the cumin server?
[23:18:31] setting up a puppetmaster is not particularly hard
[23:19:17] you may have to do a CA replacement which can be a little bit of work, but only for like the largest projects, and we have made some improvements to hopefully make that possible to do fairly seamlessly
[23:19:18] ok.. but what exactly is the advantage of me deleting it to create the same thing afterwards
[23:19:33] mutante: that would make it a manual process, or require me to invent many new things. The ~20 in-place upgraded instances are not worth that investment right now
[23:19:45] bd808: ack
[23:21:44] I honestly think the main benefit for a Cloud VPS maintainer of rebuilding instances from scratch every 1-2 years is ensuring that the documentation on how to do that is correct. Any instance could be irrevocably lost at any time (we lost a number due to a RAID crash earlier this year)
[23:22:30] we are working on things that should make that even less likely, but it is a thing that will always be possible
[23:29:10] !log planet - deleting puppetmaster
[23:29:12] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Planet/SAL
[23:30:43] bd808 oh!
[23:30:51] bd808 it was ferm
[23:30:52] well.. i just deleted it. created exactly 1 year ago. was upgraded in place because the docs didn't work.
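One possible shape for the "lsb_release -c via cumin" idea above, as a minimal sketch run from a cumin host that can reach the instances as root. The 'A:all' selector and the --force flag (to skip the confirmation prompt) are assumptions to verify against the local cumin configuration:

    # report the actually running distribution codename per instance,
    # instead of trusting the image name recorded by OpenStack
    sudo cumin --force 'A:all' 'lsb_release -sc'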