[03:22:08] hello - I restarted my k8s webservice job and the pod now seems unable to start; is this a known issue or related to the other recent problems we've been having? https://dpaste.de/nh4S
[03:46:46] Earwig: is it still having that issue? I can look into it ~1h later
[03:47:20] zhuyifei1999_: yes, still happening. thanks!
[03:47:25] k
[03:47:39] Earwig: hi! I hope you've been well.
[03:48:00] hey harej, same to you!
[04:00:28] (actually, I'll look at it right now)
[04:02:03] https://www.irccloud.com/pastebin/xtnFqpYI/
[04:02:06] o.O
[04:02:54] the container's not running, so it can't bring up a shell in it?
[04:03:05] oh, good point
[04:03:58] that would be awkward...
[04:04:01] umm
[04:04:26] I was able to get `webservice --backend=kubernetes python2 shell` to do the right thing, it seems
[04:04:51] I'm not sure how the type of container that one starts might differ from the real one
[04:05:57] does the error persist after restarting webservice?
[04:06:44] yes, seems so
[04:07:34] oh, that's interesting...
[04:08:18] it might be only affecting one node
[04:08:46] indeed
[04:08:48] zhuyifei1999@tools-worker-1014.tools.eqiad.wmflabs: Permission denied (publickey,hostbased).
[04:08:51] brought up the interactive one on tools-worker-1010 and it looks fine
[04:09:38] on -1014, though...
[04:09:38] I have no name!@:~$
[04:09:59] what a strange prompt, never seen that before
[04:10:18] https://www.irccloud.com/pastebin/o1DZyVBu/
[04:12:33] !log tools restarting nscd on tools-worker-1014 in an attempt to fix seemingly-not-attached-to-LDAP
[04:12:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:12:55] didn't work
[04:14:05] uh, should be nslcd
[04:14:40] !log tools restarting nslcd on tools-worker-1014 in an attempt to fix that, service failed to start, looking into logs
[04:14:41] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:16:33] !log tools logs: https://phabricator.wikimedia.org/P8095
[04:16:34] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:19:04] shall I just depool this node and reboot
[04:23:30] !log tools drained tools-worker-1014.tools.eqiad.wmflabs
[04:23:31] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:24:33] Earwig: this host will no longer be scheduled until I try to fix (or file a bug so someone else fix it). could you try restarting the webservice?
[04:24:46] yep, looks like we're now on a functional node
[04:25:24] thank you for the help!
[04:25:39] np
[04:30:47] !log tools `nslcd -nd` complains about 'nslcd: bind() to /var/run/nslcd/socket failed: Address already in use'.
SIGTERMed a background nslcd, `rmdir /var/run/nslcd/socket`, and `nslcd -nd` seemingly starts to work
[04:30:49] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:31:26] !log tools then started nslcd vis systemctl and `id zhuyifei1999` returns correct stuffs
[04:31:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:33:11] !log tools, the issue was, /var/run/nslcd/socket was somehow a directory, AFAICT
[04:33:12] zhuyifei1999_: Unknown project "tools,"
[04:33:16] !log tools the issue was, /var/run/nslcd/socket was somehow a directory, AFAICT
[04:33:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:34:03] !log tools uncordon tools-worker-1014.tools.eqiad.wmflabs
[04:34:04] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:41:41] !log tools nslcd also broken on tools-worker-1005
[04:41:43] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:43:22] !log tools this one has logs full of 'Can't contact LDAP server'
[04:43:23] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:44:51] !log tools puppet is also failing bad here 'Error: Could not request certificate: getaddrinfo: Name or service not known'
[04:44:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:48:58] !log tools that host's resolv.conf is badly broken https://phabricator.wikimedia.org/P8096.
The last Puppet run was at Thu Feb 14 15:21:09 UTC 2019 (2247 minutes ago)
[04:49:00] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:52:22] !log copied the resolv.conf from tools-k8s-master-01, removing secondary DNS to make sure puppet fixes that, and starting puppet
[04:52:22] zhuyifei1999_: Unknown project "copied"
[04:52:27] !log tools copied the resolv.conf from tools-k8s-master-01, removing secondary DNS to make sure puppet fixes that, and starting puppet
[04:52:28] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[04:58:35] !log tools puppet logs: https://phabricator.wikimedia.org/P8097. Docker is failing with 'Failed to load environment files: No such file or directory'
[04:58:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[05:00:44] !log tools fixed by restarting flannel. another puppet run simply started kubelet
[05:00:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[07:34:12] JJMC89: thanks
[13:22:10] hello, are there any proxy issues? I'm seeing the occasional 500 error on https://warper.wmflabs.org but nothing in my apache error logs
[13:29:00] chippy: if that uses toolsbd, that is expected, since we have ongoing issues with that database and we are working in a replacement
[13:29:39] arturo, nope it's not using that. I think it might be something on my instance.
Restarted the webserver in the meantime
[13:29:51] ok
[13:30:18] very odd bug (arn't they all)
[13:30:34] :-)
[13:33:50] !log clouddb-servies add myself as user and projectadmin
[13:33:51] arturo: Unknown project "clouddb-servies"
[13:33:58] !log clouddb-services add myself as user and projectadmin
[13:33:59] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[13:43:07] !log clouddb-services T193264 create 'clouddb-services-puppetmaster-01' instance
[13:43:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[13:43:10] T193264: Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020 - https://phabricator.wikimedia.org/T193264
[13:47:08] !log clouddb-services T193264 create 'clouddb-services-puppetmaster' puppet prefix to store puppet/hiera config for this project puppetmaster
[13:47:10] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[13:54:54] !log clouddb-services T193264 create 'clouddb10' puppet prefix to store puppet/hiera config for database servers in this project
[13:54:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[13:54:57] T193264: Replace labsdb100[4567] with instances on cloudvirt1019 and cloudvirt1020 - https://phabricator.wikimedia.org/T193264
[13:59:18] !log clouddb-services T193264 switched clouddb1001/1004 to the new project local puppetmaster
[13:59:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Clouddb-services/SAL
[14:21:31] Hi all, not sure if this is the right channel but I was told I might find help here. PAWS is currently responding with a 504. Is this a known problem?
https://paws.wmflabs.org
[14:22:29] Yep, it's known issue with toolsdb affecting multiple tools
[14:23:17] The cloud team is working on fixing it, though I wouldn't expect things to be stable before Tuesday, and it might take longer
[14:26:04] Ok, thanks for the quick reply and good luck with the fixes!
[14:26:24] For my specific use case I got lucky and discovered that I can find a read-only verison of my notbooks here: https://paws-public.wmflabs.org
[14:27:24] Yes, paws-public doesn't use the db and is unaffected.
[14:27:49] You did remind to change the erroressage to something more useful, I'm doing that now
[15:54:39] !tools move paws to sqlite in memory until toolsdb is up T216208
[15:54:40] T216208: ToolsDB overload and cleanup - https://phabricator.wikimedia.org/T216208
[16:13:20] Hi, why I can't connect to trusty instance of toolforge
[16:13:27] I get error about wrong key
[16:13:36] !help
[16:13:36] Zoranzoki21: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[16:15:06] hi Zoranzoki21, can you add more context to your problem? Do you have a precise error message to show?
[16:15:47] $ ssh login.tools.wmflabs.org -l zoranzoki21
[16:15:47] zoranzoki21@login.tools.wmflabs.org: Permission denied (publickey,hostbased).
[16:17:17] I can connect on gerrit without problems, but no on toolforge
[16:20:01] Zoranzoki21: have you changed your keypair recently?
[16:20:18] : Yes
[16:20:22] Changed on wikitech and gerrit
[16:20:27] On gerrit I can connect without problems
[16:20:35] On Toolforge (Trusty and Stretch) no
[16:21:32] gerrit and toolforge do not share ssh keys
[16:22:07] Zoranzoki21: could you double-check that the key on wikitech is correct?
(https://wikitech.wikimedia.org/wiki/Special:Preferences#mw-prefsection-openstack )
[16:22:19] paladox: I know but I use same keys, so I said it
[16:22:26] ok
[16:22:44] : Yes, they are
[16:22:49] *they is
[16:26:21] Zoranzoki21: can you also check via https://toolsadmin.wikimedia.org/profile/settings/ssh-keys ?
[16:26:33] I'm wondering if something went wrong in storing the key via wikitech
[16:27:31] I have problem
[16:27:50] When I copy key from file it paste wrong key
[16:28:05] And on toolsadmin and on wikitech
[16:30:17] Might have been a little too bunt on that ticket, but I remember the days when the toolserver had serious issues
[16:30:56] Now I pasted piece by piece and I will try again
[16:33:10] No works again
[16:34:34] Keys are same everywhere
[16:35:10] Too I have problems on Gerrit. CI no runs tests for my account
[16:35:41] Zoranzoki21: there is now a different key in ldap, so that does seem to work correctly.
[16:36:43] No works still and CTRL+C CTRL+V pastes wrong key in wikitech
[16:36:47] I no know why
[16:36:59] Zoranzoki21: I would try restarting your computer
[16:37:14] I will. After restart I will come back and try again
[16:37:22] * Zoranzoki21 restarting PC
[16:37:37] Betacommand: I can understand their frustration. In the end, one of the main issues is that there are a lot of tools on labs that are effectively critical, but without the support needed for those kinds of tools.
[16:40:53] valhallasw`cloud: frustration is one thing, understanding that wmflabs is not a production level environment, and that hiccups happen is also important
[16:42:38] Maybe we should have more regular outages so people don't think wmflabs is always up ;-)
[16:42:45] I am back
[16:42:48] Connection works now
[16:43:01] But I have second problem
[16:43:05] CI tests no works for me
[16:43:31] Zoranzoki21: can you link the gerrit changeset?
[16:43:46] All patches which I make
[16:43:55] valhallasw`cloud: I guess I come from the days when toolserver replag was measured in years, not seconds, and they where solaris boxes
[16:44:43] Gives one a different mind set
[16:44:52] : https://gerrit.wikimedia.org/r/#/q/owner:Zoranzoki21+status:open
[16:47:33] Zoranzoki21: hmm. Could it be that those repos just don't have CI set up?
[16:47:39] no
[16:47:42] translatewiki haves it
[16:48:02] mediawiki/core haves it
[16:48:15] integration/config haves it
[16:48:31] hmm
[16:48:33] When I run recheck at https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/GlobalBlocking/+/490943/ on https://integration.wikimedia.org/zuul/ nothing happens
[16:48:56] you are not whitelisted
[16:49:04] paladox: I?
[16:49:17] yup you are not whitelisted, as it works for me
[16:49:39] paladox: I am whitelisted
[16:49:47] nope you are not
[16:50:00] https://gerrit.wikimedia.org/r/380989
[16:50:25] paladox: How? I am removed or?
[16:50:37] ohoh
[16:50:46] well im not sure then, but it works for me
[16:51:24] https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master/zuul/layout.yaml#319
[16:51:35] https://gerrit.wikimedia.org/r/plugins/gitiles/integration/config/+/master/zuul/layout.yaml#576
[16:51:41] I am on white list
[16:52:24] ... Where is my email on gerrit? https://snag.gy/yT6vjG.jpg
[16:52:52] Zoranzoki21: under 'identities', I think.
[16:53:05] There was my email before
[16:53:05] although that main email entry should also be there :/
[16:53:17] contact info
[16:54:02] I think the main entry is pulled from ldap, so should be the same as the one in wikitech
[16:56:32] but ldap lists the same email as the zuul config. Odd.
[16:57:15] Zoranzoki21: I would suggest checking with Hashar during the coming week -- it might be related to the outage, but I can't say for certain.
[16:57:28] and I don't exactly know how the whitelist is supposed to work
[16:57:43] valhallasw`cloud the whitelist matches the email in gerrit
[16:57:51] gerrit only pulls the email when you first register
[16:58:08] aha
[16:59:18] Zoranzoki21: can you check whether the email is listed under https://gerrit.wikimedia.org/r/#/settings/web-identities ? And if not, add it.
[16:59:31] After adding it (or if it's already there), check whether it's listed as primary under https://gerrit.wikimedia.org/r/#/settings/contact
[16:59:39] preferred*
[16:59:55] ah
[17:00:01] his email is not listed valhallasw`cloud
[17:00:19] in PolyGerrit when clicking the author field (it shows the email now), for him it's not showing
[17:00:42] http://gerrit.wikimedia.org/r/accounts/zoranzoki21
[17:00:54] is not showing it either
[17:01:03] but http://gerrit.wikimedia.org/r/accounts/paladox shows it for me.
[17:01:48] https://gerrit.wikimedia.org/r/#/settings/web-identities there is listed
[17:02:13] https://gerrit.wikimedia.org/r/#/settings/contact I have option to choose my email but I get error about realm
[17:02:24] realm?
[17:02:24] realm does not allowing editing name
[17:02:34] yeh, you want to edit the email field
[17:03:13] On https://gerrit.wikimedia.org/r/#/settings/contact at Preferred Email nothing no shows but I have option to choose my email
[17:03:20] And when I choose my email and click on save change
[17:03:22] *changes
[17:03:24] I get this error
[17:03:33] click register new email
[17:04:00] Ok
[17:04:06] I get error Resource already exists
[17:04:29] When I want to use zorandori4444@gmail.com
[17:04:37] did you create a new gerrit account?
[17:04:45] if yes, then you won't be able to use that email
[17:04:54] if it's asscoiated with a prevous account
[17:04:58] paladox: I have only one account
[17:05:09] And it is Zoranzoki21
[17:05:12] "error Resource already exists" suggests otherwise.
[17:05:43] it's associated _with the Zoranzoki21 account_
[17:06:59] Zoranzoki21 does the email show in the select thingy for Preferred Email?
[17:07:05] paladox: yes
[17:07:23] oh, that's why then, make sure you select that and click save
[17:07:46] paladox: realm....
[17:07:51] realm?
[17:07:55] Again
[17:07:58] that results in the " realm does not allowing editing name" error.
[17:08:03] yes
[17:08:08] it is what I mean
[17:08:41] Zoranzoki21: weird idea. Did you login as 'zoranzoki21' or as 'Zoranzoki21'?
[17:08:50] zoranzoki21
[17:08:55] can you try with a capital Z?
[17:09:01] Yes, same happening
[17:09:03] ah ok
[17:09:15] Is the email accoiated to your account on wikitech?
[17:09:23] I've just tested and i got the same error....
[17:09:23] same as in: results in the same 'realm does not allow editing name' error?
[17:09:42] paladox: yes, Zoranzoki21's email is in ldap.
[17:09:46] ok
[17:09:54] im not sure then :(
[17:10:35] Zoranzoki21: I suggest opening a ticket in https://phabricator.wikimedia.org/project/profile/330/
[17:10:44] paladox: do you know who is responsible for the prod gerrit instance?
[17:10:46] bugger
[17:10:50] i broke it for me now
[17:10:58] valhallasw`cloud yup, the releng team.
[17:12:09] Should I ask on releng channel?
[17:12:16] aha
[17:12:21] i think i know how to fix this
[17:12:57] Wrong button :)
[17:12:59] I am back agian
[17:13:01] *again
[17:13:07] How I can fix it paladox?
[17:13:35] Zoranzoki21 clone https://gerrit.wikimedia.org/r/#/admin/projects/All-Users
[17:13:40] then cd into the folder
[17:13:47] edit .git/config
[17:14:11] you will see refs/heads, replace that with just refs/*
[17:14:12] so like
[17:14:13] fetch = +refs/*:refs/remotes/origin/*
[17:14:16] then git pull
[17:14:31] then git checkout origin/users/self
[17:14:40] Betacommand: I don't think you were blunt at all. Then again maybe I was?
[17:14:59] valhallasw`cloud found a fix
[17:15:00] :)
[17:15:06] Zoranzoki21: are you still having trouble logging in to Toolforge?
[17:15:14] paladox: what is this black magic :D
[17:15:23] valhallasw`cloud you clone https://gerrit.wikimedia.org/r/#/admin/projects/All-Users
[17:15:32] paladox: yes, I read the steps :-)
[17:15:34] you then edit .git/config (fetch)
[17:15:40] replace it with
[17:15:40] fetch = +refs/*:refs/remotes/origin/*
[17:15:42] chicocvenacio: no
[17:15:42] then git pull
[17:15:49] then git checkout origin/users/self
[17:15:57] Zoranzoki21: cool!
[17:16:04] then add preferredEmail =
[17:16:21] under fullName (in account.config)
[17:16:31] then git add -A --all && git commit
[17:16:34] name the commit
[17:16:44] then git push origin HEAD:refs/users/self
[17:16:56] the power of NoteDB :D
[17:17:21] if it was still using the db, we would have had to have someone edit the db
[17:18:13] I cannot start wmopbot, I am receiving the error "User s53213 already has more than 'max_user_connections' active connections", someone knows how to fix that?
[17:18:34] danilo: toolsdb is overloaded
[17:18:53] Current fix is to wait a few days
[17:19:18] HEAD detached at origin/users/self
[17:19:18] nothing to commit, working tree clean
[17:19:19] https://lists.wikimedia.org/pipermail/cloud-announce/2019-February/000135.html
[17:19:35] Zoranzoki21 yup
[17:19:40] do you see a account.config file?
[17:19:49] Yes
[17:19:55] edit that
[17:20:00] with
[17:20:04] under fullName
[17:20:07] put preferredEmail =
[17:20:27] And then I save it
[17:20:31] yup
[17:20:37] then git add -A --all && git commit
[17:20:54] then
[17:21:03] I enter commit message
[17:21:07] yup
[17:21:13] And on end git push origin HEAD:refs/users/self
[17:21:18] then you git push origin HEAD:refs/users/self
[17:21:46] paladox: There is problem
[17:21:47] Zoran@Zoran-PC MINGW32 ~/Desktop/development/All-Users ((95c2e18...))
[17:21:47] $ git push origin HEAD:refs/users/self
[17:21:47] Enumerating objects: 5, done.
[17:21:47] Counting objects: 100% (5/5), done.
[17:21:47] Compressing objects: 100% (3/3), done.
[17:21:51] Writing objects: 100% (3/3), 447 bytes | 223.00 KiB/s, done.
[17:21:53] Total 3 (delta 0), reused 0 (delta 0)
[17:21:55] remote: Processing changes: refs: 1, done
[17:21:57] remote: error: invalid preferred email '' for account '4879'
[17:21:59] To ssh://gerrit.wikimedia.org:29418/All-Users
[17:22:01] ! [remote rejected] HEAD -> refs/users/self (invalid account configuration)
[17:22:03] error: failed to push some refs to 'ssh://zoranzoki21@gerrit.wikimedia.org:29418/All-Users'
[17:22:05] Zoran@Zoran-PC MINGW32 ~/Desktop/development/All-Users ((95c2e18...))
[17:22:07] $
[17:22:07] did you put <>
[17:22:17] it needs to be preferredEmail = zorandori4444@gmail.com
[17:22:21] Yes
[17:23:23] I removed <>
[17:23:37] ok, then git add -A --all && git commit -a --amend
[17:24:16] Same problem still
[17:24:28] same problem?
[17:24:33] how does the config look?
[17:24:56] [account]
[17:24:56] fullName = Zoranzoki21
[17:24:56] preferredEmail = zorandori4444@gmail.com
[17:25:10] for me it looks like https://phabricator.wikimedia.org/P8098
[17:25:23] Zoranzoki21 did you do git add -A --all && git commit -a --amend?
[17:25:35] Yes
[17:25:58] After removing <>?
[17:26:06] chicocvenacio: Yes
[17:26:07] And now
[17:26:10] did you git push origin HEAD:refs/users/self
[17:26:42] paladox: yes
[17:26:52] what does the new error show?
[17:27:20] https://phabricator.wikimedia.org/P8099
[17:27:35] wait
[17:27:40] that's still showing <>
[17:28:48] that either means you did not ammend the commit with the updated change or it still has <> in the file
[17:28:50] I will try everything again from start until the end
[17:28:55] ok
[17:33:46] Works :)
[17:33:48] Thanks!!
[17:48:15] Urbanecm: someone just sent me this, thanks :) https://screenshots.firefox.com/TSBRS9QV7JfinUmf/fr.wikipedia.org
[20:10:38] Yw,framawiki.
[22:50:06] Hey, any news on the behavior of tools.labsdb?
[22:56:33] GoranSM1: https://lists.wikimedia.org/pipermail/cloud-announce/2019-February/000135.html
[22:57:47] andrewbogott: Ouch... Good luck, my friend, and thanks for an update!
[23:50:32] * legoktm huggles all the sane users on T216208
[23:59:11] Hi, what's the procedure for raising the memory limit of a k8s webservice tool? I know we have the limits in /data/project/.system/config/*.web-memlimit, but these don't seem to affect k8s containers. Should I just file a ticket?
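
For reference, the All-Users steps paladox walks through between 17:13 and 17:21 (widen the fetch refspec, check out `origin/users/self`, add `preferredEmail` under `fullName` in `account.config`, push to `refs/users/self`) can be rehearsed locally without touching Gerrit. This is only a rough sketch under stated assumptions: the bare repository, the "Demo User" name and the `user@example.org` address are hypothetical stand-ins, `git fetch` replaces the `git pull` from the log because the stand-in remote has no branches, and `git config -f` is used in place of hand-editing so the value cannot pick up stray `<>` characters.

```shell
#!/bin/sh
# Local rehearsal only: nothing here contacts gerrit.wikimedia.org.
set -eu

work=$(mktemp -d)
remote="$work/All-Users.git"   # hypothetical stand-in for the real All-Users repo
email="user@example.org"       # placeholder address, not a real account

# Build a stand-in remote whose only ref is refs/users/self.
git init -q --bare "$remote"
git init -q "$work/seed"
(
  cd "$work/seed"
  git config user.name "Demo User"
  git config user.email "$email"
  printf '[account]\n\tfullName = Demo User\n' > account.config
  git add account.config
  git commit -q -m "Seed account.config"
  git push -q "$remote" HEAD:refs/users/self
)

# The steps from the log, run against the stand-in remote.
git clone -q "$remote" "$work/All-Users" 2>/dev/null   # warns: no branch to check out
cd "$work/All-Users"
git config user.name "Demo User"
git config user.email "$email"

# Same effect as editing the fetch line in .git/config by hand.
git config remote.origin.fetch '+refs/*:refs/remotes/origin/*'
git fetch -q origin                  # the log uses `git pull` here
git checkout -q origin/users/self    # detached HEAD, as in the 17:19:18 paste

# Add preferredEmail under [account]; no angle brackets around the value.
git config -f account.config account.preferredEmail "$email"
git commit -q -am "Set preferred email"
git push -q origin HEAD:refs/users/self

# Read the file back from the stand-in remote to confirm what was pushed.
result=$(git --git-dir="$remote" show refs/users/self:account.config)
printf '%s\n' "$result"
```

Against the real server the push is still validated by Gerrit, as the `invalid preferred email` rejection at 17:21:57 shows, so the local run only checks the git mechanics.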