[00:08:28] mahmoud-montage, `kubectl get pods`, get the pod name from that, then `kubectl logs <pod name>`
[00:17:56] AntiComposite: unfortunately, get pods is telling me "No resources found.", does the pod only exist after it successfully starts up?
[00:18:15] (this is montage-dev btw, montage itself is working fine on k8s, making this kind of confusing)
[00:19:20] Your job is not running, starting...............
(venv)tools.montage-dev@tools-sgebastion-07:~$ kubectl get pods
No resources found.
[00:19:53] take a look in uwsgi.log, anything there?
[00:21:59] python errors should go there by default, and it was last changed at 21:12, same as service.log
[00:22:05] nope, just the goodbye message from when I shut it down
[00:22:44] what webservice command are you running?
[00:33:56] variants of "webservice --cpu 1 --mem 2 --backend=kubernetes python2 restart"
[00:34:14] (which is what works for regular montage)
[00:38:10] looking
[00:38:11] hmm
[00:38:27] the actual service.log entry is: 2020-03-01T21:12:56.167173 Could not start webservice - webservice tool exited with error code 1
[00:39:30] here we are
[00:39:50] you have a deployment with ready 0/1
[00:40:01] a replicaset with desired 1, current 0
[00:40:02] no pods
[00:40:07] so describing the replicaset yields this
[00:40:19] Warning FailedCreate 12m replicaset-controller Error creating: pods "montage-dev-6cb97d4b66-6plt2" is forbidden: minimum memory usage per Container is 100Mi, but request is 2
[00:42:33] mahmoud-montage ^
[00:42:50] oh whoa!
[00:43:25] ok, so I totally had a note for myself to file a Phabricator task to have webservice specify the unit for memory
[00:43:36] :D
[00:44:25] Krenair: what's the command for the replicaset again?
[00:44:37] oh, I just did 'kubectl describe rs'
[00:44:55] (rs being short for replicaset, because I'm feeling lazy)
[00:48:46] hey, as long as it works
[00:49:02] ok, my replicaset is pretty borked, bc I'm seeing "cpu: 500m"
[00:49:29] i fixed my command and the service.manifest, but got basically the same result
[00:50:04] Krenair ^
[00:50:16] what's the new command you're running, mahmoud-montage?
[00:50:29] plain and simple, all defaults: webservice python2 restart
[00:51:50] and my describe rs output looks pretty different between "montage" and "montage-dev"
[00:52:08] Warning FailedCreate 3s (x5 over 43s) replicaset-controller (combined from similar events): Error creating: pods "montage-dev-7cfb5db66-r5dc6" is forbidden: minimum memory usage per Container is 100Mi, but request is 2k
[00:55:10] hmm, and which command do I use to see that kind of output?
[00:55:21] kubectl describe rs
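Condensed, the kubectl triage sequence Krenair walks through above — run as the tool user on the bastion; the resource names and event text are the ones from this session:

```
kubectl get pods          # "No resources found." means no pod was ever created
kubectl get deployments   # the deployment exists, but READY 0/1
kubectl get rs            # the replicaset exists, but DESIRED 1, CURRENT 0
kubectl describe rs       # the Events section at the bottom explains why, e.g.:
#   Warning  FailedCreate  ...  replicaset-controller  Error creating:
#   pods "montage-dev-..." is forbidden: minimum memory usage per Container
#   is 100Mi, but request is 2
```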
[00:56:15] oh weird, I swear I did that and I didn't see it there, maybe it's a bit delayed?
[00:57:20] is service.manifest only periodically synced or something? because I'm changing it, but I'm getting the same errors
[00:57:33] I suppose technically, between the replicaset being created in the cluster and you running describe, the replicaset-controller might not have tried to reconcile yet. seems unlikely though
[00:57:54] you're editing the service.manifest by hand?
[00:58:11] yes, because it doesn't unset the values when I run it by default
[00:58:16] or should I delete it?
[00:58:28] this is despite the '# Please do not edit manually at this time.' header at the top?
[00:58:57] sorry, when I run webservice python2 restart (with defaults) it seems to be reading the values from that file, not unsetting them, which I think is the design
[00:59:52] i guess i can just move it away and let a new one get generated
[01:00:16] let's try it and see what happens. am not familiar with toolforge's webservice tool
[01:01:26] yeah, it generated a new one, without the "2000" value in there, but still no luck with the describe rs :/
[01:01:47] looks like it still has wonky values in it ("memory: 2k", "cpu: 500m")
[01:04:28] so what command are you running to make it generate that?
[01:05:12] i'm just running "webservice --backend kubernetes python2 restart"
[01:05:29] and it just generates invalid memory requests by default? :/
[01:05:49] hey, there we go
[01:05:53] dunno what you did, but you have a pod now
[01:05:56] there may be a bug in there, maybe? (500m is the default amount of memory)
[01:06:15] i stopped and started instead of restarting
[01:07:49] webservice stop / webservice start
[01:08:08] still, i learned a bunch about kubectl debugging, so thanks Krenair and AntiComposite
[01:08:27] you know, back in the day you'd fight for an afternoon with 15 lines of code and feel kinda dumb
[01:08:57] thanks to the miracles of modern devops technology, you can fight for an afternoon with just 1 single command and still feel kinda dumb :)
[01:09:02] :D
[01:09:11] but all's well that ends well
[01:09:15] thanks again!
[01:09:22] you're welcome
[01:10:48] (and Lodewijk + friends from Wiki Loves Monuments thank you too)
[01:31:16] I gave up trying to set up defaults on my service starts and just stuck the command line in a file and ran it with sh ...
[01:42:18] 'sh web-start' vs. 'webservice --backend kubernetes php7.3 start': bit simpler that way
[01:57:14] add a shebang and flip the executable bit and you can use `./web-start` instead, saves you a whole character
[01:59:32] * AntiComposite needs to do that, typing out or reverse i-searching for `kubectl delete jobs/anticompositebot.harvcheck` and `kubectl apply --validate=true -f harvcheck_job.yaml` gets a bit tedious
[02:00:41] (protip if you didn't know/remember: you can press Ctrl-R in bash to search your history in reverse chronological order)
[02:00:42] incidentally I tried that, but it didn't seem to work - does it specifically require the current directory prefix?
[02:01:34] if that's the case, it's rather pointless
[02:01:51] `web-start` wouldn't work, since your current directory isn't on your PATH. the `./` at the beginning specifies that you mean the `web-start` in this directory
[02:03:15] eh, I'll just leave it as I have it set then, less hassle to type
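Spelled out, the wrapper being described above — a minimal sketch of a ~/web-start file, with the long command taken from the conversation:

```
#!/bin/sh
# Wrapper so the full webservice invocation doesn't have to be retyped each time.
webservice --backend kubernetes php7.3 start
```

After `chmod +x web-start` it runs as `./web-start`; the leading `./` is needed because, as noted above, the current directory is not on PATH.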
[13:59:15] Is there anything that can be done about ReFill being down (like restarting a service) or are we dependent on a developer to fix it? https://phabricator.wikimedia.org/T246378
[14:46:04] I got disconnected, so am not sure if there were any responses to my post? Is there anything that can be done about ReFill being down (like restarting a service) or are we dependent on a developer to fix it? https://phabricator.wikimedia.org/T246378
[14:46:59] !help
[14:46:59] CurbSafeCharmer: If you don't get a response in 15-30 minutes, please create a phabricator task -- https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=wmcs-team
[14:54:40] CurbSafeCharmer: refill is down? I can take a look
[14:55:51] Thanks bstorm_ - the tool responds, but gets stuck at "Waiting for an available worker".
[14:56:18] Pod is running... but that might be a webservice instead of the custom deployment they use
[14:57:11] Hmm... maybe that's over in refill-api
[14:57:14] checking that
[14:57:32] the api worker was breaking things in the past, which is why I know all this lol
[14:57:51] :)
[15:00:22] Yup, that's it. I see a normal webservice rather than the custom thing
[15:00:32] CurbSafeCharmer: Happy to hear bstorm_ is helping
[15:00:35] I'll have to edit it for the new cluster
[15:11:37] heh, this is going to need a quota bump to run
[15:15:13] !log tools.refill-api bumped cpu quota limit to 3
[15:15:14] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.refill-api/SAL
[15:16:54] !log tools.refill-api reconfigured deployment to work according to the setup of the 2020 kubernetes cluster
[15:16:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.refill-api/SAL
[15:18:51] !log tools.refill-api increasing max cpu per container to 2
[15:18:52] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.refill-api/SAL
[15:23:15] CurbSafeCharmer_: looks like it is coming up now...
[15:25:20] Hrm. Lots of pickle errors in the log, but a log is more than it had before
[15:26:45] I think it's all set. That looks like programming issues and text parsing, not backend stuff
[15:30:15] !log tools.refill-api the correct deployment file is now refill.yaml. The last version is refill-old.yaml (which will certainly not work).
[15:30:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.refill-api/SAL
[15:42:54] Thanks very much @bstorm_
[15:43:10] np
[16:37:14] !log tools.stewardbots Add Urbanecm as maintainer
[16:37:16] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[16:54:46] !log admin [codfw1dev] deleted python3-os-ken debian package in cloudnet2003-dev which was installed by hand and had dependency issues
[16:54:48] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[17:06:38] !log tools.zppixbot Hard restart of webservice. Running pod and requested version in service.manifest did not match.
[17:06:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.zppixbot/SAL
[17:12:07] I'm confused by the zppixbot log msg, what exactly does it mean?
[17:13:10] Zppix: Someone fixed your broken web tool
[17:13:31] Zppix: something strange had happened to your webservice there. The $HOME/service.manifest file said it should be running on PHP 7.3, but the actual container that was running was PHP 5.6
[17:13:43] I have no idea how that happened.
[17:14:15] Maybe when i migrated it to the 2020 cluster it started ignoring the service.manifest, bd808
[17:15:23] My guess was that maybe someone edited the service.manifest and then ran `webservice restart` thinking that would change things. Pro tip: it will not
[17:15:45] I saw mahmoud-montage confused in the backscroll when I logged on today about that same thing
[17:16:18] service.manifest is really a log of the last `webservice ... start` state and nothing more
[17:17:25] bd808: possibly idk
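A sketch of the behavior bd808 describes, matching what fixed montage-dev earlier in this log — hand-edits to service.manifest are not picked up by a restart, since the file only records the last start:

```
# Editing ~/service.manifest by hand and then restarting does NOT apply the edits;
# restart reuses the previously recorded start state:
webservice restart

# What worked above instead: stop, then start with the flags you actually want.
# That new start is what service.manifest ends up recording.
webservice stop
webservice --backend=kubernetes python2 start
```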
[19:20:52] !log tools.stewardbots Kill sulwatcher temporarily to test something
[19:20:54] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[19:24:46] !log tools.stewardbots Start sulwatchers from public_html rather than from non-git folder
[19:24:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[19:26:25] !log tools.stewardbots Use Py3 venv in ~/bigbrother.sh
[19:26:26] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[19:26:57] !log tools.stewardbots tools.stewardbots@tools-sgebastion-07:~$ mv git git.bak
[19:26:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[19:30:14] !log tools.stewardbots tools.stewardbots@tools-sgebastion-07:~$ rm ~/core
[19:30:15] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[19:33:36] !log tools.stewardbots Make ~/StewardBot/restart_stewardbot.sh use the correct values (add public_html to the path)
[19:33:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[19:37:37] !log tools.stewardbots tools.stewardbots@tools-sgebastion-08:~$ rm backup_tutti_i_db.sql # empty file
[19:37:38] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[19:38:01] !log tools.stewardbots tools.stewardbots@tools-sgebastion-08:~$ rm ~/tuttiidb.sql # empty file
[19:38:02] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[20:32:26] !log Adding uncommitted live hacks to /hat-web-tool/delete.php to make it work
[20:32:26] Urbanecm: Unknown project "Adding"
[20:32:32] !log tools.stewardbots Adding uncommitted live hacks to /hat-web-tool/delete.php to make it work
[20:32:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[20:47:33] !log tools.stewardbots Undo, now pushed to gerrit ready to review
[20:47:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[21:06:55] !log tools.stewardbots Move ~/public_html to ~/stewardbots and symlink ~/public_html to ~/stewardbots (T246702)
[21:06:58] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[21:06:58] T246702: Make tools.stewardbot look clear to a newbie - https://phabricator.wikimedia.org/T246702
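The 21:06:55 move, spelled out — a sketch assuming the natural reading of that log message (relocate the directory, leave a symlink at the old path):

```
mv ~/public_html ~/stewardbots       # the tree now lives at ~/stewardbots
ln -s ~/stewardbots ~/public_html    # the old path keeps resolving via the symlink
```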
[21:12:52] !log tools.stewardbots tools.stewardbots@tools-sgebastion-08:~$ rm -rf ~/StewardBot/ ~/SULWatcher/ (T246702)
[21:12:55] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[21:12:55] T246702: Make tools.stewardbot look clear to a newbie - https://phabricator.wikimedia.org/T246702
[21:18:39] !log tools.stewardbots Move several files to ~/00_backup; exact list at task (T246702)
[21:18:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[21:18:42] T246702: Make tools.stewardbot look clear to a newbie - https://phabricator.wikimedia.org/T246702
[21:29:43] !log tools.stewardbots tools.stewardbots@tools-sgebastion-08:~$ rm -rf data # empty folder
[21:29:45] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[21:31:03] !log tools.stewardbots tools.stewardbots@tools-sgebastion-08:~$ rm -rf conf # T246702
[21:31:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[21:31:05] T246702: Make tools.stewardbot look clear to a newbie - https://phabricator.wikimedia.org/T246702
[21:31:50] !log tools.stewardbots Shut SULWatcher down for maintenance
[21:31:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stewardbots/SAL
[22:26:36] !log tools starting first pass of elasticsearch data migration to new cluster T236606
[22:26:40] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:26:40] T236606: Rebuild Toolforge elasticsearch cluster with Stretch or Buster - https://phabricator.wikimedia.org/T236606
[23:28:44] !log tools.pagepile-visual-filter deployed eeac67f247 (improvement)
[23:28:46] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.pagepile-visual-filter/SAL
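For reference, the !log pattern used throughout, including the failure mode seen at 20:32 — the first word after !log must name the project, and the bot records the rest on that project's SAL page (tools.example below is a made-up name):

```
!log tools.example describe what you changed
# bot: Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.example/SAL

!log describe what you changed
# bot: Unknown project "describe"
```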