[01:03:22] hmm wikibugs keeps quitting. I thought it has an anti flood feature?
[01:20:58] paladox: I'm not sure what's wrong tbh. It's lagging a lot
[02:10:32] PROBLEM - Puppet errors on tools-exec-1438 is CRITICAL: CRITICAL: 70.00% of data above the critical threshold [0.0]
[02:45:29] RECOVERY - Puppet errors on tools-exec-1438 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:47:50] PROBLEM - Puppet errors on tools-exec-1418 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [0.0]
[04:13:39] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[04:17:52] RECOVERY - Puppet errors on tools-exec-1418 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:50:08] bd808, are you there?
[04:53:40] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0]
[06:32:01] PROBLEM - Puppet errors on tools-exec-1434 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[06:48:14] Labs, Tool-Labs: Tool Labs 2017-06-29 Labstore100[45] kernel upgrade issues - https://phabricator.wikimedia.org/T169289#3393676 (madhuvishy) Incident documentation - https://wikitech.wikimedia.org/wiki/Incident_documentation/20170629-Labstore_Kernel_Upgrade
[07:12:00] RECOVERY - Puppet errors on tools-exec-1434 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:30:50] RECOVERY - Puppet errors on tools-exec-1433 is OK: OK: Less than 1.00% above the threshold [0.0]
[11:21:59] PROBLEM - Puppet errors on tools-exec-gift-trusty-01 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[11:25:39] Labs, Operations, Puppet: Create a cron to clean clientbucket every day or hour - https://phabricator.wikimedia.org/T165885#3279509 (Kelson) Yes, please. My multiple labs instances run out of space in /var and this basically blocks "dpkg". As non-puppet expert, this took me a bit of time to figure ou...
[11:36:55] Quarry, AutoWikiBrowser, WorkType-NewFunctionality: Quarry run result in AWB make list - https://phabricator.wikimedia.org/T134141#3397982 (Josve05a)
[11:37:04] Quarry, AutoWikiBrowser, WorkType-NewFunctionality: Quarry run result in AWB make list - https://phabricator.wikimedia.org/T134141#2255546 (Josve05a)
[11:51:59] RECOVERY - Puppet errors on tools-exec-gift-trusty-01 is OK: OK: Less than 1.00% above the threshold [0.0]
[12:48:22] My bot is now on Wikitech.
[12:48:47] I am able to run it from the terminal, it is working fine.
[12:49:13] I need to type one command to run the application.
[12:49:20] But how do I automate it?
[12:50:56] acagastya1: with 'on wikitech', do you mean Tool Labs or PAWS?
[12:51:09] Tool Labs.
[12:52:10] Ok, and is this a bot that should run continuously or every X period of time (e.g. every day)
[12:52:52] It should always run.
[12:53:26] acagastya1: ok, then please take a look at https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Grid#Submitting_continuous_jobs_.28such_as_bots.29_with_.27jstart.27 for how to set that up
[12:53:43] Okay.
[12:58:41] valhallasw`cloud: So after a casual reading, it looks like the jstart command is to be used, which will run the binary file. Is that correct?
[12:58:53] Correct.
[12:59:33] And then I can logout from the terminal, and close the terminal, and the program will run?
[13:01:03] Yes, it will run on the grid in the background
[13:01:15] output will be written to files, .out and .err
[13:01:29] Okay.
[13:02:51] Labs, DBA: Set up replication for kbp.wikipedia.org - https://phabricator.wikimedia.org/T169431#3398147 (valhallasw)
[13:02:57] So, the things I have done so far: log in to tools-bastion, clone the git repository of the application, found where the binary file is located, test it, and `jstart --help`.
[13:03:30] Did I miss something that I should have done?
[13:03:55] acagastya1: jstart commandname
[13:04:13] (you have to tell jstart /what/ you want to start)
[13:04:19] No, that is what I will do, after reading about the parameters.
[13:04:37] Ah. No, normally that should be it
[13:04:48] in some cases you might need to use -mem to reserve more memory for the job
[13:05:05] Yes, I was about to ask you about memory.
[13:06:17] The gpy bot used to run out of memory, and quit IRC. It is supposed to run continuously. -- So how do I know if I should use it or not?
[13:07:15] acagastya1: https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Grid#Returning_the_status_of_a_particular_job
[13:07:51] uh wait
[13:08:03] that section is a bit chaotic
[13:08:11] the answer is 'use qacct to get information about the last job run'
[13:08:41] it will show you maxvmem (maximum virtual memory used) -- if this is 256M or close, the job was likely killed because of that
[13:09:04] in the jobname.err, you'd typically see a message like 'Killed', or 'KeyboardInterrupt'
[13:09:29] Okay. So I will not come to know until it crashes.
[13:10:40] But since this bot should run always, it will start after it runs out of memory, since exit status was not zero?
[13:10:54] *start => restart
[13:11:11] No -- the memory limits are enforced on a higher level, and will also kill the restart script
[13:12:09] Oh, then I will have to jstart it again, after seeing the memory usage. But what if the memory was, let's say 300M? Can I ask for more than 256M?
[13:12:20] Yes, with -mem you can ask for any amount
[13:12:30] but large amounts can cause the job to take a long time to schedule/start
[13:13:01] Okay.
[13:14:06] But, I am going to format my laptop this month, to install other OS, and I would like new pgp keys. That means I can not login to the tools bastion.
[13:14:24] Is there a way to change the keys?
[13:14:50] I assume you mean SSH keys?
[13:14:56] You can upload new SSH keys on https://toolsadmin.wikimedia.org/
[13:15:03] or you can backup your existing private keys
[13:18:02] I want to ditch this OS, and would like new keys. Since it is possible, as you say (SSH keys, my bad), I will use the new one.
[13:19:47] valhallasw`cloud: `jstart -continuous ./node_modules/.bin/teleirc` would be okay?
[13:20:03] I am not specifying memory for now.
[13:20:12] Yes, although -continuous is not necessary if you use jstart
[13:22:27] Okay, it says the job is submitted.
[13:22:38] you can check the status with `qstat`
[13:23:32] Well, that did not involve most of the things listed in https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#node.js_web_services
[13:27:22] valhallasw`cloud: `qstat` says the state is r, (r for running?) and submit/start at 13:22:01.
[13:27:36] I assume it is UTC.
[13:28:57] Well, it is 1427 UTC and it is not working.
[13:29:17] acagastya1: take a look at the log files
[13:30:58] 134.
[13:35:07] '134'?
[13:38:38] Exit status 134.
[13:38:54] and what's in the logs of the command?
[13:39:00] "/mnt/nfs/labstore-secondary-tools-project/enwnbot/node_modules/teleirc/bin/teleirc exited with code 134"
[13:39:01] i.e. commandname.log / commandname.err?
[13:39:09] FATAL ERROR: v8::Context::New() V8 is no longer usable
[13:39:22] Ok, that sounds like not enough memory.
[13:40:24] But the directory was `.bin`, while that path in the error log omits the `.`
[13:40:44] That shouldn't happen. No?
[13:40:46] the .bin path is probably a symlink
[13:44:29] valhallasw`cloud: Was I supposed to do these things also? https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Web#node.js_web_services
[13:44:41] No. That is about web services, not about bots.
[13:45:20] Yes, but my application is based on NodeJS.
[13:46:23] And my node application is located in node_modules directory.
[13:49:36] Yes, but it's not a webservice. So that section is not relevant.
[13:55:08] Then I should run `qacct`?
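(Editor's note: the check described above, reading maxvmem from the accounting record and comparing it with the 256M figure mentioned in the conversation, could be sketched roughly as follows. This is only an illustration under assumptions: the sample text is made up, the "key value" layout and the K/M/G suffixes are assumed rather than taken from real qacct output, and the helper names `parse_size`, `max_vmem` and `near_limit` are hypothetical.)

```python
import re

# Multipliers for size suffixes such as "255.9M" (binary units are an assumption).
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Convert a size string like '256M' into a number of bytes."""
    match = re.fullmatch(r"([0-9]*\.?[0-9]+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit]


def max_vmem(accounting_text):
    """Return the maxvmem value in bytes from 'key value' lines, or None if absent."""
    for line in accounting_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "maxvmem":
            return parse_size(parts[1])
    return None


def near_limit(accounting_text, limit="256M", margin=0.95):
    """True when peak memory reached at least `margin` of the limit."""
    used = max_vmem(accounting_text)
    return used is not None and used >= margin * parse_size(limit)


# Made-up sample in the assumed layout; not copied from a real job.
SAMPLE = """\
jobname      teleirc
exit_status  134
maxvmem      255.977M
"""

if __name__ == "__main__":
    print(near_limit(SAMPLE))
```

(If the check reports that the job ran close to its limit, resubmitting with a larger -mem value, as suggested in the conversation, would be the next thing to try.)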
[14:00:32] As I mentioned, the V8 error message indicates you have not requested enough memory
[14:02:56] Labs, DBA: Set up replication for kbp.wikipedia.org - https://phabricator.wikimedia.org/T169431#3398186 (Peachey88)
[14:02:58] Labs, DBA: Prepare and check storage layer for kbp.wikipedia.org - https://phabricator.wikimedia.org/T160869#3398189 (Peachey88)
[14:03:33] Labs, DBA: Set up replication for kbp.wikipedia.org - https://phabricator.wikimedia.org/T169431#3398147 (Peachey88) The task for the DBAs to do their side of things for replication got stalled and looks like they never got repinged.
[14:07:38] Labs, DBA: Prepare and check storage layer for kbp.wikipedia.org - https://phabricator.wikimedia.org/T160869#3398194 (Peachey88) stalled>Open
[16:27:07] PROBLEM - Puppet errors on tools-exec-1407 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[16:33:20] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1402 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [0.0]
[17:02:08] RECOVERY - Puppet errors on tools-exec-1407 is OK: OK: Less than 1.00% above the threshold [0.0]
[17:08:22] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[18:37:45] Labs, Tool-Labs: Tool Labs 2017-06-29 Labstore100[45] kernel upgrade issues - https://phabricator.wikimedia.org/T169289#3398463 (bd808)
[18:49:09] valhallasw`cloud: thanks for helping acagastya1. The part they didn't convey to you well is that they want to run a continuous k8s job for that bot.
Roughly following https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Kubernetes#Kubernetes_continuous_jobs
[18:49:30] I will let you know when I see acagastya1 around here
[18:49:30] @notify acagastya1
[19:32:29] !log tools Restarted maintain-kubeusers on tools-k8s-master-01
[19:32:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:40:00] !log tools Disabled puppet on tools-k8s-master-01 to try and fix maintain-kubeusers
[19:40:05] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[19:44:41] PROBLEM - Puppet errors on tools-worker-1007 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[19:46:28] bd808: I see
[19:46:43] is that something we suggest in general for nodejs tools?
[19:47:15] I'd say yes. The version of node we have on trusty is really ancient
[19:47:25] Ah, right
[19:48:12] running custom jobs on k8s is a bit harder than jstart but really seems to be pretty stable
[20:25:09] ImportError: /data/project/ci/venv/lib/python3.4/lib-dynload/_ctypes.cpython-34m-x86_64-linux-gnu.so: undefined symbol: _PyTraceback_Add
[20:25:12] weird
[20:28:58] hmm
[20:29:12] bd808, valhallasw`cloud: any ideas what's wrong? https://paste.fedoraproject.org/paste/ZYzlPYL3OgyRyeBscn8mGA/raw this is brand new venv I created
[20:30:02] It seems like we've seen that before... this is a trusty py3 venv?
[20:30:11] yes
[20:31:15] https://stackoverflow.com/questions/33223713/python-ctypes-import-error-in-virtualenv
[20:31:46] ... time to move this work to Kubernetes?
[20:31:53] "It appears that somehow you're using a 3.4.3+ build of the _ctypes extension module with an older version of Python 3.4."
[20:32:02] k8s supports cronjobs now?
[20:32:04] :)
[20:32:10] well....
[20:32:48] there is a scheduler in it.
I'm not sure what I would trust it for
[20:33:09] https://phabricator.wikimedia.org/source/tool-ci/browse/master/build_table.py is the script, it's pretty simple
[20:34:16] I mean, is every py3 venv suddenly broken now if it tries to use _ctypes?
[20:34:40] it really could be
[20:35:22] you could run it as a continuous job by following https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Kubernetes#Kubernetes_continuous_jobs or experiment with https://wikitech.wikimedia.org/wiki/Help:Tool_Labs/Kubernetes#Kubernetes_cronjobs
[20:35:44] we probably should figure this trusty problem out though
[20:36:54] if I don't upgrade pip, it doesn't import _ctypes so it works
[20:37:00] (venv3)tools.ci@tools-bastion-03:~$ pip --version
[20:37:01] pip 1.5.4 from /mnt/nfs/labstore-secondary-tools-project/ci/venv3/lib/python3.4/site-packages (python 3.4)
[20:37:33] fun. so you either get wheels or things that work
[20:54:41] RECOVERY - Puppet errors on tools-worker-1007 is OK: OK: Less than 1.00% above the threshold [0.0]
[20:58:07] Labs, Tool-Labs, Tools-Kubernetes: Move kubernetes authentication to using X.509 client certs - https://phabricator.wikimedia.org/T144153#3398604 (bd808) I chatted with @yuvipanda about this on irc a bit while wondering out loud how to detach the Kubernetes master from NFS which came up while dealing...
[23:26:22] Labs, Tool-Labs, Tools-Kubernetes: newer npm for nodejs Kubernetes instances - https://phabricator.wikimedia.org/T169451#3398652 (bd808)
[23:27:14] Labs, Quarry: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3398666 (yuvipanda)
[23:30:08] Labs, Quarry: Consider moving Quarry to be an installation of Redash - https://phabricator.wikimedia.org/T169452#3398678 (yuvipanda) You'd have to do some amount of proxy magic to get it to work with mediawiki auth. We should have an authenticator running as a separate app. The proxy should check for a c...