[00:06:16] !log tools.wikidata-exports Restarted webservice. Stuck in CrashLoopBackOff due to T196589
[00:06:18] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikidata-exports/SAL
[00:06:18] T196589: Some kubernetes webservices stuck in CrashLoopBackOff after cluster restart - https://phabricator.wikimedia.org/T196589
[00:11:26] !log tools.wikisource-penguin-classics Moved webservice from grid to kubernetes after deleting orphan kubernetes deployment from a year ago
[00:11:27] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikisource-penguin-classics/SAL
[00:13:05] !log tools.wm-commons-emoji-bot Stopped webservice. In infinite restart loop with message "re-evaluating native module sources is not supported. If you are using the graceful-fs module, please update it to a more recent version."
[00:13:05] bd808: Unknown project "tools.wm-commons-emoji-bot"
[00:13:21] oh really stashbot ?
[00:17:14] !log tools.toolchecker `kubectl delete deploy/toolchecker` to remove what looks to be an orphan Kubernetes deployment
[00:17:14] bd808: Unknown project "tools.toolchecker"
[00:17:21] more lies stashbot
[00:19:50] !log tools.stashbot Restarted bot due to lost LDAP connection
[00:19:53] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL
[00:19:56] !log tools.wm-commons-emoji-bot Stopped webservice. In infinite restart loop with message "re-evaluating native module sources is not supported. If you are using the graceful-fs module, please update it to a more recent version."
[00:19:57] bd808: Unknown project "tools.wm-commons-emoji-bot"
[00:20:03] !log tools.toolchecker `kubectl delete deploy/toolchecker` to remove what looks to be an orphan Kubernetes deployment
[00:20:03] bd808: Unknown project "tools.toolchecker"
[00:20:26] * bd808 is confused
[00:39:22] plurals?
[00:51:26] Hi.
[00:52:19] Is tools.toolchecker unrelated to http://tools.wmflabs.org/checker/ ?
[02:22:42] hi!
which version of lighttpd are we using? can't find it anywhere
[02:24:18] I guess it'd depend what image you're using...
[02:25:19] Reedy: no idea. how do I check?
[02:29:52] Just using webservice start in a tool?
[02:30:33] As if so... that's running trusty? So.. 1.4.33-1+nmu2ubuntu2 presumably
[02:32:18] Reedy: it's already running. I stopped and started it again but it doesn't show any version info
[02:32:34] did you just do `webservice start` ?
[02:32:58] yes
[02:33:22] So, it'll be running on gridengine... Which is using an ubuntu 14.04/trusty image... So should be lighttpd 1.4.33
[02:33:23] just said "Starting webservice." and that was it
[02:33:39] Reedy: awesome, thank you
[04:40:15] chicocvenancio: thanks for taking care of it
[06:14:14] (CR) jenkins-bot: Localisation updates from https://translatewiki.net. [labs/tools/heritage] - https://gerrit.wikimedia.org/r/437903 (owner: L10n-bot)
[08:59:05] (CR) MarcoAurelio: [C: 2] build: Updating mediawiki/mediawiki-codesniffer to 19.0.0 [labs/tools/stewardbots] - https://gerrit.wikimedia.org/r/436936 (owner: Libraryupgrader)
[11:01:27] !log tools T196137 force rotate all exim paniclog files to avoid rootspam `aborrero@tools-clushmaster-01:~$ clush -w@all 'sudo logrotate /etc/logrotate.d/exim4-paniclog -f -v'`
[11:01:30] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[11:01:30] T196137: toolforge: prometheus issue is filling up email queue - https://phabricator.wikimedia.org/T196137
[14:01:12] paladox, your issue with phab-tin is resolved?
[14:04:26] andrewbogott: yep
[14:04:30] Thanks
[14:04:35] great
[16:16:33] Ursula: Yes, tools.toolchecker is a separate thing. It is actually a monitoring tool that the Cloud Services team uses to test if we can start/stop jobs in Toolforge.
[17:02:09] (PS1) Arturo Borrero Gonzalez: openstack: eqiad1: add fake hiera keys [labs/private] - https://gerrit.wikimedia.org/r/438052
[17:02:41] (CR) Arturo Borrero Gonzalez: [C: 2] openstack: eqiad1: add fake hiera keys [labs/private] - https://gerrit.wikimedia.org/r/438052 (owner: Arturo Borrero Gonzalez)
[17:02:44] (CR) Arturo Borrero Gonzalez: [V: 2 C: 2] openstack: eqiad1: add fake hiera keys [labs/private] - https://gerrit.wikimedia.org/r/438052 (owner: Arturo Borrero Gonzalez)
[17:11:20] (PS1) Arturo Borrero Gonzalez: openstack: eqiad1: add more hiera keys placeholders for passwords [labs/private] - https://gerrit.wikimedia.org/r/438053 (https://phabricator.wikimedia.org/T196633)
[17:14:34] (CR) Arturo Borrero Gonzalez: [V: 2 C: 2] openstack: eqiad1: add more hiera keys placeholders for passwords [labs/private] - https://gerrit.wikimedia.org/r/438053 (https://phabricator.wikimedia.org/T196633) (owner: Arturo Borrero Gonzalez)
[17:15:01] Anyone know how/when https://tools.wmflabs.org/cdnjs/ is updated/synced? Curious when TensorFlow.js will become available (already in upstream: https://github.com/cdnjs/cdnjs/commit/9215233f06f541a97daff45e133c70b24a1f43a6)
[17:19:48] bearloga: bstorm_ probably knows best :)
[17:20:03] It is daily
[17:21:18] bearloga: one thing, though, it is not synced from the github repo anymore. It is synced from the cdnjs API. That means if there is any lag in getting it in the upstream API, it will be the daily run *after* that happens.
[17:23:34] I'd expect changes to show up there around 7am UTC the following day.
[17:23:43] bstorm_: thanks for clarifying! I'm still confused why TF.js isn't up yet even though it was added April 12th, so that initial version should be listed by now, right? Could there be a bug?
[17:23:46] as long as they don't lag at getting them in their own API
[17:23:58] Oh, so maybe that's the issue then?
[17:24:08] Oh definitely :) There could be many things.
[17:24:11] Although weird that their own API would be behind by like 2 months
[17:24:39] I can take a look for any bugs or whatever. I haven't got notifications of failures of the job, but that doesn't mean there haven't been any interesting events.
[17:26:19] bstorm_: thanks! That'd be great. I'll make a Phab task so we can continue this convo outside IRC :) what tag should I use?
[17:26:32] I think we do have a problem, actually.
[17:26:54] bearloga: Toolforge is a good tag here
[17:27:01] chasemp: noted, thanks!
[17:27:03] Sure, I'm not 100% sure what to tag it with, but if you mention me, I'll catch it
[17:27:08] :)
[17:34:25] Hello. I have a question. One of my tools, CopyPatrol (https://tools.wmflabs.org/copypatrol/) isn't functioning correctly after the reboots that happened yesterday. To be precise, it was looking at the database that another tool maintains and showing that data in the interface. It's not doing that anymore. Any guesses for what could be happening here?
[17:40:53] Niharika: I'm a bit distracted I'm sorry, but I don't understand the behavior you are describing. post reboots you cannot access the database of another tool from your tool where you could before?
[17:41:21] there were a few happenings recently, one of them I believe is the database servers were rebooted but I can't figure out how that would effect this change
[17:41:34] can you be more specific about what the other tool is etc?
[17:42:39] chasemp: That's my guess. I'm working on adding more logging to see what's happening. The other tool is Eranbot which has the database `s51306__copyright_p` to store edits that are copyright violations.
[17:42:51] The database is updating just fine.
[17:43:09] I restarted the webservice in CopyPatrol but that didn't fix anything.
[17:43:20] is the error permissions oriented?
[17:43:46] bstorm_: does this make any sense to you^?
[17:44:56] I don't see any errors yet so I'm adding more logging to figure it out.
[17:45:22] * chasemp nods
[17:45:25] Trying to reason through it...
[17:45:43] ToolsDB was rebooted on Tuesday.
[17:46:37] It stopped working after around 2018-06-06 18:18 UTC.
[17:46:58] Ok, so that is around labvirt reboots, not DB reboots
[17:48:04] How does CopyPatrol access data from Eranbot? Does it work through a db connection or some intermediary service?
[17:48:45] bstorm_: Usual DB connection.
[17:50:26] Niharika: is it Grid or k8s?
[17:50:43] or VPS
[17:51:24] chicocvenancio: I think it's Grid.
[17:51:55] * chicocvenancio has some time waiting for monster docker builds/push
[17:55:37] Niharika: it's k8s
[17:56:11] chicocvenancio: Ahhh. Does it have a different way to restart the service than `webservice start`?
[17:57:23] no, it'd be `webservice --backend=kubernetes start` for the first time, if nothing mangles the service.manifest `webservice start` should work after that
[17:58:15] It's been running for a year now, so definitely not the first time.
[17:58:38] I did restart the webservice and it says it's running.
[17:59:14] and the service.manifest is valid, I usually leave the `--backend=kubernetes` for readability though
[17:59:49] Ah.
[18:01:13] error.log seems very unhelpful indeed
[18:08:13] Niharika: Using `sql tools` as the copypatrol user I can see 2 tables in the s51306__copyright_p: copyright_diffs and wikiprojects
[18:08:36] bd808: Right. We're accessing the first one.
[18:08:37] does copypatrol write to that shared db?
[18:08:41] It does.
[18:09:08] and is that what is failing?
[18:09:46] bd808: logs do not indicate anything helpful
[18:09:54] I think so. Nothing in the logs.
[18:10:10] `show grants` includes GRANT ALL PRIVILEGES ON `s51306\_\_copyright\_p`.* TO 's53018'@'%'
[18:10:23] which looks correct
[18:11:08] yeah, I guess s53018 is CopyPatrol
[18:11:44] Oh, also, the same failure happens for Plagiabot which is the staging instance for CopyPatrol.
[18:11:59] It works fine but new records don't show up in the interface.
[18:12:22] Since old records do show up, it seems like it's able to read the DB.
[18:12:55] I'm gonna add some logging (after my meeting) and get back to you all.
[19:16:21] (PS2) Rosalieper: [WIP]Added a script for Download of images. [labs/tools/Commons-twitter-bot] - https://gerrit.wikimedia.org/r/437067 (https://phabricator.wikimedia.org/T190163)
[20:03:43] o/ musikanima.l fixed the issue. It was something wrong with Eranbot. Thanks for all your help!
[20:04:36] !log shinken Enabling puppet and running on shinken-01. "The last Puppet run was at Wed Jun 6 15:58:12 UTC 2018 (1653 minutes ago). Puppet is disabled. silencing shinken during labvirt reboots"
[20:04:37] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Shinken/SAL
[20:04:41] andrewbogott, ^
[20:07:19] Krenair: thanks for noticing that
[20:07:23] Notice: /Stage[main]/Shinken::Shinkengen/Exec[/usr/local/bin/shinkengen]/returns: executed successfully
[20:07:23] Info: /Stage[main]/Shinken::Shinkengen/Exec[/usr/local/bin/shinkengen]: Scheduling refresh of Service[shinken]
[20:07:23] Notice: /Stage[main]/Shinken/Service[shinken]/ensure: ensure changed 'stopped' to 'running'
[20:07:23] Info: /Stage[main]/Shinken/Service[shinken]: Unscheduling refresh on Service[shinken]
[20:08:12] np
[20:08:31] Krenair: thanks for re-enabling
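
Note on the `show grants` check at 18:10: the backslashes in `s51306\_\_copyright\_p` are there because MySQL treats `_` and `%` as wildcards in database-level grants, so literal underscores are shown escaped. Below is a minimal sketch of how that line could be checked by script. It assumes the grant text has already been fetched by some other means; the names `GRANT_RE`, `unescape_db_pattern` and `grant_covers` are hypothetical and not part of any Toolforge tooling, and the sketch only handles exact database names, not real wildcard patterns or privilege lists.

```python
import re

# Hypothetical helper: pulls the database pattern and grantee out of a
# database-level line such as: GRANT ... ON `db`.* TO 'user'@'host'
GRANT_RE = re.compile(
    r"^GRANT (?P<privs>.+?) ON `(?P<db>[^`]+)`\.\* TO '(?P<user>[^']+)'@'(?P<host>[^']+)'"
)


def unescape_db_pattern(pattern: str) -> str:
    # SHOW GRANTS escapes literal underscores and percent signs with a backslash.
    return pattern.replace("\\_", "_").replace("\\%", "%")


def grant_covers(grant_line: str, database: str, user: str) -> bool:
    # True only when the line names exactly this database and this user.
    match = GRANT_RE.match(grant_line.strip())
    if match is None:
        return False
    return (
        unescape_db_pattern(match.group("db")) == database
        and match.group("user") == user
    )


# The grant line quoted in the log, checked for the s53018 user.
line = r"GRANT ALL PRIVILEGES ON `s51306\_\_copyright\_p`.* TO 's53018'@'%'"
print(grant_covers(line, "s51306__copyright_p", "s53018"))
```

A check like this only confirms that the grant exists. As the log shows, the grant was fine and the fault turned out to be on the Eranbot side.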