[00:37:39] Anyone able to force qdel 56453?
[00:46:52] Betacommand: done
[00:47:48] gtirloni: thanks
[00:48:11] I hate zombie jobs
[00:48:52] +1
[00:49:39] Noticed that several jobs had been hanging around longer than they should have so I tried to kill them. That job wouldn't die
[00:49:50] !log admin nfs-exportd interval changes from 60 to 300s (T217086)
[00:49:53] gtirloni: Failed to log message to wiki. Somebody should check the error logs.
[00:49:53] T217086: Investigate why the new Son of Grid Engine grid landed in a worse state when NFS was filled than the old Sun Grid Engine grid did - https://phabricator.wikimedia.org/T217086
[00:50:25] gtirloni: it's been throwing those errors all day
[00:50:45] Betacommand: sometimes the grid master loses sync with what's running on grid nodes and that could be happening.. especially if NFS misbehaved and a grid node had to be forcefully restarted.. then when it comes back, the master doesn't seem to recover well from that
[00:51:22] gtirloni: I was referring to stashbot
[00:52:28] oh yeah, there have been some changes to Gerrit regarding how it does OAuth and that broke stashbot.. people are actively working on a fix. Meanwhile we've been !log'ging things here just to keep track of them, at least on IRC logs. Posting to Phab tasks continues to work so it's only the SAL records that are being impacted for now
[00:52:41] s/Gerrit/Gerrit+Wikitech
[00:59:17] gtirloni: https://tools.wmflabs.org/sal/tools is also working, no mangling IRC logs in the future
[01:00:28] ah cool, thanks
[01:00:30] Or https://tools.wmflabs.org/sal/admin in that case
[01:01:10] so SAL is working, that's nice. I was wrong
[01:01:20] but yeah, that error will continue until tomorrow or so I guess
[01:03:41] You're right. It should also go to Wikitech, and that's the broken part
[01:13:44] ok, that makes sense now.. SAL != wikitech page with SAL logs
[01:14:59] bed time, g'night all
[01:15:24] * gtirloni leaves wishes of NFS behaving :)
[01:34:55] if things are not fixed before trusty dies we will lose a lot of stuff
[01:39:00] Betacommand, what things?
[01:39:36] Krenair: quite a few tools have yet to be migrated
[01:39:55] Betacommand, what things?
[01:39:57] oops
[01:40:15] Krenair: https://tools.wmflabs.org/trusty-tools/
[01:40:53] well, the maintainers are listed :)
[01:41:33] Krenair: they are also getting nag emails
[01:46:19] it's not like the scripts will be deleted or anything -- they just won't run
[01:46:32] inconvenient but not catastrophic
[01:55:48] yeah
[01:55:56] if the tool is not getting maintained it should probably not be running
[09:36:09] hello!
[09:36:19] I can't log in to horizon
[09:36:36] don't know if this is a known issue
[09:39:56] I just checked for my account, same issue as onimisionipe ("An error occurred authenticating. Please try again later.")
[09:50:26] This was announced onimisionipe gehel
[09:50:42] https://lists.wikimedia.org/pipermail/cloud-announce/2019-March/000146.html
[09:50:49] missed the announcement, but thanks!
[09:51:08] ah.. Ok
[09:55:25] horizon login doesn't work right now due to OAuth in wikitech being out of service
[10:51:00] Funny! de.wikipedia.org is blacked out today. Okay. So when I am logged in, all other languages show that black screen too. When I am not logged in, I can read/edit other languages. Somehow strange, isn't it?
[10:52:08] I guess it is an unintended effect of the blackout js
[10:53:12] you can hide centralnotice as well via a css hack and edit fine
[10:54:49] I know, just too lazy
[10:55:12] the black screen/banner is probably due to setting the interface language to de
[10:55:23] However: the rumor that IPs are second class users is proven wrong by this side effect ;^)
[10:59:12] Just reading an IP which has the effect of a black non-german language too? Strange!
[11:15:33] Much easier than css-hack: Change the interface language to something other than German and you can read everything.
[13:49:10] !log admin converted openstack cronjobs to systemd timers (T210818)
[13:49:13] gtirloni: Failed to log message to wiki. Somebody should check the error logs.
[13:49:13] T210818: Move admin cron jobs to systemd timers - https://phabricator.wikimedia.org/T210818
[14:00:37] Seems that NFS is pretty busy :-(
[14:07:48] hey
[14:24:09] do we know who to blame about NFS? asking for a friend O:-)
[14:27:11] Sun Microsystems
[14:28:42] :-D
[14:28:51] might be some bzip2 running on sgebastion-07 … might be something different
[14:35:40] I'm running a relatively expensive migration on a SQL db, I assume that cannot affect NFS? If so I can stop it.
[14:36:11] I don't think that would affect NFS but I'm (more than likely) wrong
[14:36:25] pintoch: it can not affect NFS, but depends on how you'd do it
[14:36:57] it's a Python process sending SQL queries from a kubernetes pod
[14:37:03] * chicocvenancio has seen sql dumps to filesystem to change tables
[14:37:25] pintoch: you should be fine unless you're writing files
[14:37:44] great, that's fine then, it's all in-house
[15:18:42] NFS seems to be better now, thank you to anyone who could be even remotely responsible for that :)
[15:47:35] Jepp! bzip2 does not run anymore :-)
[17:08:42] What's wrong with horizon?
[17:13:21] ok, I'm sending another email
[17:20:45] https://lists.wikimedia.org/pipermail/cloud-announce/2019-March/000148.html Zppix
[17:24:14] arturo: ugh what isn't affected by this vandalism
[17:24:23] This is starting to be annoying
[17:25:25] let's not give them the pleasure :)
[17:25:51] I know... I'm trying
[17:27:36] yeah, but it's indeed pretty annoying. no doubt.
[17:39:16] !log tools.wikiloves Stopping webservice for Trusty migration (T216365)
[17:39:20] JeanFred: Failed to log message to wiki. Somebody should check the error logs.
[17:39:21] T216365: Upgrade wikiloves to Strech - https://phabricator.wikimedia.org/T216365
[17:43:28] thanks JeanFred :-)
[17:57:41] !log tools.stashbot Test
[17:57:42] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.stashbot/SAL
[17:58:22] bd808: will stashbot log the stuff that was failing before it was fixed or no?
[17:58:27] onwiki that is
[17:59:24] Zppix: not automatically, no. We have data in the elasticsearch data store, but I do not have a guess about when or if that will all be recorded to the wiki pages
[17:59:39] ok
[17:59:52] * bd808 has never scripted this and not done it manually for many many months
[18:01:01] !log tools.wikiloves Deleted virtualenv, recreated in interactive container shell, reinstalled requirements, restarted webservice (T216365)
[18:01:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikiloves/SAL
[18:01:03] T216365: Upgrade wikiloves to Strech - https://phabricator.wikimedia.org/T216365
[18:17:50] !log git Rebooting gerrit-test.git and gerrit-mysql.git to try to troubleshoot issue with Icinga2 not properly executing checks suddenly
[18:17:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Git/SAL
[18:50:58] !log tools.wikiloves Deleted crons from old Bastion, restored them on new Bastion (T216365)
[18:51:01] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.wikiloves/SAL
[18:51:01] T216365: Upgrade wikiloves to Strech - https://phabricator.wikimedia.org/T216365
[18:58:16] so...
[18:58:22] what do I do against this:
[18:58:23] terminate called after throwing an instance of 'std::runtime_error' what(): locale::facet::_S_create_c_locale name not valid
[18:58:41] old ubuntu image..
[18:59:19] I tried apt-get install locales; locale-gen en_US.UTF-8; update-locale LANG=en_US.UTF-8; reboot as advised in various places online, but it keeps happening
[19:03:34] is !_log working again? :)
[19:04:15] Yup
[19:04:18] it never stopped, just the wikitech part (/me checks if the wikitech oauth thing is fixed)
[19:04:26] it is :P
[19:07:31] great
[19:09:01] cool, congrats to the folks who made it happen
[19:11:00] Reedy: what was the page to add/remove a member to/from a tools project?
[19:11:05] on wikitech I mean
[19:11:09] it was a special page
[19:11:27] https://wikitech.wikimedia.org/wiki/Special:NovaProject
[19:14:01] nope, that's not it
[19:15:26] https://wikitech.wikimedia.org/w/index.php?title=Special:NovaRole&projectid=stewardbots <-- not this either
[19:15:27] sigh
[19:15:37] I had a link somewhere
[19:15:40] *fetches*
[19:15:42] https://toolsadmin.wikimedia.org/tools/
[19:16:23] auth doesn't seem to work there yet though.
[19:18:03] yup, that's why I want to do it via wikitech as in the past
[19:18:16] but I cannot find the page where I used to do it
[19:19:31] I think that is not possible anymore
[19:27:47] that sounds like a security problem
[19:27:55] it must be possible to log in somewhere and remove a user
[19:28:30] well, sure. Toolsadmin would be that place
[19:28:46] I thought you can't log in there
[19:29:03] and it is currently inaccessible. removing someone could be done via LDAP probably, if you have an emergency
[19:30:46] true
[19:32:16] !log admin restarting keystone on cloudcontrol1003
[19:32:20] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Admin/SAL
[19:34:20] it was some NovaSomething page
[19:36:12] I vaguely remember studying these flows last year and I think tool membership only has one self-serviceable point (toolsadmin)
[19:48:55] !log tools.bambots Tool is non-responsive, notified maintainer on his talk page
[19:48:57] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools.bambots/SAL
[19:58:54] !log wmflabsdotorg Set proxy-eqiad.wmflabs.org to CNAME to proxy-eqiad1.wmflabs.org T218938
[19:58:56] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Wmflabsdotorg/SAL
[19:58:57] T218938: designate_floating_ip_ptr_records_updater is critical - https://phabricator.wikimedia.org/T218938
[21:26:00] !log tools T217280 cleared error state from a couple queues and rebooted tools-sgeexec-0901 and 04 to clear other issues related
[21:26:03] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:26:04] T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280
[21:51:04] !log tools T217280 rebooted and cleared "unknown status" from tools-sgeexec-0909 after depooling
[21:51:08] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[21:51:08] T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280
[21:53:31] !log tools T217280 rebooted and cleared "unknown status" from tools-sgeexec-0914 after depooling
[21:53:33] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[22:07:33] bd808: The last Puppet run was at Thu Mar 21 19:23:16 UTC 2019 (163 minutes ago).
[22:07:33] <-- looks like quite some time, doesn't it?
[22:07:53] tools-sgebastion07 via login.tools.wmflabs.org
[22:08:28] it's not horrible, but it's long enough to indicate some problem in the puppet manifests
[22:10:46] filed T218959
[22:10:46] T218959: Puppet runs failing on tools-sgebastion-07.tools.eqiad.wmflabs - https://phabricator.wikimedia.org/T218959
[22:11:57] thanks
[22:12:22] instance.pp is mentioned there but probably the error is elsewhere
[22:12:25] as usual
[22:33:47] Die chwule Sa
[22:33:50] Sau
[22:34:00] !sortier
[22:34:08] !geöllform
[22:34:16] Ui
[22:34:26] Wrong chat
[22:45:35] hauskatze: heh. This time I think it was in instance.pp
[23:16:36] bd808: now that you're online, is it still possible to add/remove tool maintainers via wikitech?
[23:16:57] I remember it being possible via some sort of Special:NovaSomething page
[23:17:01] in the past
[23:18:49] hauskatze: not maintainers of a specific tool, no. That functionality moved to https://toolsadmin.wikimedia.org/
[23:19:06] ack
[23:28:12] !log tools T217280 depooled, reloaded and repooled tools-sgeexec-0938
[23:28:17] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Tools/SAL
[23:28:18] T217280: LDAP server running out of memory frequently and disrupting Cloud VPS clients - https://phabricator.wikimedia.org/T217280